Foreword


Accurate  and  timely  global  crop  forecasts  are  of  the  utmost  importance  to  all  countries  in 
managing  their  most  important  resource  — food.  Information  about  agricultural  production 
is  crucial  to  a wide  range  of  decisions  by  agribusiness,  policymakers,  resource  planners,  and 
agriculture  technologists.  Decisions  made  on  the  basis  of  inadequate  information  regarding 
the  food  supply,  its  distribution,  and  the  expectation  from  new  harvests  can  have  severe  eco- 
nomic  and  social  impact. 

The  United  States  is  a major  partner  in  a large  and  interdependent  worldwide  network  of 
importing  and  exporting  nations.  The  United  States  is  the  foremost  grain  exporter  to  a global 
market  increasing  to  meet  an  expanding  world  demand.  The  resupply  from  new  harvests  is 
extremely  variable  within  each  year  and  between  years.  The  world’s  most  important  food 
grain,  wheat,  is  in  the  process  of  being  planted  and  harvested  in  different  regions  of  the 
world  throughout  each  year.  This  crop  is  grown  mostly  in  semiarid  regions  that  have 
marginal  weather  conditions,  where  disaster  years  followed  by  years  of  bumper  crops  are 
common.  Although  such  organizations  as  the  United  Nations  Food  and  Agriculture 
Organization  (FAO)  and  the  United  States  Department  of  Agriculture  (USDA)  were  char- 
tered to  provide  information  on  global  food  production,  their  reports  have  been  heavily 
reliant  on  information  generated  by  the  countries  themselves.  This  information  is  derived 
from  crop  survey  systems  that  are  often  inadequate  or,  in  some  cases,  nonexistent. 

Aerospace  remote-sensing  technology  emerging  from  several  decades  of  research  is 
beginning  to  provide  a means  to  economically  provide  better  crop  forecasts. 

In  1974,  the  Large  Area  Crop  Inventory  Experiment  (LACIE)  — a joint  effort  of  the  Na- 
tional Aeronautics  and  Space  Administration  (NASA),  the  USDA,  and  the  National 
Oceanic  and  Atmospheric  Administration  (NOAA)  — began  to  apply  this  technology  on  an 
experimental  basis  to  forecasting  harvests  in  important  wheat-production  areas.  Following 
completion  of  the  analysis  of  data  acquired  over  3 global  crop  years,  the  results  were  docu- 
mented and  reported  in  a 4-day  symposium  held  in  October  1978  at  the  NASA  Johnson 
Space  Center  in  Houston,  Texas.  Prior  to  the  symposium,  a team  consisting  of  approx- 
imately 40  independent  university,  industry,  and  government  scientists  and  researchers  as- 
sembled periodically  from  March  through  July  1978  at  the  Johnson  Space  Center  to  review 
the  LACIE  results  in  considerable  detail.  These  peer-review  results  were  also  reported  at  the 
symposium. 

The  material  contained  in  this  document  consists  of  the  proceedings  of  the  technical  ses- 
sions of  that  symposium.  The  overview  and  peer-review  papers  are  published  in  other 
volumes. 


July  1979 


ROBERT  B.  MACDONALD 
Manager, LACIE Project
Experiment Design


FOREWORD 

In  general,  experiment  design  refers  to  the  design 
used  to  ensure  that  information  collected  in  an  ex- 
periment will  be  relevant  to  the  problems  under  in- 
vestigation. It  is  the  complete  sequence  of  steps 
taken  ahead  of  time  to  ensure  that  appropriate  data 
will be obtained to permit valid inferences concerning
these problems. Ideally, an experiment design
should arrange the data collection and analysis to
answer the experimental questions as efficiently as
possible. The principles of good experimental design
were given heavy emphasis in LACIE planning activities,
and every effort was made to learn as much
as possible given the available time, money, personnel,
and experimental material.

The  general  theme  of  the  LACIE  design  consisted 
of (1) identifying the state-of-the-art technology that
would  permit  inventory  of  crops  using  satellite  and 
meteorological  data,  (2)  making  necessary  tests  and 
evaluations  to  develop  procedures  that  used  state-of- 
the-art  technology  and  to  determine  how  well  these 
procedures  worked,  and  (3)  subjecting  the  designed 
system  to  use  in  a quasi-operational  environment  for 
final  performance  assessment  and  identification  of 
needed  refinements  and  improvements. 

The  LACIE  design  activities  were  structured  into 
five  major  technical  components:  (1)  sampling  and 
aggregation,  (2)  growth  stage  estimation,  (3) 
classification  and  mensuration,  (4)  yield  estimation, 
and (5) accuracy assessment.

The  complexity  and  interdependency  of  these 
components  and  their  supporting  systems  necessi- 
tated that  a major  part  of  the  overall  design  be  com- 
posed of  the  structuring  and  dovetailing  of  the  con- 
stituent parts  into  a quasi-operational  system  that 
supported  the  LACIE  objectives  while  making  every 
effort to conserve time, money, personnel, and experimental
material. The purpose of the Experiment
Design Session is to detail the LACIE technical
design, including the relationships between components
and (for each component) (1) the initial
state-of-the-art methodology in LACIE, (2) the test and
evaluation  procedures  used  to  identify  or  improve 
the  state  of  the  art,  (3)  the  performance  assessment 
procedures  applied  to  the  overall  system,  and  (4)  the 
chronology  of  the  state  of  the  art  as  it  evolved  during 
the  3 years  of  LACIE. 


Sampling and Aggregation

Throughout  LACIE,  the  only  feasible  cost-effec- 
tive way  to  meet  project  objectives  was  to  determine 
total  wheat  production  over  an  area  by  looking  at 
only  a subset  of  the  area.  Consequently,  LACIE  tech- 
nology drew  heavily  from  statistical  survey 
methodology  (supported  by  a broad  base  of  remote- 
sensing technology).  The  paper  by  Hallum  et  al.  en- 
titled “Sampling,  Aggregation,  and  Variance  Estima- 
tion for  Area,  Yield,  and  Production  in  LACIE"  pro- 
vides an  overview  of  the  LACIE  sampling  tech- 
nology used  in  a quasi-operational  mode  throughout 
Phases  I,  II,  and  III.  A general  description,  the  ra- 
tionale, design  restrictions,  and  other  design  charac- 
teristics are  given  for  the  sampling  design  and  the  ag- 
gregation procedures  for  estimating  wheat  area, 
yield,  and  production,  along  with  an  overview  of 
their  associated  prediction  error  estimates.  Specific 
details  are  provided  in  the  supporting  papers. 


Crop Development Stage Estimation

In the early preparation for LACIE, it was apparent
that year-to-year variations in the seasons
made the use of unadjusted crop calendars to distinguish
wheat from other crops a questionable procedure.
It was further recognized that because yields
could be drastically affected by unusual events at
critical times in wheat development (e.g., high temperatures
at heading), yield models to be developed
would most likely require a good estimation of the
true or actual development stage of the crop
throughout the year. The paper by Whitehead and
Phinney entitled “Growth Stage Estimation” provides
an overview of the LACIE growth stage
estimation technology, including a general description,
the rationale, design restrictions, and other
design characteristics.


Classification and Mensuration

One of the LACIE goals was to estimate wheat
acreage using Landsat as the primary data source and
without using ground enumerative data. A fundamental
approach resulting from the LACIE design
uses a machine classification technique to separate
the wheat area in each of a number of 5- by 6-nautical-mile
segments. It was apparent at the outset of
LACIE that a limited amount of manual interpretation
would be required for the method to work. The
paper by Heydorn et al. entitled “Classification and
Mensuration of LACIE Segments” provides the
details of that part of the LACIE design. A vastly improved
technology for the classification of complex
data structures inherent in multidate acquisition of
multispectral data has evolved from use of this approach.
The major result of this evolution is the
availability of a nearly optimum man/machine-processing
procedure.


Yield Estimation

The paper by Strommen et al. entitled “Development
of LACIE CCEA-I Weather/Wheat Yield
Models” provides details of the design used to identify
and evaluate yield estimation models oriented
toward supporting project objectives. Included is a
discussion of the rationale, the design restrictions,
and the chronology of the evolving yield model
development during Phases I, II, and III of LACIE.

Accuracy Assessment

An important function in LACIE is the evaluation
of results obtained at various stages of the experiment.
The objective of LACIE is not only to
demonstrate the technological feasibility for estimating
large-area wheat production using the LACIE approach
but also to produce estimates which satisfy
certain accuracy and reliability goals. The accuracy
assessment effort is designed to check the accuracy
of the products of the experimental operations
throughout the crop growing season and to determine
whether the procedures used are adequate to
accomplish the desired accuracy and reliability goals.
The paper by Houston et al. entitled “Accuracy
Assessment: The Statistical Approach to Performance
Evaluation in LACIE” describes the
methodology for assessing the accuracy for area,
yield, and production.
CROP INVENTORY: A STATISTICAL SURVEY


Classical  Characteristics 

Limited  resources,  a strong  demand  for  breadth 
and  timeliness  of  coverage,  and  recent  advances  in 
sample  survey  methodology  are  but  a few  reasons 
that  crop  inventory  efforts  have  become  heavily 
reliant  on  statistical  survey  methodology.  This  fact 
comes as no surprise since the task in the majority of
such efforts is to determine the total crop area from
information obtained over a subset of the area, a
special case of the classical definition of a sample
survey (ref. 1). In this case, two questions arise: (1)
how to select the “part” from the “whole” and (2)
how to generalize from the selected part to the
whole. The problem is one of finding that combination
of selection and estimation procedures which
minimizes the cost, ensuring at the same time a
specified  accuracy  for  the  inference  from  a part  to 
the  whole. 
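
As a minimal sketch of the second question (generalizing from the selected part to the whole), the following Python fragment expands a simple random sample of segment-level crop areas to a regional total and attaches a standard error. The simple random sampling design, the function name `expand_total`, and the sample figures are assumptions made for illustration only; they are not the LACIE estimators.

```python
import math
import statistics


def expand_total(sample_areas, population_size):
    """Estimate a population total from a simple random sample.

    Assumes sampling without replacement from `population_size` units;
    this is an illustrative design, not the LACIE procedure.
    """
    n = len(sample_areas)
    mean = statistics.mean(sample_areas)
    s2 = statistics.variance(sample_areas)   # sample variance (n - 1 divisor)
    fpc = 1.0 - n / population_size          # finite population correction
    total = population_size * mean           # expansion estimate of the total
    se = population_size * math.sqrt(fpc * s2 / n)
    return total, se


# Hypothetical wheat areas (hectares) in 5 sampled units out of 100
total, se = expand_total([1200.0, 800.0, 0.0, 1500.0, 500.0], 100)
print(f"estimated total = {total:.0f} ha, standard error = {se:.0f} ha")
```

The wide standard error in this toy case reflects the point made in the text: when the crop is unevenly distributed, the way the sample is drawn largely determines how trustworthy the expanded total is.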

Until  the  last  40  years,  little  attention  had  been 
given  to  the  problems  of  how  to  obtain  a good  sample 
and  how  to  draw  sound  conclusions  from  the  results. 
If  the  distribution  from  which  one  is  sampling  is 
uniform,  then  practically  any  sample  will  suffice; 
however,  in  the  case  of  a crop  inventory  where  the 
distribution  is  far  from  uniform,  the  method  by 
which  the  sample  is  obtained  is  critical,  and  the  study 
of  techniques  that  ensure  a trustworthy  sample 
becomes  extremely  important. 

In  some  cases,  it  may  seem  feasible  to  obtain  the 
information  desired  about  a particular  population  by 
taking a complete enumeration or census. Administrators
accustomed to dealing with censuses tended
to be suspicious of samples and reluctant to use them
in  place  of  censuses.  Although  this  attitude  no  longer 
persists,  it  would  be  worthwhile  to  list  the  principal 
advantages  of  sampling  as  compared  with  complete 
enumeration. 

1.  Reduced  cost— Securing  data  from  a small  frac- 
tion of  the  population  costs  less  than  making  a com- 
plete enumeration. 

2.  Greater  speed — Data  can  be  collected  and  sum- 
marized more  quickly  with  a sample  than  with  a cen- 
sus. 

3.  Greater  scope — Surveys  that  rely  on  sampling 
have  more  scope  and  flexibility  regarding  the  types 
of  information  that  can  be  obtained;  the  area  of 
coverage  can  be  more  extensive  than  in  a census. 

4. Greater accuracy — Because personnel of higher
quality can be employed and given intensive training
and because more careful supervision of the fieldwork
and processing of results becomes feasible
when the volume of work is reduced, a sample may
actually produce more accurate results than a complete
enumeration.

Upon examining the various steps required to perform
a sample survey (ref. 1), it becomes quite clear
that sampling is a practical business that calls for
several different types of skills. In making a crop inventory,
before sampling theory can be applied, it is
necessary to determine which crops are to be considered,
which geographical areas are to be surveyed,
how data measurements are to be made, and how
fieldwork is to be organized. Although these topics
are not discussed further in this paper, their importance
should be emphasized. Sampling demands attention
to all phases of the activity; poor work in
one phase may ruin a survey in which everything
else is done well.

The purpose of sampling theory is to make sampling
more efficient. It attempts to develop methods
of sample selection and of estimation that provide, at
minimum cost, estimates that are precise enough to
satisfy project objectives. In order to apply this principle,
one must be able to predict the precision and
the expected cost. So far as precision is concerned,
the magnitude of estimation error in any specific
situation cannot be foretold since this would require
a knowledge of the true value for the population. One
of the more standard ways of judging the precision,
however, is to examine the frequency distributions
of the estimates from a sampling procedure that has
been applied repeatedly to the same population. A
further simplification may be introduced in situations
where the sample sizes are large enough that
there is good reason to expect that sample estimates
are approximately normally distributed (e.g., for accuracy
assessment purposes, this is a key assumption,
and seemingly not a bad one, made in regard
to the distribution of the production estimator discussed
later in this paper). In this case, the frequency
distribution can be established precisely from the
mean $\mu$ and the standard deviation $\sigma$, both of which
can be estimated from sample survey theory. If a
sample is taken by a procedure known to give an unbiased
estimate $\hat{\mu}$ of $\mu$ with standard deviation $\sigma_{\hat{\mu}}$,
then, although the exact value of the error $\hat{\mu} - \mu$ is
unknown, from the properties of the normal distribution,
the chances (probabilities) are 0.32 (about
1 in 3) that the absolute error $|\hat{\mu} - \mu|$ exceeds $\sigma_{\hat{\mu}}$;
0.05 (1 in 20) that the absolute error $|\hat{\mu} - \mu|$ exceeds
$1.96\,\sigma_{\hat{\mu}}$; and 0.01 (1 in 100) that the absolute error
$|\hat{\mu} - \mu|$ exceeds $2.58\,\sigma_{\hat{\mu}}$.

The preceding discussion assumes that $\sigma_{\hat{\mu}}$, as computed
from the sample, is known exactly. Actually
$\sigma_{\hat{\mu}}$, like $\mu$, is estimated subject to a sampling error,
but for large sample sizes, the preceding results still
hold.
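
The three probabilities quoted for the normal case can be checked directly. The short Python sketch below evaluates the two-sided tail probability that the absolute error exceeds k standard deviations of the estimate; the function name `two_sided_tail` is an illustrative choice, and the calculation assumes an unbiased, exactly normal estimator.

```python
import math


def two_sided_tail(k):
    """P(|Z| > k) for a standard normal Z.

    Interpreted here as the chance that the absolute error of an unbiased,
    normally distributed estimate exceeds k standard deviations.
    """
    return math.erfc(k / math.sqrt(2.0))


for k in (1.0, 1.96, 2.58):
    print(f"k = {k:4.2f}: P(|error| > k * sigma) = {two_sided_tail(k):.3f}")
```

To two decimal places these evaluate to 0.32, 0.05, and 0.01, the figures used in the text.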

When estimates are biased, a useful criterion is the
mean square error (MSE) of the estimate. In particular,
the MSE of $\hat{\mu}$, denoted by MSE($\hat{\mu}$), is given by

$$\mathrm{MSE}(\hat{\mu}) = (\text{variance of } \hat{\mu}) + (\text{bias})^2 \qquad (1)$$

In statistical terminology, the term “precision”
refers to the repeatability of a measurement. “Low
precision” means that there is wide variation in repeated
measurements of the same object, whereas
“high precision” means that there is little variation
between repeated measurements. “Accuracy,”
however, refers to the MSE; the smaller the value of
the MSE, the more accurate the estimator. Thus, low
accuracy could result from a large-bias term with
either low or high precision, or from a small-bias
term coupled with low precision, as illustrated in
figure 1. High accuracy results when the bias term is
small or zero and precision is high.
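
The distinction between precision and accuracy can be made concrete with a small numerical sketch. Assuming a set of repeated estimates of a known true value (the data and the function name `mse_decomposition` are hypothetical), the Python fragment below computes the MSE directly and confirms that it equals the variance of the estimates plus the squared bias.

```python
import statistics


def mse_decomposition(estimates, true_value):
    """Return (mse, variance, bias) for repeated estimates of a known quantity."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - true_value
    # Population variance: spread of the estimates about their own mean
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return mse, variance, bias


# Hypothetical repeated estimates of a true value of 100
precise_but_biased = [109.0, 110.0, 111.0, 110.0]   # high precision, large bias
unbiased_but_noisy = [85.0, 115.0, 90.0, 110.0]     # low precision, zero bias

for label, data in (("precise but biased", precise_but_biased),
                    ("unbiased but noisy", unbiased_but_noisy)):
    mse, variance, bias = mse_decomposition(data, 100.0)
    print(f"{label}: MSE = {mse:.1f}, variance = {variance:.1f}, bias = {bias:.1f}")
```

Both hypothetical estimators have low accuracy in the sense used here, one because of its bias and the other because of its spread.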

In the case of LACIE, where the level of accuracy
is  stipulated,  the  best  sample  design  is  that  for  which 
the  cost  of  the  survey  is  minimum.  However,  if  the 
cost  of  the  survey  is  specified,  the  best  sample  design 
is  that  which  gives  the  highest  accuracy.  This  rule  is 
the  guiding  principle  of  classical  sample  survey 
methodology. 


Overall Philosophy/Aspects
Unique to LACIE

The  first  systematic  attempt  to  collect  agricultural 
statistics  dates  back  more  than  a century  to  the  Cen- 
sus of  1840  (ref.  2).  From  that  date  forward,  an  in- 
creasing volume  of  agricultural  statistics  has  been 
collected  periodically  in  the  U.S.  Census  enumera- 
tions every  10  years  to  1920  and  every  5 years 
thereafter.  A rudimentary  system  of  annual 
agricultural  estimation  was  also  begun  about  1840  in 
the  Patent  Office.  Upon  Commissioner  Ellsworth's 
resignation  in  1845,  however,  interest  in  agricultural 
statistics  subsided  in  the  Patent  Office,  and  it  was  not 
until  after  the  U.S.  Department  of  Agriculture 
(USDA)  was  organized  in  1862  that  annual  intercen- 
sus estimates  were  again  revived  (ref.  3). 

Current  monthly  reports  on  crop  conditions  also 
predated  the  establishment  of  the  Department  of 
Agriculture  by  a few  months.  Orange  Judd,  editor  of 
the  American  Agriculturalist,  published  summaries 
of  crop  condition  reports  submitted  voluntarily  by 
subscribers  to  his  paper  for  5 months,  May  through 
September 1862 (ref. 3). Judd's efforts were the
forerunner to the USDA program of monthly reports
on crop prospects. These reports have been issued
regularly during the growing season since the first
publication in July 1863.

FIGURE 1. Graphical representation of precision and accuracy.
Circles denote estimates from repeated trials, with the
center of the target representing the location of the true quantity
being estimated.

Since  1863,  crop  surveys  in  the  Department  of 
Agriculture  have  expanded  greatly  until  today  a large 
volume  of  agricultural  estimates  is  published  on  a 
periodic basis. This expansion in the
volume of agricultural data has not been paralleled
by major improvements in estimation methods,
which is somewhat distressing in view of the significant
developments in the theory of sample design,
particularly in the past 40 years. Until recent efforts
of  the  Statistical  Reporting  Service  (SRS)  and 
LACIE,  the  typical  procedure  has  used  mailed 
inquiries  for  collecting  basic  data  and  an  assortment 
of  techniques  for  removing  bias  in  the  transforma- 
tion of  raw  data  into  published  estimates. 

A complete  evaluation  of  agricultural  statistics 
must  embrace  such  characteristics  as  breadth  of 
coverage,  geographical  detail  of  estimates,  timeliness, 
and  frequency  of  releases.  However,  the  criterion  of 
reliability  should  probably  have  top  priority  over  all 
other  criteria  of  evaluation.  The  choice  of  methods  of 
estimation  when  the  estimating  agency  is  faced  with 
limited  resources  and  a strong  demand  for  breadth 
and  timeliness  of  coverage  may  well  dictate  some 
sacrifice  of  precision  in  the  estimates. 

At  the  outset  of  LACIE,  remote-sensing  tech- 
nology without  supporting  ground  data  appeared  to 
offer  a cost-effective  approach  to  a global  crop 
estimation system that could provide improved information
to USDA and NASA. More specifically,
LACIE was the first attempt to survey an important
crop (wheat) on a large (quasi-global) scale at repeated
intervals over a wide range of conditions.

Given the project objectives, with specific
emphasis on large-area estimation and the relative
importance of more timely and more accurate estimates
of foreign production, the LACIE design was
restricted to readily available foreign data (e.g., Landsat
imagery, meteorological data, National Oceanic
and Atmospheric Administration (NOAA) satellite
data, and published historical data for political subdivisions
within countries). In addition, the LACIE
system was designed within a framework of constraints
originating from several sources. Examples
of these constraints are the acquisition frequency
restrictions specified by the NASA Goddard Space
Flight Center (GSFC) and the requirement that implemented
classification technology be used. Additional
considerations were costs, schedule
milestones, available resources for system implementation,
the specified performance criterion,¹ and
the volume of Landsat data that could be stored and
processed.

Some of the questions to be answered at the outset
of  LACIE  were 

1.  Can  a sampling  strategy  for  acquisition  of 
Landsat  data  be  designed  to  achieve  the  required  ac- 
curacy with  a manageable  data  load? 

2.  How  can  the  geographic  wheat  distribution 
characteristics  (e.g.,  within-strata  variances)  best  be 
determined  so  as  to  efficiently  sample? 

3.  What  is  a good  configuration  for  the  primary 
sampling  unit  and  what  should  the  sampling  frame 
be? 

4.  Does  loss  of  the  sampling  unit  due  to  cloud 
cover  cause  excessive  errors,  such  as  bias? 

To  develop  and  evaluate  the  LACIE  survey 
system,  the  experiment  was  planned  to  consider  first 
the  wheat-growing  regions  of  the  United  States, 
where  reliable,  independent  survey  estimates  and 
ground  data  would  be  available.  In  Phase  I,  major 
emphasis  was  devoted  to  identifying  significant 
problems  and  incorporating  necessary  changes  into 
the  on-line  system. 

To  simplify  the  explanation  of  the  LACIE  sam- 
pling and  aggregation  approach,  it  will  be  worthwhile 
at  this  point  to  define  the  hierarchical  structure  of 
the  units  into  which  each  country  is  subdivided. 
Each  country  is  considered  to  be  subdivided,  first  of 
all, into regions. Regions are the most “coarse” political
subdivisions of a country (e.g., the U.S. Great
Plains (USGP) is a U.S. “region”). Regions are
further  subdivided  into  zones.  Zones  are  states,  for 
example,  in  the  United  States  and  subcollections  of 
oblasts  in  the  U.S.S.R.;  in  any  case,  they  are  elements 
of  the  regional-level  subdivisions.  Zones  are  further 
subdivided  into  strata.  In  the  United  States,  crop- 
reporting districts  (CRD's)  are  the  strata;  they  are 
subdivisions  of  the  states.  In  the  U.S.S.R.,  the  strata 
are  the  oblasts.  Finally,  in  countries  with  detailed 
historical  data  (i.e.,  in  countries  with  historical  data 
available  at  a level  below  the  strata),  the  strata  are 
further  subdivided  into  units  referred  to  as  substrata. 
No further subdivision is made below the substrata
level. Examples of substrata are counties in the
United States, shires in Australia, and municipalities
in Canada. Figure 2 depicts the overall hierarchical
structure.

¹The LACIE performance goal for accuracy is to obtain country
at-harvest production estimates which are within 10 percent of
the actual production 90 percent of the time. This is referred to as
the 90/90 criterion and is used to determine the LACIE error
budget. The major requirement for the system was an ability to
produce current estimates of wheat area, yield, and production for
selected major wheat-producing regions on a scheduled basis
throughout the crop season.
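
The accuracy goal stated in the footnote (estimates within 10 percent of actual production 90 percent of the time) can be translated into a bound on the estimator's relative standard deviation. The sketch below does so under two assumptions that are made here for illustration and are not part of the stated criterion: the relative error of the production estimate is normally distributed, and its bias is either zero or a given fraction of true production. The function names are hypothetical.

```python
from statistics import NormalDist


def max_cv_for_goal(tolerance=0.10, confidence=0.90):
    """Largest coefficient of variation meeting the goal for an unbiased,
    normally distributed estimator (assumed model)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # two-sided critical value
    return tolerance / z


def prob_within(tolerance, cv, relative_bias=0.0):
    """P(|relative error| <= tolerance) when the relative error is
    Normal(relative_bias, cv**2)."""
    dist = NormalDist(mu=relative_bias, sigma=cv)
    return dist.cdf(tolerance) - dist.cdf(-tolerance)


cv_limit = max_cv_for_goal()
print(f"maximum CV for an unbiased estimator: {cv_limit:.4f}")
print(f"P(within 10%) at that CV, no bias:    {prob_within(0.10, cv_limit):.3f}")
print(f"P(within 10%) at that CV, 5% bias:    {prob_within(0.10, cv_limit, 0.05):.3f}")
```

Under these assumptions an unbiased estimator needs a coefficient of variation of roughly 6 percent or less, and any bias tightens that allowance, which is why the criterion is useful for setting an error budget.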

Based  on  known  constraints,  a total  allocation  of 
4800  sample  segments  was  divided  among  the 
selected  countries  using  the  criteria  and  procedures 
discussed  in  the  paper  by  Feiveson  entitled  "LACIE 
Sample  Design." 

Although  certain  engineering  constraints  affected 
the implementation of the initial LACIE sampling
strategy,  the  evidence  examined  in  Phase  I indicated 
that  these  factors  did  not  significantly  affect  ac- 
curacy. The  majority  of  these  constraints  were 
removed  from  the  Phase  III  design  and  are  no  longer 
inherent  in  the  system.  Various  sampling  and  ag- 
gregation problems  encountered  throughout  Phases 
I,  II,  and  III  of  LACIE  are  discussed  later  in  this 
paper. 

In summary, LACIE's task has been to determine
the total wheat production in an area by looking at
only a subset of that area; consequently, LACIE
technology draws heavily from statistical survey
methodology which, in turn, is supported by a broad
base of remote-sensing technology. The remainder of
this paper is an overview of the initial LACIE sampling
technology used in a quasi-operational mode
throughout Phases I, II, and III. A general description
will be given of the sampling scheme and the aggregation
procedures for estimating wheat area,
yield, and production, along with a brief discussion
of their associated prediction error estimates.
Specific details are restricted to the supporting papers
in this session.


SAMPLING,  ESTIMATION,  AND 
AGGREGATION  FOR  AREA 

The LACIE sampling technology is designed to
cost-effectively estimate wheat area and production
in countries of interest with a predesignated precision
level. The level of precision is dependent on
many factors including (1) the configuration and
geographical extent of the basic sampling unit (sample
segment), (2) the sample selection procedure, (3)
the number and distribution of the sample segments
(i.e., the allocation procedure), and (4) the aggregation
procedures for estimating wheat area, yield, and
production. The following sections briefly summarize
the major aspects of the LACIE sampling technology.

Configuration  and  Geographical  Extent 
of  the  Sampling  Unit 

The  sampling  unit  used  in  LACIE  is  a 5-  by  6- 
nautical-mile  rectangle.  A key  factor  leading  to  this 
choice  originated  from  GSFC  engineering  con- 
straints. In  particular,  initial  considerations  were 
limited  to  areas  no  larger  than  25  miles  on  a side 
since  GSFC  could  not  register  areas  larger  than  this 
to  within  ± 1-pixel  accuracy.  Furthermore,  GSFC's 
ability  to  handle  a maximum  of  4800  segments 
placed  initial  restrictions  on  the  minimum  segment 
size  that  could  be  tolerated.  The  two  most  important 
considerations, however, leading to this choice are attributable
to the analysts' needs and to the required
sampling precision. Initial indications were that an
area of 30 square nautical miles was sufficiently large
to provide the analyst with a good perspective on the
variety and distribution of crops within a given
locality. A smaller segment size tended to make the
classification task more difficult to perform without
evidencing any significant benefits. Retrospectively,
the 5- by 6-nautical-mile segment has proved to be
satisfactory in terms of both serving the analysts'
needs and permitting required sampling precision
without creating an unmanageable data load. Finally,
the rectangular configuration was especially amenable
to computer storage and manipulation.

Sample  Selection  Procedure 

The LACIE sampling strategy depends on
whether a country has detailed historical data (e.g.,
United States, Canada, and Australia) or whether it
has data at only one level smaller than the country itself
(e.g., U.S.S.R., China, Argentina, Brazil, and India).
In the latter case, a standard stratified sampling
scheme is employed, whereas in the former case,
the sampling strategy consists of a two-stage
stratified random sample in which “substrata”
(smallest political area for which acreage, yield, and
crop calendars are published) are the primary sampling
units. The 5- by 6-nautical-mile segments are
the secondary units.
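
The two-stage idea can be sketched in a few lines of Python: substrata are drawn first as primary units, and segments are then drawn within each selected substratum. The equal-probability selection at both stages, the function name `two_stage_sample`, and the toy frame are assumptions made for illustration; this passage does not state the selection probabilities actually used in LACIE.

```python
import random


def two_stage_sample(frame, n_primary, n_secondary, seed=0):
    """Draw a two-stage sample within one stratum.

    `frame` maps each substratum name to the list of segment identifiers
    in its agricultural area. Selection is equal-probability without
    replacement at both stages (an assumption of this sketch).
    """
    rng = random.Random(seed)
    primaries = rng.sample(sorted(frame), min(n_primary, len(frame)))
    selection = {}
    for substratum in primaries:
        segments = frame[substratum]
        selection[substratum] = rng.sample(segments, min(n_secondary, len(segments)))
    return selection


# Hypothetical stratum with four substrata (e.g., counties) of differing size
frame = {f"county_{c}": [f"{c}{i:02d}" for i in range(size)]
         for c, size in zip("ABCD", (12, 8, 20, 5))}
sample = two_stage_sample(frame, n_primary=2, n_secondary=3)
for substratum, segments in sample.items():
    print(substratum, segments)
```

In a country with data at only one level below the national level, the first stage disappears and segments are drawn directly within each stratum.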

The sampling frame consists of the agricultural
area within the major wheat-producing regions of a
country. It is a collection of 5- by 6-nautical-mile segments
in agricultural areas as determined by an
“ag/non-ag” delineation created from Landsat imagery
and/or USDA Foreign Agricultural Service
(FAS) land use maps. Each stratum/substratum,
then, is a collection of segments. Figures 3 and 4 illustrate
typical strata in countries with and without
detailed historical data, respectively. In Phase I, the
Landsat imagery over some areas to be sampled was
of insufficient quality to support the sampling frame
generation; over such areas, use was made of existing
maps (e.g., land use, topographical) to help determine
the ag/non-ag areas. The tendency in areas not
having quality Landsat coverage was to be conservative
and to retain areas that were questionable as
agricultural. (In some cases, only cities and mountainous
areas were excluded.) Although this approach
increases the chances of including all the
wheat in the sampling frame, it can result in a higher
percentage of segments with little or no wheat as well
as a higher wheat area variance. This situation improved
in Phase III with the availability of improved
Landsat data coverage over such areas.

FIGURE 4. Stratum sampling frame.


Allocation of Segments

Background for estimation of total country sample size. At the outset
of LACIE, a determination was made that more than 600 sample segments
would be required in the United States to achieve an expected
country-level sampling error of approximately 2.5 percent for wheat.
On the basis of this determination, a proportional (to the wheat
acreage from an epoch year) allocation was performed to seven other
major wheat-producing countries (U.S.S.R., Brazil, India, Canada,
Australia, China, and Argentina) to determine a worldwide (i.e., eight
country) allocation. The resulting segment total was slightly less
than the determined system capacity (because of hardware, data
manipulation capabilities, etc.) of approximately 4800 segments.
Consequently, early in the project, a decision was made to perform a
worldwide allocation (to the previously mentioned eight countries) of
4800 segments, divided among the countries in proportion to their
wheat acreages in an epoch year. Specifically, using the 1972 wheat
area for the eight countries (obtained from FAS agricultural attaches
and others), the country-level allocations, based on wheat area in
thousands of hectares, were as follows.

Country           Segments      Wheat area, thousands of hectares

United States          637      19 138
U.S.S.R.              1949      58 492
Brazil                  47       1 500
India          [illegible]      19 139
Canada                 283       8 640
Australia              257       7 776
China                  810      24 400
Argentina              165       4 965
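
The proportional allocation described above can be sketched as
follows. This is an illustrative reconstruction, not the LACIE
software: the function name `proportional_allocation` and the
round-to-nearest rule are assumptions, and only the 1972 wheat areas
and the 4800-segment total are taken from the text. Plain
proportionality gives counts close to the tabulated ones but not
identical for every country, so the published allocation evidently
involved details not stated here.

```python
# Wheat area in thousands of hectares (1972), as listed in the table above.
WHEAT_AREA_1972 = {
    "United States": 19138,
    "U.S.S.R.": 58492,
    "Brazil": 1500,
    "India": 19139,
    "Canada": 8640,
    "Australia": 7776,
    "China": 24400,
    "Argentina": 4965,
}


def proportional_allocation(areas, total_segments):
    """Split total_segments among countries in proportion to wheat area.

    Rounding to the nearest integer is an assumption; the source does not
    say how fractional segments were handled.
    """
    total_area = sum(areas.values())
    return {
        country: round(total_segments * area / total_area)
        for country, area in areas.items()
    }


allocation = proportional_allocation(WHEAT_AREA_1972, 4800)
for country, n_segments in allocation.items():
    print(f"{country:15s} {n_segments:5d}")
```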

Alterations to this allocation were made beginning in Phase II which
resulted in different country totals for the United States in Phases
II and III and for the United States and the U.S.S.R. in the
Transition Year. In particular, during Phase II, a significant
underestimate of the wheat area was observed in North Dakota. Further
analysis indicated that the major problem was with the sample
placement rather than with the classification. Indicated solutions
were the allocation of additional segments or improved stratification
to reduce agricultural area variability, or both. Consequently, 20
additional segments were allocated to North Dakota in Phase II,
resulting in a significantly improved wheat area estimate for that
zone. These results were the primary driver for the decision to make a
revised allocation over the yardstick region (the USGP in this case)
for Phase III. Additional motivation for performing a revised
allocation included the following:

1. To reduce the sampling error to approximately 2.2 percent (i.e., to
a point of relative insignificance compared to the classification
error, then shift emphasis to improving the classification procedure)

2.  To  make  use  of  the  improved  Landsat  imagery 
in  an  updated  sampling  frame 

3.  To  employ  a set  of  segments  allocated  such  that 
the  LACIE  production  estimate  could  be  expected  to 
satisfy  the  90/90  criterion  after  allowing  for  errors 
due  to  sampling,  classification,  yield  prediction,  and 
loss  of  data 

The  revised  allocation  resulted  in  an  increase  in  the 
total  number  of  sample  segments  in  the  USGP  from 
431  in  Phase  II  to  601  in  Phase  III. 

From Phase III to the Transition Year, the allocations changed for the
U.S.S.R. and the United States.

The U.S.S.R. difference resulted from the verification in Phase III of
what had been suspected for some time: the U.S.S.R. had been
oversampled in the initial allocation. Consequently, a revised
allocation was made, oriented toward achieving the 90/90 criterion for
the LACIE production estimate, allowing for the same errors as in the
yardstick region. The result was a reduction from 1949 to 1111
segments in the U.S.S.R.

The alteration in the USGP allocation from Phase III to the Transition
Year was made for purposes of testing a "natural" sampling strategy
(the details of this strategy are included in Feiveson's paper). More
specifically, it is well known that the level of accuracy of an
estimator such as that utilized in LACIE depends on the sample size
and is adversely affected by the variability or heterogeneity of the
characteristic(s) being measured (in this case, the heterogeneity of
wheat density and yield). Apart from increasing the sample size,
another means of effectively reducing this heterogeneity is by
improving the stratification. During Phase II of LACIE, a methodology
was developed to use Landsat imagery and agrophysical data to improve
stratification in foreign areas. This method ignored political
boundaries and restratified along boundaries of areas that are more
homogeneous in agricultural density, soil characteristics, and average
climatic conditions. The use of this natural sampling strategy
domestically in the Transition Year was intended to provide better
applicability of the yardstick region as a quantifier of foreign
results. The allocation to support this strategy resulted in a change
in the USGP from 601 segments in Phase III to 487 segments in the
Transition Year.

Within-country sample allocation. Sample segments were allocated to
the strata/substrata within a country based on weights which, in Phase
I, were a function of (1) the agricultural area in the
stratum/substratum and (2) the within-stratum/substratum standard
deviation of wheat area from segment to segment. Estimates of the
former were obtained from the sampling frame generated from USDA land
use maps and from Landsat imagery over those areas with sufficient
quality coverage. Estimates of the latter were determined by assuming
that the per-segment numbers of pixels classified as wheat followed
the binomial distribution. Under this assumption, the
within-stratum/substratum standard deviation of wheat area from
segment to segment is a function of the proportion of wheat in the
stratum/substratum, which was obtained from historical data for an
epoch year.
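
A minimal sketch of how such Phase I weights might be formed is given
below. It assumes a Neyman-style weight (agricultural area times
within-stratum standard deviation); the text says only that the
weights were a function of these two quantities, so the product form,
the function names, the pixel count, and the example strata are all
hypothetical.

```python
import math


def binomial_sd(p_wheat, pixels_per_segment):
    """Std. dev. of the per-segment wheat proportion under a binomial model."""
    return math.sqrt(p_wheat * (1.0 - p_wheat) / pixels_per_segment)


def allocation_weights(strata, pixels_per_segment):
    """Normalized weights: agricultural area times within-stratum std. dev.

    The product form is an assumption made for illustration only.
    """
    raw = {
        name: ag_area * binomial_sd(p_wheat, pixels_per_segment)
        for name, (ag_area, p_wheat) in strata.items()
    }
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


# Hypothetical strata: (agricultural area, epoch-year wheat proportion).
strata = {"A": (1000.0, 0.30), "B": (1000.0, 0.05), "C": (500.0, 0.30)}
weights = allocation_weights(strata, pixels_per_segment=20000)
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

Under this assumed form, a stratum with a wheat proportion near 0.5
receives more weight than one of equal area with very little wheat,
because its binomial standard deviation is larger.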

In the revised allocations made in the USGP and the U.S.S.R. in Phase
III and the Transition Year, the allocation weights were a function of
(1) the agricultural area in the substratum/stratum, (2) the
within-substratum/stratum standard deviation of small-grains (wheat)
area from segment to segment in the USGP and in the U.S.S.R., (3) the
classification error variance, (4) the substratum/stratum yield
estimate, and (5) the substratum/stratum yield prediction error.

The agricultural area in the substratum/stratum was obtained by the
same procedure used in Phase I; however, the availability of higher
quality Landsat imagery over more extensive areas permitted the use of
a more refined sampling frame for Phase III.

Direct estimation of the within-substratum standard deviation of wheat
area from segment to segment in the U.S. yardstick region (and other
substratum-level countries) was not possible since most substrata had
insufficient segments. Although the approach taken in Phase I of
resorting to a binomial distribution assumption and using epoch-year
historical wheat data to estimate these variances seemed to work
reasonably well, there were indications from the Phase II North Dakota
study that an improvement was attainable by taking a slightly
different approach. As a part of the North Dakota study, approximately
40 counties were selected throughout the USGP in an attempt to obtain
an improved estimator of the within-substratum standard deviation of
small-grains area. The proposed approach consisted of modeling the
relation between the segment-to-segment standard deviation of
small-grains area at the substratum level and the segment-to-segment
standard deviation of agricultural area. The rationale for
concentrating on small grains as opposed to wheat included the
following.

1.  At  the  time,  it  was  impossible  to  distinguish 
wheat  from  other  small  grains  in  Landsat  imagery. 

2.  Because  of  the  predominance  of  wheat  in  the 
areas  to  be  sampled,  the  belief  was  that,  for  allocation 
purposes,  replacing  unobservable  wheat  information 
with  small-grains  data  would  be  a reasonable 
substitution.  (Areas  outside  the  range  of  variability 
over  which  the  model  was  developed  offered  the 
greatest  potential  for  degradation.) 

3.  The  procedure  was  repeatable  in  other 
substratum-level  countries. 

Estimates of the other quantities that were input to the allocation
procedure are detailed in Feiveson's paper. Depending on the resulting
allocation weights, substrata in countries with substratum-level
historical data (United States, Canada, Australia) were designated as
Group I (high sampling rate), Group II (low sampling rate), and Group
III (not sampled). Stratum-level countries (i.e., those having
historical data at only one level smaller than the country itself:
U.S.S.R., China, Argentina, India, Brazil) had their strata assigned
to Group I or Group III only. In the United States, where substrata
are counties, the range of variation of the allocation weights was
such that to meet the 90/90 criterion in the USGP, it was necessary to
allocate anywhere from 0 to 5 segments to a county. In stratum-level
countries, several times that number were assigned to a stratum. (For
specific details of the allocation procedure, see the paper by
Feiveson.)

Area  Estimation 

Until the Transition Year, the methodology for the direct estimation
of wheat area at the segment level did not exist. Instead, a
segment-level winter, spring, or total small-grains estimate was made
for each aggregable segment.² The area of wheat in each such segment
was then estimated by using a "confusion crop" ratio. In particular,
in Phases I and II (and Phase III in the U.S.S.R.), the
substratum/stratum-level historical ratio of wheat to small grains,
estimated from epoch-year data, was applied to the raw small-grains
estimate from the Classification and Mensuration Subsystem (CAMS).

² Certain criteria had to be met before a given segment was approved
for aggregation purposes. These criteria are detailed in the paper by
Heydorn et al. entitled "Classification and Mensuration of LACIE
Segments."

During Phase II, development was initiated on an econometric confusion
crop ratio model that was later evaluated and implemented in the USGP
early in Phase III. This model provided confusion crop ratio estimates
at the CRD level and made use of a considerable amount of
near-real-time economic information. The development and evaluation of
this model are detailed further in the paper by Umberger et al.
entitled "Econometric Models for Predicting Confusion Crop Ratios."

Given the set $\{\hat{p}_i\}_{i=1}^{n}$ of aggregate segment-level
wheat proportion estimates covering the area of interest, the LACIE
wheat area estimate $\hat{A}$ of the given area of interest is
expressible in the most simplified form as

$$\hat{A} = \sum_{i=1}^{n} w_i \hat{p}_i \qquad (2)$$

where $\hat{p}_i$ = the estimate of the proportion of wheat in the ith
aggregate sample segment and $w_i$ = the aggregation weight associated
with the ith aggregate sample segment. In general, $w_i$ is a function
of the epoch-year historical data, the agricultural area in the
stratum containing the segment, and whether or not the segment is used
in part of a Group III ratio (see "LACIE Sample Design"). With the
exception of the Group III ratio case, the form of the estimator in
equation (2) is the same as the standard stratified sampling estimator
(ref. 1).

Stratum/substratum-level estimates. Stratified area estimation is
performed on a base of stratum or substratum elements (depending on
whether the country is a stratum- or a substratum-level country). In
the LACIE framework, substrata are designated as Group I (high
sampling rate), Group II (low sampling rate), and Group III (not
sampled). The designations are made on the basis of a threshold value
(detailed in the "LACIE Sample Design" paper). However, the
qualitative definitions are as follows:

1. Group I substrata: Intensive wheat-producing areas which are
allocated one or more sample segments.

2. Group II substrata: Areas which produce some wheat; one sample
segment is allocated with probabilities proportional to size (PPS) (as
determined from the wheat area in an epoch year); thus, some, but not
all, Group II substrata receive a segment.

3. Group III substrata: Areas historically having very little wheat;
thus, no sample segments are allocated.

The Group I substratum wheat area estimator $\hat{A}_{I}$ is simply
the standard stratified sampling estimator

$$\hat{A}_{I} = \sum_{i} a_i \bar{p}_i \qquad (3)$$

where $\bar{p}_i$ = the estimate of the mean proportion of wheat in
the ith Group I substratum agricultural area and $a_i$ = the ith Group
I substratum agricultural area.

The Group II stratum wheat area estimator $\hat{A}_{II}$ is the
standard PPS-type estimator (ref. 1), which has the general form

$$\hat{A}_{II} = \sum_{i} \frac{\hat{A}_i}{\Pi_i} \qquad (4)$$

where $\hat{A}_i$ = the estimate of the wheat area in the ith sampled
substratum and $\Pi_i$ = the probability with which the ith Group II
substratum was selected to receive a segment.

The Group III estimator $\hat{A}_{III}$ of the collection of Group III
substrata within a given stratum is a ratio estimator having the form

$$\hat{A}_{III} = \frac{\hat{A}_{I} + \hat{A}_{II}}{H_{I} + H_{II}} \, H_{III} \qquad (5)$$

where $H_{I}$, $H_{II}$, and $H_{III}$ denote the historical wheat
acreages, from an epoch year, over the associated areas that
$\hat{A}_{I}$, $\hat{A}_{II}$, and $\hat{A}_{III}$ estimate at some
level. The level (i.e., either stratum or zone) at which this ratio is
applied is dependent on the availability of aggregable segments in the
stratum. Specifically, if there are fewer than a certain threshold of
aggregable segments at the stratum level (to date, a threshold of
three has been employed; see the paper by Chhikara and Feiveson
entitled "LACIE Large-Area Acreage Estimation"), this ratio is applied
at the zone level; otherwise, it is applied at the stratum level. In
the former case, this simply means $\hat{A}_{I}$ and $\hat{A}_{II}$
are the total estimates of wheat area for all Group I and Group II
substrata, respectively, in the parent zone. (Of course, $H_{I}$ and
$H_{II}$ are the associated historical wheat acreages over the
corresponding areas.) Otherwise, $\hat{A}_{I}$ and $\hat{A}_{II}$ are
the total estimates of wheat area for all Group I and Group II
substrata, respectively, in the parent stratum. (Again, $H_{I}$ and
$H_{II}$ are the associated historical acreages.) In either case,
$\hat{A}_{III}$ is the total wheat area estimator of all Group III
substrata (having historical wheat area $H_{III}$ in the epoch year)
in the parent stratum.
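
The three group estimators can be sketched in a few lines. The sketch
assumes the forms suggested by the surrounding definitions: a sum of
agricultural area times mean wheat proportion for Group I, a sum of
sampled area estimates divided by their selection probabilities for
Group II, and the current-to-historical ratio of Groups I and II
applied to the Group III historical acreage. Function names and the
example numbers are hypothetical.

```python
def group1_estimate(substrata):
    """Sum of (agricultural area * mean wheat proportion) over Group I substrata."""
    return sum(ag_area * mean_prop for ag_area, mean_prop in substrata)


def group2_estimate(sampled):
    """PPS-type estimate: each sampled area estimate over its selection probability."""
    return sum(area_est / prob for area_est, prob in sampled)


def group3_estimate(a1, a2, h1, h2, h3):
    """Ratio estimate: Group III historical acreage scaled by (A_I + A_II) / (H_I + H_II)."""
    return (a1 + a2) / (h1 + h2) * h3


def ratio_level(n_aggregable_in_stratum, threshold=3):
    """Apply the ratio at zone level when the stratum has too few aggregable segments."""
    return "zone" if n_aggregable_in_stratum < threshold else "stratum"


# Hypothetical stratum: two Group I substrata and one sampled Group II substratum.
a1 = group1_estimate([(1000.0, 0.30), (500.0, 0.20)])
a2 = group2_estimate([(50.0, 0.5)])
a3 = group3_estimate(a1, a2, h1=450.0, h2=90.0, h3=27.0)
print(a1, a2, a3, ratio_level(2))
```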

In stratum-level countries, recall there are Group I and Group III
strata only. In this case, the Group I and Group III estimators are of
the same form as explained previously; however, the stratified area
estimation is performed on the base of stratum elements. Moreover, the
Group III ratio estimator is always applied at the zone level with one
exception; the situation has occurred (in Phases II and III of LACIE
in the U.S.S.R.) wherein one or more zones had little to no acquired
segments at the time of aggregation. In this situation, such zones are
ratio estimated using a ratio estimator not unlike that discussed
previously, whereby stratum wheat area estimates from surrounding
zones (having sufficient segment coverage) were employed as the basis
for the ratio estimation.

Higher level estimates. Wheat area estimates above the
stratum/substratum base level, such as at the zone, region, or country
level, are obtained simply by adding the estimates for the strata
included in the area of interest (i.e., zone, region, country, etc.).
In mixed wheat areas, the estimation/aggregation procedure is
performed separately for each crop type (winter and spring wheat). To
obtain total wheat, the separate estimates for each crop type are
aggregated.


YIELD  AND  PRODUCTION  ESTIMATION 

Yield model development and evaluation in LACIE has been primarily the
responsibility of the Center for Climatic and Environmental Assessment
(CCEA) (a branch of the National Oceanic and Atmospheric
Administration (NOAA)). The yield models used in LACIE have been of an
"agromet" type (i.e., have utilized agronomic and meteorological data
as inputs to their development and use) and have been polynomial
functions of such weather variables as monthly precipitation and
potential evapotranspiration. Weather variables are entered as
departures from long-term normals. In general, the assumed model form
is

$$Y = X\beta + \epsilon \qquad (6)$$

where $Y$ is the vector of historical yields over the stratum of
interest, $X$ is the matrix of weather data, $\beta$ is the weather
coefficient vector to be estimated, and $\epsilon$ is a vector of
random errors. Consequently, the weather coefficients were estimated,
by $\hat{\beta}$, in the standard least squares manner, i.e.,

$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y \qquad (7)$$

Yield prediction for the ith yield stratum is then computed using

$$\hat{Y}_i = \hat{\beta}^{T}X_i \qquad (8)$$

where $X_i$ is the current-year weather observation vector for the ith
stratum. Coefficients were estimated separately for each yield stratum
because agrophysical differences between regions are such that it is
unlikely the same yield model would hold.
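
The least squares fit and prediction step can be illustrated with a
short NumPy sketch. Everything below is synthetic: the choice of
regressors (an intercept, precipitation and potential
evapotranspiration departures, and one squared term), the coefficient
values, and the function names are assumptions made for illustration
and are not the CCEA models.

```python
import numpy as np


def fit_yield_model(X, y):
    """Least squares estimate of the coefficients in y = X beta + error."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat


def predict_yield(beta_hat, x_current):
    """Predicted yield for one current-year weather observation vector."""
    return float(x_current @ beta_hat)


# Synthetic history: weather variables entered as departures from normal.
rng = np.random.default_rng(0)
n_years = 30
precip_dep = rng.normal(size=n_years)
pet_dep = rng.normal(size=n_years)
X = np.column_stack([np.ones(n_years), precip_dep, pet_dep, precip_dep**2])
true_beta = np.array([25.0, 2.0, -1.5, -0.5])  # hypothetical coefficients
y = X @ true_beta + rng.normal(scale=0.1, size=n_years)

beta_hat = fit_yield_model(X, y)
x_current = np.array([1.0, 0.5, -0.2, 0.25])  # current-year departures
y_pred = predict_yield(beta_hat, x_current)
print(beta_hat, y_pred)
```

`np.linalg.lstsq` is used instead of forming the inverse of the
cross-product matrix explicitly; for a well-conditioned design matrix
the two give the same coefficients.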

Some of the assumptions/shortcomings of this model are

1. The assumption of the adequacy of the model form to predict yield.

2. The historical yields utilized in the model development were those
reported by the SRS domestically and the FAS in foreign areas;
however, the weather data came from the various meteorological
stations and, hence, were not sampled in the same manner as the yield
data.

3. Initially in LACIE, some yield strata utilized common weather data
which induced unaccounted-for correlations. Prior to the beginning of
Phase III, an adjustment was made to the yield strata in order to
correct for this situation. The resulting yield strata were no longer
necessarily state-level strata; figures 5 and 6 illustrate the new
yield strata in the USGP and the U.S.S.R.

An "average" yield estimate is obtained for a yield stratum by

$$\bar{Y}_s = \hat{P}_s / \hat{A}_s \qquad (9)$$

where $\hat{A}_s$ and $\hat{P}_s$ are the area and production,
respectively, for the stratum. The specifics of the yield modeling and
evaluation effort are documented in the paper by Strommen et al.
entitled "Development of LACIE CCEA-I Weather/Wheat Yield Models" and
will not be detailed further herein; the remainder of this discussion
considers the estimated yields and their associated prediction error
estimates as given.

The LACIE production estimator (depicted in fig. 7) is simply the
product of the yield and acreage (aggregated to the "pseudozone"
level³ to account for the lack of coincidence between the yield and
area strata) estimators. An estimate of the production in a pseudozone
is obtained by the product of its area estimate with its yield
prediction, and these estimates are aggregated to predict zone and
higher level production. The form, therefore, of the production
estimator $\hat{P}$ for a given area of interest is as follows:

$$\hat{P} = \sum_{i} \hat{Y}_i \hat{A}_i \qquad (10)$$

where $\hat{Y}_i$ = the yield estimator for the ith pseudozone and
$\hat{A}_i$ = the estimator of wheat area for the ith pseudozone.
(Specifically, $\hat{A}_i$ is the aggregate of those
stratum/substratum wheat area estimates in the parent pseudozone.)

³ A "pseudozone" is the area resulting from the intersection of a
yield stratum with the area strata in a zone.

FIGURE 5. Wheat yield model (weather regression) coverage for the U.S.
Great Plains. (a) Winter wheat model boundaries. (b) Spring wheat
model boundaries.

FIGURE 6. U.S.S.R. crop regions covered by spring and winter wheat
yield regression models (regions [illegible] and 12 are not shown on
this map).

In mixed-wheat areas, this procedure is performed separately for each
crop type. The total production is obtained by adding the estimates
for each crop type. Assessment of the LACIE area, yield, and
production estimators requires estimation of their respective
variances. (The accuracy assessment details of LACIE are included in
the paper by Houston et al. entitled "Accuracy Assessment: The
Statistical Approach to Performance Evaluation in LACIE.") The
following section summarizes this procedure.


VARIANCE  ESTIMATION 


Area  Variance  Estimation 

In stratum-level countries, there are frequently enough segments
available to permit the direct computation of the sample variance from
segment to segment in each stratum. Recall that the Group III strata
are ratio estimated from various Group I strata estimates.
Consequently, the wheat area variance of a Group III stratum is a
function of the Group I variances. The wheat area variances for higher
levels (e.g., zone, region, or country) are essentially aggregates of
the stratum-level variances. The Group III variances are appropriate
linear combinations of Group I variances.

For substratum-level countries, the within-substratum wheat area
variance estimation procedure is somewhat more complicated since there
are many substrata with only one segment. A summary of the procedure
follows; the specific details are included in the paper by Chhikara
and Feiveson entitled "Large Area Aggregation and Mean Square
Prediction Error Estimation for LACIE Yield and Production Estimates."

1. Divide all the substrata within a zone into "collections" based on
prior within-substratum variances computed at the time of allocation.

2. Within the ith collection, estimate a variance $s_i^2$ by
regressing substratum estimates against historical data and computing
the residual variance.

3. Assign the value $s_i^2$ for the within-stratum variance for all
substrata in the ith collection.

FIGURE 7. Production estimation from sampling. (Diagram: segment wheat
proportions are aggregated to area strata; aggregated area stratum ×
yield stratum = production stratum; production USGP = sum of
production strata.)

The model assumed in the regression fit carried out in step 2 is as
follows.

$$C_i = \alpha + \beta X_i + \delta_i \qquad (11)$$

where $C_i$ = the true wheat acreage in the ith substratum, $X_i$ =
the historical wheat acreage in the ith substratum for an epoch year,
and $\delta_i$ = random fluctuation. Also

$$\hat{C}_i = C_i + \epsilon_i \qquad (12)$$

$$\hat{C}_i = \alpha + \beta X_i + \epsilon_i + \delta_i \qquad (13)$$

where $\hat{C}_i$ = the LACIE wheat area estimate in the ith
substratum and $\epsilon_i$ = the sampling plus classification error.

It is also assumed that the variance of $\delta_i$ is considerably
smaller than that of $\epsilon_i$. (Based on previous observations,
this appears to be a good assumption.)
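
Step 2 of the procedure, regressing substratum estimates on historical
acreage within one collection and taking the residual variance, might
look like the sketch below. The function name, the divisor n - 2, and
the example numbers are assumptions; the grouping of substrata into
collections (step 1) is taken as already done.

```python
def residual_variance(historical, estimates):
    """Fit estimates = a + b * historical by least squares.

    Returns (a, b, s2), where s2 is the residual sum of squares divided by
    n - 2 (an assumed divisor; the source does not specify one).
    """
    n = len(historical)
    if n < 3:
        raise ValueError("need at least three substrata in a collection")
    mean_x = sum(historical) / n
    mean_y = sum(estimates) / n
    sxx = sum((x - mean_x) ** 2 for x in historical)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(historical, estimates))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(historical, estimates))
    return a, b, rss / (n - 2)


# Hypothetical collection of four substrata (acreages in arbitrary units).
historical = [10.0, 20.0, 30.0, 40.0]
estimates = [12.0, 19.0, 33.0, 38.0]
a, b, s2 = residual_variance(historical, estimates)
print(a, b, s2)
```

The value `s2` would then be assigned to every substratum in the
collection (step 3). Because the variance of the random fluctuation is
assumed small relative to that of the sampling plus classification
error, the residual variance is read mainly as the latter.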


Yield Mean-Square Prediction
Error Estimation

Referring to the form of the CCEA yield model estimator (see eqs. (6)
to (8)), the yield mean-square prediction error (MSPE) estimator for a
given yield stratum is the standard

$$E(\hat{Y}_i - Y_i)^2 = \sigma^2\left[1 + X_i^{T}(X^{T}X)^{-1}X_i\right] \qquad (14)$$

where $E(\cdot)$ denotes the expectation operator.

The variance of the "average" yield given by equation (9) is obtained
using the approximate variance of a ratio between two correlated
random variables (detailed in the paper by Chhikara and Feiveson).
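
A plug-in version of the MSPE expression can be sketched as follows,
assuming the error variance is estimated by the residual mean square
of the fitted regression. That estimator, the function name
`yield_mspe`, and the toy data are assumptions for illustration.

```python
import numpy as np


def yield_mspe(X, y, x_current):
    """Estimate sigma^2 * [1 + x' (X'X)^{-1} x] for a new observation x.

    sigma^2 is replaced by the residual mean square (an assumed choice).
    """
    n_obs, n_coef = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta_hat
    sigma2_hat = float(residuals @ residuals) / (n_obs - n_coef)
    leverage = float(x_current @ np.linalg.solve(X.T @ X, x_current))
    return sigma2_hat * (1.0 + leverage)


# Toy check with an intercept-only model: sigma^2 is the sample variance
# (2.5) and the leverage is 1/n (0.2), so the result is about 3.0.
X_demo = np.ones((5, 1))
y_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
mspe_demo = yield_mspe(X_demo, y_demo, np.array([1.0]))
print(mspe_demo)
```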


Production  Variance  Estimation 

Two basic assumptions are made in arriving at the final form of the
LACIE production variance estimator.

1. Segment-level wheat area estimates are mutually independent and
unbiased.

2. Yield estimates are unbiased, are mutually independent (at the
yield stratum level as opposed to the pseudozone level), and are
independent of the acreage estimates.

Under these assumptions, if the summation in equation (10) is taken
over yield strata (as opposed to pseudozones), the variance of the
production estimator $\hat{P}$ is expressible as

$$V(\hat{P}) = \sum_{i}\left[A_i^2 V(\hat{Y}_i) + Y_i^2 V(\hat{A}_i) + V(\hat{A}_i)V(\hat{Y}_i)\right] \qquad (15)$$

where $V(\hat{A}_i)$ and $V(\hat{Y}_i)$ denote the variances of the
acreage and yield estimators, respectively, of the ith yield stratum,
and $A_i$ and $Y_i$ denote the estimator means for acreage and yield,
respectively, for the ith yield stratum. The LACIE production variance
estimator, then, is approximately that obtained by replacing the
parameters on the right side of equation (15) with their respective
estimates. (An additional adjustment of changing the sign preceding
the term $V(\hat{A}_i)V(\hat{Y}_i)$ to a minus results in an unbiased
variance estimator of $\hat{P}$.) Further details are included in the
paper by Chhikara and Feiveson.
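
The production estimate and its variance can be sketched as below,
assuming the usual variance of a product of independent estimators
summed over yield strata. Function names and the example figures are
hypothetical; the `unbiased` flag implements the sign change on the
cross term mentioned in the text.

```python
def production_estimate(areas, yields):
    """Sum over yield strata of area estimate times yield estimate."""
    return sum(a * y for a, y in zip(areas, yields))


def production_variance(areas, yields, var_areas, var_yields, unbiased=True):
    """Plug-in variance of the sum of A_i * Y_i for independent A_i and Y_i.

    With unbiased=True the cross term V(A_i) * V(Y_i) is subtracted;
    otherwise it is added, as in the variance formula with true parameters.
    """
    sign = -1.0 if unbiased else 1.0
    return sum(
        a * a * vy + y * y * va + sign * va * vy
        for a, y, va, vy in zip(areas, yields, var_areas, var_yields)
    )


# Hypothetical two-stratum example.
areas, yields = [100.0, 200.0], [2.0, 3.0]
var_areas, var_yields = [25.0, 100.0], [0.04, 0.09]
p_hat = production_estimate(areas, yields)
v_plug = production_variance(areas, yields, var_areas, var_yields, unbiased=False)
v_unb = production_variance(areas, yields, var_areas, var_yields, unbiased=True)
print(p_hat, v_plug, v_unb)
```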

SPECIAL  PROBLEMS  ENCOUNTERED 
IN  LACIE  SAMPLING  AND  AGGREGATION 

In an experiment having as many constraints and complexities as LACIE
had, there is the expectation from the outset of encountering many
problems. In LACIE, an Accuracy Assessment Subsystem was created not
only to closely monitor the LACIE estimates and assess their accuracy
but also to expedite the surfacing of various problems and their
subsequent resolutions in order that system impacts could be held to a
minimum (see the paper by Houston et al. for the specifics of the
accuracy assessment functions in LACIE). During Phases I, II, and III
of LACIE, a number of sampling and aggregation problems surfaced. For
example, by the end of Phase I, it was clear that the design as it
existed at that time had certain disadvantages such as

1. The domestic approach did not appear to be entirely adequate as an
indicator of the expected performance levels in foreign regions; i.e.,
the U.S. county is a substratum of much smaller size than the areas
for which data were available in most foreign regions.

2. Considerable effort was required to establish the degree to which
all assumptions were sufficiently satisfied; moreover, extensive data
were required to evaluate the precision of area, yield, and production
estimates.

Other problems related to sampling and aggregation that surfaced
during Phases I, II, or III are discussed in the following paragraphs.


Crop Type Estimation
in Mixed-Wheat Areas

In areas having significant amounts of both spring and winter wheat,
there was the question of whether analysts should provide estimates of
both spring and winter wheat from every segment or predesignate each
segment as either a spring or a winter wheat segment and provide only
one crop type estimate from each segment. In Phases I and II, the
latter procedure was employed; predesignation of segments was
initially (in Phase I) performed as follows. For each stratum (CRD in
the United States, oblast in the U.S.S.R.), the proportion of
allocated segments predesignated as spring (winter) was the same as
the proportion of spring (winter) wheat area in the stratum in an
epoch year; segment labels in this proportion were levied randomly.
The results from this approach appeared favorable.

A decision was made at the end of Phase II to implement the first
procedure; i.e., the one in which analysts are required to pass both a
spring and a winter wheat estimate from each segment in a mixed-wheat
area. Unfortunately, there were several mixed-wheat areas in which
segments had almost no spring wheat or almost no winter wheat, thus
forcing the analyst to look for "a needle in a haystack." Indications
were that more care should be exercised in designating which strata
should be "mixed."

During Phase III, the Accuracy Assessment group conducted a study
which indicated that a reasonable guideline is to designate a stratum
as mixed if neither crop type's presence (historically) is below
approximately 20 percent of the total wheat area (i.e., winter plus
spring). To date, this procedure appears to be working satisfactorily;
however, this issue requires further investigation.
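
The 20-percent guideline amounts to a simple rule on historical
acreages, sketched here. The function name, the treatment of a stratum
with no historical wheat, and the use of exactly 0.20 as the cutoff
(the text says "approximately") are assumptions.

```python
def is_mixed_stratum(winter_area, spring_area, threshold=0.20):
    """Return True if neither crop type falls below `threshold` of total wheat.

    Areas are historical (epoch-year) winter and spring wheat areas.
    """
    total = winter_area + spring_area
    if total <= 0:
        return False  # no historical wheat: not treated as mixed (assumption)
    return min(winter_area, spring_area) / total >= threshold


print(is_mixed_stratum(70.0, 30.0))  # both types at or above 20 percent
print(is_mixed_stratum(90.0, 10.0))  # spring wheat below 20 percent
```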


Nonresponse Because of Cloud Cover

Because of atmospheric effects such as haze and cloud cover, LACIE
does not get coverage over every segment on every pass. To counteract
this problem, the aggregation logic uses a ratio estimator to provide
estimates of nonresponse areas in the same manner as previously
discussed for Group III areas. The magnitude of the bias induced by
nonresponse has been monitored in the Accuracy Assessment program.
Indications are that the loss of acquisitions from cloud cover was a
problem in Phase I; however, tests conducted to date indicate that
error arising from this loss is probably random with no significant
bias being introduced. In foreign areas where the strata are
considerably larger than counties, the bias induced by nonresponse is
believed to be somewhat more pronounced (particularly at the stratum
level) but not to such a degree as to warrant alarm, although this
conclusion has not been rigorously verified.


Classification and Yield Prediction Bias

As indicated in the first section of this paper, an assessment of the
accuracy of the LACIE production estimator requires knowledge of its
variance and bias. The variance is computed as has been indicated.
Although the sampling scheme employed in LACIE is oriented toward
having no sampling bias (except that induced by cloud cover and ratio
estimation), Accuracy Assessment still has the chore of estimating the
bias induced by classification and yield prediction. As a result,
“wall-to-wall” ground data inventories have been taken over a subsample
of the segments in the yardstick area for use in estimating bias and
for general diagnostic purposes. The approach has been to estimate the
bias from ground data taken from a random subsample of approximately
one-third of the allocated segments. (The resulting segments are
referred to as “blind sites.”) However, until a larger subsample of
blind sites is available (not a cost-effective approach) or until a
more clever procedure for bias estimation is found, reliable estimates
of the magnitude of the bias will continue to be lacking.
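
The text does not spell out the bias estimator applied to the blind sites. As a minimal sketch, assuming the bias is taken as the mean difference between the LACIE segment estimate and the ground-observed value, with a simple standard error of that mean, the calculation might look as follows. The function name and the sample figures are hypothetical.

```python
import math
from statistics import mean, stdev


def blind_site_bias(estimates, ground_truth):
    """Mean difference (estimate minus ground truth) over blind sites and
    the standard error of that mean."""
    diffs = [e - g for e, g in zip(estimates, ground_truth)]
    n = len(diffs)
    bias = mean(diffs)
    # The standard error needs at least two sites; otherwise it is undefined.
    se = stdev(diffs) / math.sqrt(n) if n > 1 else float("nan")
    return bias, se


# Hypothetical wheat proportions (percent) for three blind sites.
bias, se = blind_site_bias([10.0, 12.0, 14.0], [11.0, 12.0, 16.0])
print(bias, se)
```

With only a handful of blind sites, the standard error is large relative to the bias itself, which illustrates why a larger subsample is needed before the bias estimate can be considered reliable.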

Instability  of  Group  III  Ratios 

The information in the section on area estimation indicates that the
Group III estimator $\hat{A}_{III}$ for a given Group III
stratum/substratum has the general form

    $\hat{A}_{III} = R \hat{A}$    (16)

where $\hat{A}$ is an estimator of a nearby area having sufficient data
for a direct estimate and $R$ is the ratio of the historical wheat area
in the Group III area to the historical wheat area in the area
estimated by $\hat{A}$. Consequently, the variance of $\hat{A}_{III}$ is

    $V(\hat{A}_{III}) = R^2 V(\hat{A})$    (17)

Examination of the graph of $V(\hat{A}_{III})$ (fig. 8) for various
values of $V(\hat{A})$ suggests that one should expect instability in
the Group III ratio estimator over areas wherein $R$ and/or
$V(\hat{A})$ are large. This situation was not rigorously investigated
until Phase III of LACIE. At that time, Accuracy Assessment conducted a
study which led to the conclusion that areas to be used as a base for
ratio estimation of a particular stratum should be selected such that
the magnitude of $R$ is between 1 and 1.5. Although this approach seems
to perform satisfactorily, further work is warranted.
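
The behavior of the Group III ratio estimator can be illustrated numerically. The sketch below assumes that R is a fixed historical ratio, so that the variance of the ratio estimate is the variance of the base estimate scaled by R squared. The function names and the default bounds of 1 and 1.5 (taken from the guideline as read from the text) are illustrative assumptions.

```python
def group3_estimate(r, base_area, base_variance):
    """Ratio estimate for a Group III area and its variance, treating the
    historical ratio r as a known constant."""
    return r * base_area, r ** 2 * base_variance


def ratio_within_guideline(r, low=1.0, high=1.5):
    """Check whether a candidate base area gives a ratio inside the
    recommended range (bounds are parameters, not fixed LACIE values)."""
    return low <= r <= high


# Hypothetical base area estimate of 1000 units with variance 400.
area, variance = group3_estimate(1.5, 1000.0, 400.0)
print(area, variance, ratio_within_guideline(1.5))
print(group3_estimate(4.0, 1000.0, 400.0), ratio_within_guideline(4.0))
```

Because the variance grows with the square of R, a base area giving a ratio of 4 inflates the variance sixteenfold, which is the instability the guideline is meant to prevent.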

Poor Estimation of Group II “Sizes”

Shortly after completion of the revised allocation in the USGP and
before the first Phase III aggregation, the 1974 U.S. Agricultural
Census data became available and were used in place of the 1969 census
data to update the Crop Assessment Subsystem (CAS) aggregation data
base. No historical data used in performing the revised allocation were
retained in the data base. As a result, sufficient disagreement existed
in a number of the Group II substrata between the allocation and the
aggregation “sizes” to induce considerable bias in the first two
aggregations in Phase III, particularly in the state of South Dakota.
(Recall that a PPS allocation was performed on the collection of Group
II substrata; as a result, a Group II bias arises unless the Group II
allocation and aggregation “sizes” (weights) agree.)

Further investigation determined that the Phase III allocation of
segments (based on small grains; see “LACIE Sample Design” for details)
led to the allocation of segments to some counties which historically
contain little or no wheat. Specifically, it was found that 32 counties
designated as Group III counties should probably have been Group I or
II counties, whereas 12 Group I and II counties should have been Group
III counties. The main effect of the first error was to increase bias
and variance through the use of Group III ratios. The second error was
corrected by redesignating the appropriate Group I and Group II
counties as Group III for the remainder of Phase III.


Thresholding/Screening of Outliers

The accuracy of segment-level estimates varies considerably with the
segment as a result of such factors as crop development stage at the
time of analysis (early season, midseason, harvest), acquisition
pattern (timing relative to crop biostages and number of missing
acquisitions), analyst effect, and segment effect (e.g., field size).
As a result, various segment estimates received by CAS are sufficiently
inaccurate as to not warrant inclusion in the aggregation. During Phase
III of LACIE, a procedure for thresholding early-season estimates was
developed to eliminate segments the estimates for which were made much
earlier in the year and which, consequently, were not reliable
indicators of the actual wheat area. In addition, a screening procedure
was developed to detect segment outliers that may be due to other
reasons described in more detail in the following paragraphs.

The LACIE began data processing for crop year 1977 winter wheat when
the normal crop calendar reached 2.0 on the Robertson growth scale.
These early-season classifications by CAMS resulted in segment
proportion estimates made before all wheat in a segment was detected by
LACIE. These early-season estimates remain in the data base and are
aggregated until replaced by a later acquisition. If these suspect
estimates are eliminated from the aggregation as more data become
available, the subsequent aggregated area estimates should more
accurately reflect the actual wheat area (provided, of course, enough
segments remain to permit an aggregation with tolerable variance).
During Phase III, an empirical approach was developed to arrive
objectively at a threshold date for each state. Although the
thresholding of the early-season acquisitions has the effect of
reducing the early-season bias, the loss of segments also increases the
variance. Consequently, the empirical approach attempted to select the
optimal biostage for a threshold date such that the combined
contributions of variance and bias yielded a minimal mean-squared error
of the acreage estimate. Figure 9 depicts this approach.
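
The tradeoff behind the threshold selection can be sketched in a few lines. The example assumes that, for each candidate biostage, an estimate of the remaining bias and of the variance of the aggregated acreage is already available, and it uses the usual decomposition of mean-squared error as squared bias plus variance. The function name and the numbers are hypothetical; the empirical procedure used in Phase III is not described in enough detail here to reproduce it.

```python
def pick_threshold(candidates):
    """Select the (biostage, bias, variance) tuple with the smallest
    mean-squared error, where MSE = bias**2 + variance."""
    return min(candidates, key=lambda c: c[1] ** 2 + c[2])


# Hypothetical values: a later threshold lowers the early-season bias but
# discards segments, which raises the variance.
candidates = [
    (2.0, 8.0, 4.0),   # MSE = 68
    (3.0, 4.0, 9.0),   # MSE = 25
    (4.0, 1.0, 30.0),  # MSE = 31
]
print(pick_threshold(candidates))
```

In this illustration the intermediate biostage is chosen: thresholding later removes more bias but costs more in variance than it saves.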

At about the same time in Phase III, a screening procedure was
developed for the USGP to flag segments having highly questionable
estimates as a result of such influences as bad classification or
application of an erroneous small-grains-to-wheat ratio. It was clear
from the problems experienced in South Dakota, for example, that the
aggregation logic is very sensitive to acreage estimation errors for
counties located in low to marginal wheat-growing areas. To achieve an
accurate area estimate, it is desirable to screen segments whose
estimates are probable outliers. Following is the procedure developed
and applied in Phase III in the USGP.

1. Counties were grouped into four categories: low, marginal, medium,
and high wheat density counties.

2. For each aggregable segment, the log of the ratio of the CAMS
estimate of wheat proportion in the segment to that of the county
containing the segment (estimated from the epoch-year data) is
computed.

3.  The  distribution  of  the  data  generated  in  step  2 
is  investigated  within  each  category  and  confidence 
intervals  are  constructed  about  the  category  means. 

4.  Those  segments  in  a category  that  lie  outside 
the  category  confidence  interval  computed  in  step  3 
are  declared  outliers. 
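
The four screening steps can be sketched as follows. The text does not state how the category confidence intervals were constructed; this sketch assumes an interval of the category mean plus or minus z sample standard deviations of the log ratios (z = 1.96 by default). The record layout, the function name, and the minimum of three segments per category are likewise assumptions. Proportions must be positive for the logarithm to be defined.

```python
import math
from statistics import mean, stdev


def screen_outliers(segments, z=1.96):
    """Flag segments whose log ratio of segment to county wheat proportion
    falls outside mean +/- z * sd within the county density category."""
    by_category = {}
    for seg in segments:
        log_ratio = math.log(seg["segment_prop"] / seg["county_prop"])
        by_category.setdefault(seg["category"], []).append((seg["id"], log_ratio))

    outliers = []
    for items in by_category.values():
        values = [v for _, v in items]
        if len(values) < 3:
            continue  # too few segments to form a meaningful interval
        m, sd = mean(values), stdev(values)
        low, high = m - z * sd, m + z * sd
        outliers.extend(seg_id for seg_id, v in items if v < low or v > high)
    return outliers


# Hypothetical low-density category: nine segments agree with the county
# proportion and one reports ten times the county proportion.
segments = [
    {"id": f"s{i}", "category": "low", "segment_prop": 0.05, "county_prop": 0.05}
    for i in range(9)
]
segments.append({"id": "s9", "category": "low", "segment_prop": 0.5, "county_prop": 0.05})
print(screen_outliers(segments))
```

Working on the log scale treats over- and underestimation symmetrically in relative terms, which is convenient in low-density counties where small absolute errors are large relative errors.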

The thresholding and screening procedures were applied in the USGP in
Phase III with quite satisfactory results.

Additional problems that have not received the degree of attention
demanded by those discussed previously and are thus awaiting resolution
or further refinement include the following.

1. An investigation is needed to assess the degree of dependency
between acreage and yield model errors, followed by the incorporation
of results into the production prediction mean-squared error estimator.
(Currently, the acreage and yield errors are assumed to be
independent.)

2. Crop acreage variances are not computable for strata having only one
aggregable segment. Currently, the variances computed at the time of
allocation are used for such strata; however, in some cases, these
estimates are regression estimates. This situation warrants further
investigation for potential improvements.

3. There is a problem in estimating crop acreages over nonresponse
areas; i.e., areas for which little or no satellite data are available
and historical data are either very poor or nonexistent. Currently, the
approach taken in LACIE is to use a ratio estimator (the Group III
ratio estimation procedure) that estimates the trend relative to an
epoch year using nearby, satellite-acquired data as a base in the ratio
estimation. Further research is needed to improve this situation,
particularly in foreign areas where little or no historical data are
available.

4. Related efforts to date in LACIE have produced a single-crop
sampling strategy and the associated aggregation and variance
estimation formulations. There is now a need to generalize to the case
of multiple crops (work is currently being initiated to this end). In
particular, there is a need for research to develop and test a
multicrop production estimation procedure that incorporates various
constraints and interrelations between crops, including the
correlations between separate crop type acreage estimates, the
correlation between acreage and yield, and inherent constraints such as
the inequality constraint that disallows total crop area to exceed the
total agricultural area.

5. Results from LACIE indicate that, apart from increasing the sample
size, an improved stratification is another means of improving
precision of crop production estimators. Recently, the methodology has
been developed for interpreting and synthesizing satellite data
(particularly Landsat), soils information, and meteorological data to
enable an improved stratification. The resulting strata are referred to
as “agrophysical units” (geographic areas having definable/comparable
agronomic and physical parameters which reflect a certain range of
agricultural use and management). There is a need to investigate
further the relationship between agrophysical unit development and both
conventional soils mapping and soils mapping using satellite data to
develop methods of improving the utility of agrophysical units in large
area crop production estimation. Emphasis should be placed on the
development and testing of techniques applicable in foreign areas where
suitable soils maps and other pertinent data are generally not
available.


SUMMARY/CONCLUSIONS 

It  was  essential  that  the  sampling  strategy  used  in 
LACIE  be  good  enough  to  support  a cost-effective 
system.  This  has  been  particularly  evidenced  by  the 
following. 

1. An approximately 2-percent sampling error has been achieved in LACIE
by sampling only approximately 2 percent of the sampling frame.

2. The sample design in the yardstick region, for which historical data
were available down to a substratum level to support missing data
resulting from cloud cover, provided the most accurate estimate
possible.

3. The implemented strategy provided data of sufficient quantity and
quality to support required performance levels and also to satisfy the
existing constraints.

4. The allocation scheme appeared to provide the most efficient usage
of the available data and gave efficient segment coverage of major
producing areas and thus improved the probability of an accurate
estimate.


INTRODUCTION

Using econometric models to predict annual adjustments in the ratios of
crop acreages is unique in economic literature. The more common
approach is to develop models which attempt to directly predict annual
adjustments in the planted (or harvested) acreage of an individual crop
within an area. The need for the LACIE was partly derived from the past
lack of success in devising ways (including econometric models) to
predict the acreage of wheat and other crops in foreign areas. However,
the impetus for developing ratio models arose from a practical problem
in the LACIE: the inability to accurately classify spring wheat
signatures in certain areas where “confusion” crops are grown. The
question the ratio modeling effort attempted to answer was “Assuming
that accurate and reliable confusion crop acreage estimates for an area
were made available from remote-sensing sources, could ratio models be
developed to provide accurate and reliable estimates of the acreages of
individual crops?”

The study which is the subject of this paper was initiated by LACIE
management to determine the feasibility of developing econometric
models capable of improving the predictive characteristics of
historical ratios to a level that would support established LACIE
accuracy goals. A plan was developed outlining LACIE requirements for
ratios, area, coverage, prediction timeliness, modeling approach, data
needs, and resource requirements for developing operational models on a
time schedule consistent with providing confusion crop ratio estimates
for Phase III. The task, which was undertaken by scientists from the
University of Missouri and Oregon State University and by LACIE
personnel at Columbia, Missouri, was begun in January 1977 and
completed in July 1977. The results for the United States and Canada
are reported separately (refs. 1 and 2).

The purpose of this study was to test the feasibility of using
econometric methods to predict annual adjustments in the relative
acreages of wheat and “confusion” crops in the States of North Dakota,
South Dakota, Montana, and Minnesota and in the Canadian Province of
Saskatchewan. “Confusion” crops were defined as those crops having
similar crop calendars, including such spring-planted crops as spring
wheat, barley, oats, and flax and such fall-planted crops as winter
wheat and rye.


Confusion  Crop  Problems 

Initially in the LACIE project, it was planned to use area samples of
Landsat imagery to estimate wheat acreage for a test region (ref. 3).
Regional wheat acreage estimates were to be derived from Landsat
imagery by estimating the proportion of wheat acreage to total
agricultural land in each segment and aggregating these estimates to
the appropriate regional level. During LACIE Phases I and II (1975 and
1976 crops), however, operational difficulties were encountered in
providing a reliable wheat-proportion estimate for sample segments in
areas where confusion crops were grown. During Phase I, it became
apparent that LACIE analysts, using existing classification procedures,
were unable to reliably separate wheat signatures from the signatures
of certain other crops in certain regions (ref. 3, pp. 1-11 and 2-60).
This problem was particularly troublesome in those areas where several
crops with similar stages of plant development are found. For example,
potential confusion was suspected among winter wheat, spring wheat,
rye, barley, oats, and flax signatures in the spring or mixed wheat
states (Minnesota, Montana, North Dakota, and South Dakota) of the U.S.
Great Plains. Rye, barley, and oats were the major crops being confused
with spring wheat in the Saskatchewan Province of Canada. Similar
classification-related problems were also suspected in the spring and
mixed wheat areas of the U.S.S.R.

During Phase II, the accepted procedure was for the LACIE analyst to
identify wheat for each sample segment where this could be done with a
high degree of accuracy. Where the existence of confusion crops in a
segment made the classification of wheat impossible or of questionable
accuracy, the analyst would instead provide a proportion estimate of
either winter or spring confusion crop acreage. Where there was
possible confusion between both winter and spring small grains, a
proportion estimate for total small grains was provided for the sample
segment. Thus, the total set of classification results for use as input
in acreage aggregations was one of the following: winter wheat (WW),
spring wheat (SW), winter small grains (WG), spring small grains (SG),
or total small grains (GR).

In cases where a WG, SG, or GR ratio was estimated for a sample
segment, some procedure was required to obtain an estimate of the
desired WW or SW ratio prior to aggregating the sample segment to the
strata level (crop reporting district (CRD) in the United States). In
Phase II, the procedure was to apply to each segment (with a WG, SG, or
GR ratio) a substrata level (county in the United States) historical
ratio of the proportion of winter wheat or spring wheat. The historical
ratio was WW/WG or WW/GR for winter wheat and SW/SG or SW/GR for spring
wheat. In general, historical ratios for estimating the proportion of
winter wheat or spring wheat at the substrata level were based on the
most recent single year of historical data available. County-level
Statistical Reporting Service (SRS) data for 1975 were used to develop
ratios in Minnesota, North Dakota, and South Dakota; 1973 data were
used for Montana since more recent estimates were not available. In
Canada, 1971 census data were used.
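
The Phase II ratioing step amounts to multiplying the analyst's small-grains proportion by the matching county historical ratio. The sketch below illustrates that arithmetic only; the function name, the dictionary layout keyed by strings such as "SW/SG", and the example values are assumptions made for illustration.

```python
def wheat_proportion(segment_prop, label, target, county_ratios):
    """Convert a segment proportion labeled WW, SW, WG, SG, or GR into a
    proportion of the target wheat class ("WW" or "SW")."""
    if label == target:
        # The analyst identified the wheat class directly; no ratio needed.
        return segment_prop
    # Look up the historical ratio, e.g. "SW/SG" or "WW/GR". A KeyError
    # signals that no ratio exists for this label/target combination.
    return segment_prop * county_ratios[f"{target}/{label}"]


# Hypothetical county ratios taken from one year of historical data.
ratios = {"SW/SG": 0.5, "WW/GR": 0.25}
print(wheat_proportion(0.5, "SG", "SW", ratios))  # spring grains to spring wheat
print(wheat_proportion(0.8, "GR", "WW", ratios))  # total grains to winter wheat
```

Any error in the historical ratio carries through proportionally to the wheat estimate, which is why a ratio based on a single past year can be a significant source of error.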

Because historical ratios proved to be a source of error in estimating
acreages, more accurate estimates of current-year confusion crop ratios
were desired. Econometric models were considered as a possible method
to reduce the error in historical ratios until classification
techniques and procedures which could use Landsat imagery to reliably
differentiate between wheat and other small grains were developed,
tested, and implemented.


Objective 

The overall objective of the ratio modeling effort was to develop,
test, evaluate, and recommend for implementation a method of projecting
confusion crop ratios for the 1977 crop year to a level of accuracy
that would support the attainment of stated LACIE performance criteria.


LACIE  CONFUSION  CROP  RATIO 
REQUIREMENTS 

The nature of the LACIE operational system and aggregation procedures
led to several unique requirements for econometric modeling. The
requirements of the LACIE aggregation software often led to compromises
with the preferred methodology for ratio model development. These
compromises are addressed in the following sections.


Geographic Regions Considered

Potential  confusion  crops  and  required  ratios 
varied  among  regions.  Based  on  procedures  used  in 
Phase  III,  ratioing  methodology  was  required  for  all 
active  LACIE  countries  (the  United  States,  Canada, 
and  the  U.S.S.R.).  However,  because  the  U.S.S.R.  is  a 
planned  economy  and  the  set  of  variables  that  might 
explain  significant  changes  in  crop  ratios  differs 
from  that  of  the  other  countries,  the  initial  ratio 
modeling  effort  was  concentrated  on  specified  U.S. 
and  Canadian  areas.  Resource  limitations  and  data 
availability  were  also  factors  contributing  to  this 
decision. 

Geographic  areas  for  which  ratio  models  were 
developed  to  support  Phase  III  estimates  were  the 
States  of  Minnesota,  Montana,  North  Dakota,  and 
South  Dakota  (fig.  1)  and  the  Canadian  Province  of 
Saskatchewan (fig. 2).


Size of Geographic Area

Ideally, the LACIE needed a specific estimate of confusion crop ratios
for each sample segment in a region; however, data did not exist to
develop models that were capable of providing estimates at the segment
level. The next larger area size was a county or substrata. However,
the questionable accuracy of county acreage data and the large number
of models needed presented practical problems for developing ratio
models at the county level.

As a compromise between LACIE requirements and practical modeling
constraints, confusion crop models were developed for LACIE strata;
that is, for CRD's in the United States and for crop districts (CD's)
in Saskatchewan. As a guide to independent variable selection,
state-level models were also developed in the United States and a
province model was developed for Saskatchewan.


FIGURE 1. Map of U.S. geographic areas showing state crop reporting
districts modeled.


FIGURE 2. Map of Saskatchewan showing geographic areas (zones and crop
districts) modeled.

Although preliminary state-level models were tested for the United
States, all final models were estimated at the CRD level (fig. 1).
Canadian models were reported for 20 CD's, 9 zones (zones being
composed of 2 to 4 CD's), and the province (fig. 2).


Specification of Confusion Crop Ratios

Confusion crops were defined as those crops which could not be reliably
differentiated from winter wheat or spring wheat using Landsat imagery.
Although LACIE analysts were uncertain about the precise identity of
confusion crops in all areas, they agreed that wheat, rye, barley,
oats, and flax would provide a useful basis for testing the capability
of econometric models in Phase III. Definition of specific needs in
terms of planted versus harvested ratios was not clear at the beginning
of the task. Where possible, models were developed for predicting both
harvested and planted ratios. For Minnesota and Saskatchewan, where
historical data on planted acreage were not available at the CRD level,
only harvested acreage ratio models were developed.

The required product for input into the LACIE aggregation system was a
ratio of winter wheat or spring wheat acreage to the appropriate small
grains acreage classes. These classes were (1) winter grains (including
winter wheat and winter rye); (2) spring small grains (including spring
wheat, barley, oats, and flax); and (3) total grains (including both
spring and winter small grains). In Minnesota, North Dakota, and
Saskatchewan, it was necessary to project only two ratios: SW/SG and
SW/GR. These two spring wheat ratios also were required for the mixed
wheat states of Montana and South Dakota in those CRD's where both
winter and spring grains were grown. In addition, two winter grain
ratios (WW/WG and WW/GR) were required for South Dakota and one winter
grain ratio (WW/GR) was required for Montana. Because no CRD-level
acreage estimates were available for winter rye in Montana, no WW/WG
ratio was estimated in that state. Thus, 168 CRD and 20 CD confusion
crop ratio models were required (table I).


Table I. Number of Confusion Crop Ratio Models Developed for Crop
Reporting Districts in the United States and for Crop Districts in
Canada

                        Number of CRD or CD models
State or        Planted acreage              Harvested acreage
province        SW/SG  SW/GR  WW/WG  WW/GR   SW/SG  SW/GR  WW/WG  WW/GR   Total

Minnesota           0      0      0      0       9      9      0      0      18
Montana             7      7      7      0       7      7      7      0      42
North Dakota        9      9      0      0       9      9      0      0      36
South Dakota        9      9      9      9       9      9      9      9      72
Saskatchewan        0      0      0      0      20      0      0      0      20

Total              25     25     16      9      54     34     16      9     188


TECHNICAL  BACKGROUND 

This  section  analyzes  the  historical  adjustments  in 
confusion  crop  acreage  and  reviews  past  efforts  to 
develop  econometric  models  for  predicting  acreage 
adjustments  of  individual  crops. 


Analysis  of  Historical  Confusion  Crop  Data 

The SW/SG ratios for a CRD or CD are a function of spring wheat acreage
and acreages of barley, oats, and flax in the area. Changes in the
acreages of any of these crops alter the SW/SG ratios. As shown in
tables II to VI, annual changes in the SW/SG ratios historically have
been large. For example, in Minnesota, the size of the annual changes
in the SW/SG ratios historically ranged from -53.3 percent (in CRD 90)
to 353.3 percent (in CRD 20). Changes in the SW/SG ratio from 1975 to
1976 ranged from -10.0 percent in CRD 30 to 150.0 percent in CRD 70.
Although the range of year-to-year adjustments in the SW/SG ratios was
generally larger in the Minnesota CRD's than in other states or in
Saskatchewan, annual changes greater than ±10 percent were nevertheless
common in all CRD's and CD's. These changes show the problem of using
last year's ratio as a predictor for the current year.


Table II. Minnesota: Harvested Acreage of Spring Wheat and Spring Grains

[Calculated from ref. 4]

                                 1976 data (range for 1963-76)
Geographic  SW area, percent      SG area, percent      SW/SG area                Annual change in
area        of state total        of state total        ratio                     SW/SG area, percent

CRD 10      49.2 (47.0 to 68.8)   42.3 (35.5 to 47.3)   0.640 (0.160 to 0.640)    10.3 (-38.5 to 62.5)
CRD 20      .4 (0.2 to 0.7)       1.0 (0.7 to 1.3)      .250 (0.030 to 0.300)     -16.6 (-50.0 to 366.7)
CRD 30      0 (0)                 .1 (0.0 to 0.2)       .090 (0.030 to 0.110)     -10.0 (-50.0 to 133.3)
CRD 40      30.8 (23.8 to 34.7)   28.0 (25.3 to 31.9)   .600 (0.140 to 0.600)     15.4 (-16.7 to 60.0)
CRD 50      7.3 (1.6 to 8.3)      10.4 (9.3 to 11.9)    .390 (0.030 to 0.390)     25.8 (-62.5 to 200.0)
CRD 60      .3 (0.1 to 0.3)       1.5 (1.5 to 2.4)      .090 (0.010 to 0.090)     12.5 (-50.0 to 200.0)
CRD 70      5.4 (0.5 to 5.4)      7.4 (5.1 to 9.4)      .400 (0.010 to 0.400)     150.0 (-66.7 to 150.0)
CRD 80      4.8 (0.6 to 12.1)     4.6 (3.0 to 7.2)      .580 (0.060 to 0.580)     56.8 (-65.0 to 266.7)
CRD 90      1.8 (0.5 to 4.0)      4.6 (4.6 to 7.2)      .210 (0.030 to 0.210)     31.2 (-62.5 to 180.0)
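
The annual changes discussed here can be reproduced from a series of ratios. The sketch below assumes the annual change is defined as the percent difference between the current and the previous year's SW/SG ratio; this definition, the function name, and the example series are assumptions, and the figures are not taken from the tables.

```python
def annual_pct_changes(ratios):
    """Percent change in the ratio from each year to the next."""
    return [100.0 * (cur - prev) / prev for prev, cur in zip(ratios, ratios[1:])]


# Hypothetical SW/SG ratios for three consecutive years in one district.
series = [0.25, 0.5, 0.375]
print(annual_pct_changes(series))
```

If last year's ratio is carried forward unchanged as the prediction, the relative error of that prediction (measured against last year's ratio) equals the annual change itself, so swings of this size translate directly into errors in the wheat acreage estimate.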

The size of the SW/SG ratio for a CRD or CD is not necessarily related
to the percentage of the state spring wheat acreages accounted for by
that CRD or CD, but the largest SW/SG ratios tend to be in those CRD's
or CD's with the largest spring wheat acreages (tables II to VI). The
CRD's with the largest acreages of spring wheat also tended to have the
most stable SW/SG ratios. A small error in estimating the SW/SG ratio
for a CRD with a major share of the state's wheat acreage can have a
large impact on the error in the spring wheat acreage estimate for that
state. Thus, in developing ratio models, it was important that efforts
be focused on those CRD's with the large spring wheat acreages. In
1976, most of Minnesota's spring wheat and spring grain acreages were
concentrated in CRD's 10 and 40 (table II). CRD 10 alone accounted for
47.0 to 68.8 percent of the state's spring wheat acreage (excluding
durum) during the 1963-76 period. In Montana, CRD's 20 and 30 together
typically account for about 90 percent of the spring wheat acreage with
the latter CRD alone accounting for more than one-half of the total


Table III. Montana: Harvested Acreage of Spring Wheat and Spring Grains

[Calculated from ref. 5]

                                 1976 data (range for 1963-76)
Geographic  SW area, percent      SG area, percent      SW/SG area                Annual change in
area        of state total        of state total        ratio                     SW/SG area, percent

CRD 10      0.6 (0.4 to 1.0)      2.3 (1.6 to 2.4)      0.170 (0.114 to 0.243)    -15.8 (-36.6 to 66.5)
CRD 20      30.8 (20.6 to 39.6)   35.3 (32.5 to 44.2)   .543 (0.229 to 0.543)     25.4 (-50.9 to 71.4)
CRD 30      57.8 (48.8 to 64.8)   43.3 (34.4 to 47.4)   .830 (0.654 to 0.830)     5.7 (-14.6 to 23.1)
CRD 50      4.5 (1.9 to 5.3)      8.3 (5.8 to 9.4)      .336 (0.119 to 0.336)     83.6 (-55.0 to 83.6)
CRD 70      .8 (0.5 to 1.0)       2.3 (1.7 to 2.3)      .222 (0.083 to 0.287)     69.5 (-47.9 to 112.9)
CRD 80      .9 (0.5 to 1.3)       3.0 (2.5 to 4.5)      .184 (0.078 to 0.184)     53.3 (-48.7 to 61.4)
CRD 90      4.7 (3.2 to 5.7)      5.6 (4.0 to 6.1)      .526 (0.369 to 0.594)     .4 (-28.5 to 60.9)


Table IV.— North Dakota: Harvested Acreage of Spring Wheat and Spring Grains

(Calculated from ref. 6)

            1976 data (range for 1963-76)

Geographic  SW area,             SG area,             SW/SG area              Annual change
area        percent of           percent of           ratio                   in SW/SG area,
            state total          state total                                  percent

CRD 10      16.4 (15.3 to 18.3)  13.0 (12.5 to 13.6)  0.877 (0.579 to 0.877)  0.8 (-8.9 to 20.1)
CRD 20      11.6 (10.8 to 12.5)  11.1 (10.7 to 11.7)  .728 (0.502 to 0.728)   5.4 (-13.4 to 20.2)
CRD 30      11.3 (11.3 to 19.6)  19.3 (18.3 to 20.1)  .708 (0.419 to 0.708)   3.2 (-8.9 to 23.5)
CRD 40      7.5 (7.5 to 9.6)     6.4 (6.4 to 7.9)     .818 (0.576 to 0.818)   4.6 (-11.9 to 12.7)
CRD 50      12.5 (9.3 to 12.5)   11.2 (10.6 to 11.7)  .775 (0.401 to 0.775)   9.3 (-9.8 to 35.7)
CRD 60      13.2 (8.5 to 13.2)   13.5 (11.7 to 13.5)  .679 (0.293 to 0.679)   6.1 (-10.5 to 36.6)
CRD 70      8.1 (6.6 to 10.8)    6.7 (6.5 to 7.3)     .837 (0.584 to 0.837)   3.0 (-18.8 to 33.1)
CRD 80      7.1 (5.8 to 7.8)     6.6 (6.2 to 7.5)     .747 (0.471 to 0.747)   13.0 (-12.3 to 27.9)
CRD 90      12.3 (7.0 to 12.3)   12.2 (11.1 to 12.5)  .698 (0.322 to 0.698)   15.6 (-11.9 to 45.1)
Table V.— South Dakota: Harvested Acreage of Spring Wheat and Spring Grains


(Calculated from ref. 7)


            1976 data (range for 1963-76)

Geographic  SW area,             SG area,             SW/SG area              Annual change
area        percent of           percent of           ratio                   in SW/SG area,
            state total          state total                                  percent

CRD 10      17.1 (13.1 to 11.3)  10.8 (7.7 to 10.8)   0.795 (0.530 to 0.795)  19.4 (-22.3 to 23.9)
CRD 20      38.2 (38.2 to 49.0)  24.6 (24.6 to 29.1)  .780 (0.460 to 0.780)   26.0 (-12.1 to 26.0)
CRD 30      26.7 (14.4 to 26.7)  23.3 (22.0 to 25.1)  .575 (0.178 to 0.575)   44.5 (-12.5 to 53.7)
CRD 40      .8 (0.8 to 2.2)      1.0 (1.0 to 2.4)     .384 (0.159 to 0.384)   29.3 (-29.8 to 67.7)
CRD 50      6.2 (6.2 to 14.7)    4.4 (4.4 to 13.1)    .708 (0.294 to 0.708)   43.3 (-20.6 to 26.4)
CRD 60      5.0 (1.6 to 5.0)     18.0 (13.2 to 18.0)  .140 (0.038 to 0.140)   29.6 (-24.6 to 53.5)
CRD 70      .5 (0.1 to 0.5)      .7 (0.6 to 1.0)      .403 (0.038 to 0.403)   38.5 (-67.0 to 112.5)
CRD 80      2.5 (1.0 to 2.8)     2.5 (2.5 to 4.0)     .494 (0.082 to 0.494)   80.3 (-34.5 to 108.4)
CRD 90      3.0 (1.2 to 3.0)     14.6 (8.8 to 14.6)   .102 (0.042 to 0.102)   30.8 (-28.9 to 57.8)

Table VI.— Saskatchewan: Harvested Acreage of Spring Wheat and Spring Grains

(Calculated from ref. 2)

            1976 data (range for 1962-76)

Geographic  SW area,             SG area,             SW/SG area              Annual change
area        percent of           percent of           ratio                   in SW/SG area,
            state total          state total                                  percent

CD 1A       4.4 (4.4 to 5.4)     4.5 (4.4 to 5.1)     0.766 (0.622 to 0.865)  8.0 (-20.2 to 8.0)
CD 1B       3.3 (2.2 to 3.6)     3.8 (3.4 to 4.0)     .677 (0.413 to 0.830)   8.1 (-40.0 to 35.6)
CD 2A       4.7 (4.1 to 5.4)     4.0 (3.8 to 4.5)     .909 (0.713 to 0.911)   3.8 (-17.5 to 6.3)
CD 2B       6.7 (5.3 to 6.9)     5.8 (5.0 to 6.4)     .904 (0.646 to 0.932)   5.5 (-23.3 to 9.4)
CD 3AS      6.7 (6.1 to 10.2)    5.6 (5.6 to 7.2)     .935 (0.793 to 0.935)   2.6 (-8.9 to 6.5)
CD 3AN      3.4 (3.4 to 4.6)     3.0 (3.0 to 3.6)     .891 (0.762 to 0.939)   3.8 (-13.4 to 7.9)
CD 3BS      4.6 (4.4 to 8.2)     3.8 (3.8 to 5.9)     .932 (0.792 to 0.932)   1.7 (-7.5 to 10.2)
CD 3BN      6.4 (6.1 to 9.0)     5.5 (5.5 to 7.1)     .913 (0.731 to 0.944)   2.0 (-11.3 to 11.5)
CD 4A       2.4 (2.1 to 4.2)     2.1 (2.1 to 3.3)     .885 (0.731 to 0.885)   3.8 (-8.7 to 8.3)
CD 4B       3.3 (3.5 to 5.9)     2.8 (2.8 to 4.1)     .930 (0.797 to 0.960)   1.0 (-7.6 to 5.9)
CD 5A       6.6 (5.5 to 6.7)     6.9 (6.0 to 7.2)     .745 (0.526 to 0.828)   9.6 (-29.6 to 10.6)
CD 5B       6.4 (3.9 to 6.4)     7.7 (6.3 to 7.7)     .649 (0.379 to 0.767)   15.7 (-43.6 to 36.1)
CD 6A       8.1 (7.5 to 9.1)     7.5 (7.3 to 8.4)     .843 (0.615 to 0.931)   6.8 (-26.0 to 8.8)
CD 6B       6.0 (5.9 to 7.1)     5.8 (5.8 to 6.9)     .816 (0.606 to 0.888)   5.8 (-23.4 to 16.0)
CD 7A       6.1 (5.4 to 6.6)     5.4 (5.3 to 6.0)     .870 (0.613 to 0.944)   2.6 (-21.7 to 16.2)
CD 7B       4.9 (3.2 to 5.0)     5.0 (4.2 to 5.0)     .772 (0.571 to 0.870)   2.7 (-39.5 to 27.7)
CD 8A       3.8 (1.9 to 3.8)     5.0 (2.9 to 5.0)     .607 (0.390 to 0.767)   19.7 (-36.0 to 19.7)
CD 8B       4.3 (3.9 to 4.8)     4.8 (4.2 to 5.2)     .704 (0.410 to 0.807)   12.3 (-41.1 to 24.6)
CD 9A       4.6 (3.3 to 5.4)     6.4 (5.5 to 6.6)     .568 (0.344 to 0.741)   18.8 (-40.4 to 23.8)
CD 9B       3.2 (1.7 to 3.2)     4.5 (3.5 to 4.5)     .548 (0.264 to 0.729)   17.3 (-50.1 to 47.7)
acreage (table III). In North Dakota, spring wheat
acreage was more evenly distributed with no CRD
accounting for as much as 20 percent of the state
acreage (table IV). Most of South Dakota's spring
wheat acreage was grown in the three northern
CRD's (10, 20, and 30) with CRD 50 accounting for
another 6.2 percent of the acreage in 1976 (table V).
In Saskatchewan, wheat acreage is distributed fairly
evenly among the CD's with no CD accounting for as
much as 10 percent of the total spring wheat acreage
in the province (table VI). (A more complete set of
data for the 1963-76 period is available from the
authors on request as a statistical appendix.)
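
The quantities tabulated for each region can be related by a short calculation. The Python sketch below is illustrative only. It assumes that the SW/SG ratio is spring wheat acreage divided by total spring grain acreage and that the annual change is the percent change in that ratio from the preceding year; the function names and the acreages in the example are hypothetical and are not taken from the tables.

```python
def sw_sg_ratio(sw_acres, sg_acres):
    """Spring wheat acreage as a fraction of total spring grain acreage."""
    if sg_acres <= 0:
        raise ValueError("spring grain acreage must be positive")
    return sw_acres / sg_acres


def percent_of_total(acres_by_region):
    """Each region's acreage as a percent of the state (or province) total."""
    total = sum(acres_by_region.values())
    return {region: 100.0 * acres / total
            for region, acres in acres_by_region.items()}


def annual_change_percent(ratios):
    """Percent change in the ratio from each year to the next (assumed definition)."""
    return [100.0 * (cur - prev) / prev
            for prev, cur in zip(ratios, ratios[1:])]


if __name__ == "__main__":
    # Hypothetical acreages (thousand acres) for one region over three years.
    sw = [400.0, 450.0, 540.0]
    sg = [800.0, 750.0, 900.0]
    ratios = [sw_sg_ratio(w, g) for w, g in zip(sw, sg)]
    print(ratios)
    print(annual_change_percent(ratios))
```

Because the regional ratio is driven by only a few regions in some states, an error in one large region carries through almost directly to the state estimate.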


Review  of  Acreage  Response  Studies 

The  need  for  models  capable  of  predicting  ratios 
or  relative  acreages  of  wheat  and  other  small  grains  is 
unique to the LACIE project, and no direct precedent
for ratio modeling was found in the literature.
Nonetheless,  the  ratio  problem  can  be  usefully 
viewed  as  a special  case  of  the  general  problem  of 
predicting  how  farmers  adjust  crop  acreages  in 
response  to  changing  economic  and  physical  signals. 
Several  studies  seeking  to  explain  and  predict  wheat 
acreage  response  in  the  United  States  and  Canada 
have  appeared  in  recent  years.  A review  of  these 
studies  provided  useful  background  information  in 
developing  ratio  models.  All  these  studies  employed 
single-equation  multiple-regression  analysis  on  time 
series  data. 

U.S. wheat acreage response studies.— Of three recent
studies of U.S. wheat acreage responses
reviewed, two were primarily concerned with assessing
the impact of government policy on wheat plantings.
Lidman and Bawden (ref. 8) focused on government
allotment programs and the various incentives
for program participation (loan rates, direct payments,
voluntary diversion payments, etc.). Using a
model to predict national wheat acreage from 1954
through 1970, the authors concluded that agricultural
programs exerted a significant influence on the
amount of wheat acreage planted, whereas lagged
market price was not an important determinant in
the acreage planted to wheat during the 1954-70
period (ref. 8, p. 333). No prediction tests were reported,
but it is not likely that their model would perform
well outside the sample period because a
general cropland set-aside program was substituted
for the commodity-specific allotment program in
1971.

In an attempt to assess the impact of this change
in policy, Garst and Miller (ref. 9) developed a new
set of wheat acreage response models. These models
are quite similar to those presented by Lidman and
Bawden, regressing planted acreage on wheat allotment,
wheat diversions, and lagged price, with two
dummy variables intended to represent changes in
marketing quota requirements as well as the wheat
set-aside variable. In all, three sets of model results
are reported: one for all-wheat states, one for winter-wheat
states, and one for spring-wheat states
(North Dakota, South Dakota, Minnesota, and Montana).
In the spring wheat model, the allotment, diversion,
and wheat set-aside variables were statistically
important, whereas lagged price was not. The model
explained a large proportion of the historical variation
in spring wheat plantings (coefficient of multiple
determination R² greater than 0.98). Nonetheless,
Garst and Miller conclude that “as with the use
of most models of this type, predictions of future impacts
should be examined with some skepticism, particularly
when they rely on data outside the range of
that used in the regressions” (ref. 9, p. 36).

The third study reviewed had the development of a
predictive model as a primary goal of the analysis.
Using state-level data for North and South Dakota
for 1948 through 1974, Weaver, Morzuch, and
Helmberger (ref. 10) hypothesize that planted
acreages of wheat are a function of expected prices
for wheat and alternative crops. In a departure from
other acreage response studies, they employed indexes
of futures prices as proxies for expected prices. A
trend intended “to account for any systematic
changes attributable to changes in technology, relative
factor prices, and other disturbing influences”
(ref. 10, p. 8) is also included in the model. Predictions
for 1975 and 1976 were generated extra-sample
and then compared with preliminary official estimates.
The authors concluded that “. . . the equations
do not yield good predictions of planted spring
wheat acreages in 1975 and 1976” (ref. 10, p. 18). Predicted
acreage for North Dakota equaled 78 percent
of the actual acreage in 1975 and 74 percent in 1976;
for South Dakota, it equaled 74 percent of the actual
acreage in 1975 and 67 percent in 1976. The authors
characterize one of their important findings as being
that, in the absence of binding acreage allotments
(such as those in effect in the years 1950 and 1954
through 1964), the acreage planted to spring wheat
responds positively to the ratio of the expected price
of spring wheat to the expected price of other crops
(ref. 10, p. 18).

The poor predictive ability was not surprising considering
the period used to develop the model. During
most of the period, government programs created
incentives to limit wheat and feed-grain acreage.
During 1971 through 1973, government programs
became more neutral concerning which crops farmers
planted on their allotted crop acreage but continued
to provide incentives for limiting total wheat
and feed-grain acreages. After 1973, the wheat and
feed-grain programs had essentially no impact on the
grain industry because market prices were substantially
in excess of target prices. Farmers were not
only free to allocate “normal” cropland acreage
among competing crops according to expected relative
crop prices, but they also had an incentive to increase
“normal” crop acreages by reducing land in
fallow and, in some cases, planting crops on land
usually considered to be marginal cropland. Thus,
events in the period after 1973 introduced a basic
supply shifter in the acreage response function which
was not included in any of the studies reviewed.

Canadian acreage response studies.— Four Canadian
acreage response studies were reviewed.
Schmitz (ref. 11) theorizes that acreage planted to
wheat is a function of expected price for wheat and
other crops and of several nonprice variables, such as
moisture before and during seeding, farm-level
wheat stocks, wheat export sales, technology, and
capital availability. (The last two variables were
represented by a trend proxy in the estimations.)
Estimation was accomplished by ordinary least-squares
regression on national-level annual data
from 1947 through 1966. In all, some 24 variations on
the two basic models are reported. In no case was the
correct sign obtained for moisture before and during
planting or for livestock prices.

With  respect  to  alternative  crops,  barley  prices 
showed  the  expected  sign  but  were  not  statistically 
significant,  whereas  flax  prices  were  of  the  correct 
sign  and  usually  significant  in  the  various  alternative 
models. Wheat prices were significant in every case,
as were export sales and, generally, on-farm stock
levels.

A study by Capel (ref. 12), which was intended to
provide an efficient low-cost forecast of wheat
acreage, employed the simplest form of the Nerlove
distributed lag model, expressing acreage as a function
of expected price. In practice, this amounts to a
regression on wheat price and last year's acreage. Annual
data for the Canadian prairie provinces from
1950 through 1967 were used to estimate the model
coefficients in a double-log transformation. A test of
predictive capability was not presented.
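
The reduced form described above, a regression of acreage on wheat price and the previous year's acreage after a double-log transformation, can be sketched in Python as follows. This is a minimal illustration, not Capel's estimation: it assumes that the price entering the equation is the previous year's price, it relies on NumPy's least-squares solver, and the function names and the data in the example are hypothetical.

```python
import numpy as np


def fit_double_log(acreage, price):
    """OLS fit of ln(A_t) = c0 + c1*ln(P_{t-1}) + c2*ln(A_{t-1}).

    Returns the coefficient vector (c0, c1, c2). Using the lagged price
    is an assumption made for this sketch.
    """
    ln_a = np.log(np.asarray(acreage, dtype=float))
    ln_p = np.log(np.asarray(price, dtype=float))
    y = ln_a[1:]
    X = np.column_stack([np.ones(len(y)), ln_p[:-1], ln_a[:-1]])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def forecast_next(coef, last_acreage, last_price):
    """One-year-ahead acreage forecast from the fitted coefficients."""
    c0, c1, c2 = coef
    return float(np.exp(c0 + c1 * np.log(last_price) + c2 * np.log(last_acreage)))


if __name__ == "__main__":
    # Hypothetical price and acreage series, for illustration only.
    price = [1.5, 1.7, 1.4, 1.9, 2.1, 1.6, 1.8, 2.3, 2.0, 1.7]
    acreage = [20.0, 19.1, 19.4, 18.2, 19.0, 20.1, 19.2, 19.5, 21.0, 20.6]
    coef = fit_double_log(acreage, price)
    print(coef, forecast_next(coef, acreage[-1], price[-1]))
```

In a double-log specification the coefficients can be read as elasticities, so c1 is the short-run acreage response to a 1-percent change in price.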

In the third study reviewed, Meilke (ref. 13) hypothesized
that Canadian producers react to two
different prices in making their production decisions.
The first of these is the Canadian Wheat Board
(CWB) initial and adjustment payments combined.
Initial payments are received on delivery and constitute
a floor price. Adjustment payments are
usually made in the spring, retroactive to the beginning
of the crop year. Meilke noted that the two payments
taken together are quite stable. Final payments
(and advances on final payments, or interim
payments) are usually made 18 to 24 months after
planting and are much more variable. As a test of the
two-price hypothesis, Meilke constructed acreage
response models for wheat, barley, and oats; one set
includes the two variables separately, whereas the other
combines initial and final payments into a single
variable. Other variables included marketings of
wheat as a percentage of production plus on-farm
carry-in stocks lagged 1 year (a proxy for the anticipated
restrictiveness of marketing quotas), and a
dummy variable equal to one in 1970 (representing
the Lower Inventories for Tomorrow (LIFT) program).
Price expectations are accounted for by a distributed
lag formulation (i.e., the lagged dependent
variable is included as a regressor).

These equations were estimated using annual data
for the prairie provinces from 1949 through 1974 by
ordinary least-squares regression. All coefficients
were of the expected sign and had large t-values.
Meilke interpreted these results as showing that
“final payments have an effect on acreage” (ref. 13,
p. 574).

The fourth study reviewed, by Meilke and Kramar
(ref. 14), analyzed individual acreage response equations
for corn, oats, barley, soybeans, mixed grain,
and winter wheat in Ontario. Lagged barley yields,
lagged barley prices, and lagged acreage of winter
wheat were the independent variables used in the
winter wheat equation. Predictions of winter wheat
acreage for 1972 and 1973 using the model were considered
“adequate,” although naive forecasts (made
by regressing acreages in year t on acreages in year
t - 1) gave estimates of actual winter wheat acreage
that were about as accurate. The results for other
crop models were similar.
RATIO MODEL DEVELOPMENT
PROCEDURES

Procedures  used  to  develop  and  select  ratio 
models  were  similar  for  both  the  United  States  and 
Canada.  However,  because  of  basic  differences  in 
agricultural  policies  and  other  factors,  the  resulting 
models  were  quite  dissimilar. 


Candidate  Variable  Selection 

The acreage response literature suggested alternative
variables for initial inclusion in the ratio models.
Four categories of variables were indicated, representing
changes in (1) economic conditions, (2)
government policies, (3) historical crop-livestock
patterns, and (4) physical conditions such as annual
climatic variations.

Economic variables.— All else being equal, changes
in planted acreages of wheat and other small grains
should depend primarily on the expected net income
relationship of these confusion crops prior to planting.
Of course, the net income per acre from each
crop is equal to the gross income (yield per acre in
bushels times the farm price received per bushel)
minus costs. Unfortunately, the net income farmers
expect to receive for their crops when marketed is
not directly observable. Time-series cost data are
generally not available for the different confusion
crops, but the relative costs of producing the
different confusion crops generally change little
from year to year. Thus, changes in expected gross
income relationships should be an acceptable
substitute for net income. Further, because small
grains yields tend to respond similarly to climatic
conditions, year-to-year variations in yields are correlated.
Hence, price is likely to be the major economic
factor causing annual adjustments in confusion
crop ratios.
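
The income argument above can be made concrete with a small Python sketch. It is illustrative only: the function names and all yields, prices, and costs are hypothetical. It shows why, when yields move together and relative costs change little, the relative gross income of two crops changes from year to year mainly with the price ratio.

```python
def gross_income_per_acre(yield_bu_per_acre, price_per_bu):
    """Gross income: yield per acre (bushels) times farm price per bushel."""
    return yield_bu_per_acre * price_per_bu


def net_income_per_acre(yield_bu_per_acre, price_per_bu, cost_per_acre):
    """Net income: gross income minus production costs per acre."""
    return gross_income_per_acre(yield_bu_per_acre, price_per_bu) - cost_per_acre


def relative_gross_income(wheat, other):
    """Ratio of gross income per acre for wheat to that of a competing crop.

    Each argument is a (yield, price) pair.
    """
    return gross_income_per_acre(*wheat) / gross_income_per_acre(*other)


if __name__ == "__main__":
    # Hypothetical values: both yields fall 10 percent in a dry year,
    # so the income ratio moves only with the prices.
    normal = relative_gross_income((30.0, 4.00), (50.0, 2.00))
    dry = relative_gross_income((27.0, 4.00), (45.0, 2.00))
    print(normal, dry)
```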

Unfortunately, neither the yields nor the prices
farmers expect to receive for their crops when
marketed are directly observable. Futures prices are a
possible proxy; however, futures prices were not
available for all confusion crops. More commonly,
some combination of past prices is used to represent
expected prices, a procedure based on the highly
plausible assumption that past experiences determine
farmers' price expectations. After selecting
historical prices as a basis for farmers' price expectations,
considerable latitude remains in choosing the
precise means by which past price information is
reflected in the acreage response model. An obvious
approach is to include prices lagged one or more
periods as independent variables; however, alternative
specifications may be weighted or unweighted
moving averages of prices for several periods, a
measure of relative price changes between two
periods, and nonlinear transformations such as price
squared or the log of price. In the ratio model
analysis, several alternative price and yield combinations
were analyzed.
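
The alternative price specifications listed above can be sketched as simple transformations of a price series. The Python below is a minimal illustration under stated assumptions: the function names are hypothetical, the price series is invented, and the particular lag lengths and weights actually tested in the ratio model analysis are not reproduced here.

```python
import math


def lagged(prices, k=1):
    """Price k years earlier; None where no earlier price exists."""
    return [None] * k + list(prices[:-k])


def moving_average(prices, weights):
    """Weighted average of the preceding len(weights) prices.

    weights[0] applies to the most recent preceding year; equal weights
    give an unweighted moving average.
    """
    n, total = len(weights), sum(weights)
    out = []
    for t in range(len(prices)):
        if t < n:
            out.append(None)  # not enough history yet
        else:
            recent = prices[t - n:t][::-1]
            out.append(sum(w * p for w, p in zip(weights, recent)) / total)
    return out


def relative_change(prices):
    """Proportional price change between consecutive periods."""
    return [None] + [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


def log_price(prices):
    """Natural log of price, one of the nonlinear transformations mentioned."""
    return [math.log(p) for p in prices]


if __name__ == "__main__":
    prices = [2.0, 3.0, 4.0, 5.0]  # hypothetical prices per bushel
    print(lagged(prices))
    print(moving_average(prices, [1, 1]))
    print(moving_average(prices, [2, 1]))
    print(relative_change(prices))
```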

Agricultural program variables.— Both the U.S. and
Canadian Governments have taken an active role in
determining wheat production, though it might be
fair to characterize this role as an unintended one.
Most production adjustment actions were undertaken
to support income maintenance programs
rather than as production management programs per
se. Nevertheless, these programs have historically
played an important role in determining small grains
acreage adjustments. The Canadian and U.S. programs
were sufficiently different to justify separate
treatment here.

Canadian policy: In contrast to the United States,
Canada employs a marketing board as the central
feature of its wheat program (ref. 15). The CWB enjoys
monopolistic power in both the procurement
and the disposition of all wheat grown in the prairie
provinces and the Peace River area of British Columbia
that is destined for export or sale among provinces
(feed wheat excepted). Prior to August 1, 1974,
the CWB had similar control over feed wheat, oats,
and barley grown in the Canadian West, but its jurisdiction
has since been limited to the export market
for feed grains, while the private sector has been
allowed to carry out domestic feed-grain trade. In an
effort to provide price stability for wheat, oats, and
barley, the CWB employs “price pooling.” Under
this system, producers receive an initial payment,
usually well below current market prices, on delivery
of their crop to country elevators which act as agents
of the CWB. A final payment is made based on CWB
net revenues from marketing the crop.

Jolly (ref. 16) has identified two major effects of
price pooling. First, because final payments are not
generally made until 6 months after the close of the
marketing year, and payments are occasionally
delayed an additional 6 months after that, producers
must make production and consumption decisions
on the basis of incomplete price information. Second,
since the pooling system averages out within-season
price variation, the market mechanism does
not serve to distribute deliveries over the year. The
second effect gives rise to an alternative method of
distributing crop deliveries, the Grain Delivery
Quota System. Under this system, producers are
allowed to market a portion of their total quota at
successive intervals. Quotas were determined by
several criteria during the study period and most recently
by “assigned acreage,” which is defined as (1)
land seeded to wheat, oats, barley, rye, rapeseed, and
flaxseed; (2) land in summer fallow; (3) land in
miscellaneous crops; and (4) land seeded to perennial
forages up to a maximum of one-third of the
total land in the other three categories. Changes in
quotas are expected to cause changes in the relative
confusion crop acreages.
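
The four-part definition of assigned acreage can be expressed as a short calculation. The Python sketch below follows the definition as stated in the text; the function name, argument names, and acreages are hypothetical, and the way the CWB converted assigned acreage into delivery quotas is not modeled.

```python
def assigned_acreage(grains_oilseeds, summer_fallow, misc_crops, perennial_forage):
    """Assigned acreage under the definition given in the text.

    grains_oilseeds  : land seeded to wheat, oats, barley, rye, rapeseed, flaxseed
    summer_fallow    : land in summer fallow
    misc_crops       : land in miscellaneous crops
    perennial_forage : land seeded to perennial forages, counted only up to
                       one-third of the total of the other three categories
    """
    base = grains_oilseeds + summer_fallow + misc_crops
    forage_counted = min(perennial_forage, base / 3.0)
    return base + forage_counted


if __name__ == "__main__":
    # Hypothetical farm, in acres: the forage cap binds in the second case.
    print(assigned_acreage(600.0, 250.0, 50.0, 100.0))
    print(assigned_acreage(600.0, 250.0, 50.0, 400.0))
```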

The LIFT program was an important departure
from historical Canadian programs. Enacted in
1970-71 for 1 year, LIFT was designed to reduce burdensome
wheat inventories which had accumulated
during the late 1960's. Producers were given a strong
incentive to reduce wheat acreage and, indeed,
seeded acreage decreased by 50 percent from the
1969-70 acreage.

The attempt to include Canadian policy in the
ratio models introduces a large set of candidate variables
which might theoretically be used to explain
Canadian ratios. Possible variables suggested for testing
included initial, interim, and final CWB payments
to producers for wheat, oats, barley, and rye.
Minimum support prices for wheat for domestic
needs; data on the various CWB marketing quotas,
particularly the so-called general quotas and the
acreage factors (variously referred to as specific
acreage, assigned acreage, etc.) used to compute
quotas; and information on diversion payments
made under the LIFT program were also considered.

U.S. program: The study period used in developing
the U.S. ratio models (1963 through 1976) included
two rather dramatically different agricultural
policy environments (ref. 17). Within these policy
periods, important annual adjustments in farm programs
were made. Before 1970, wheat production
was strongly influenced by the U.S. Department of
Agriculture (USDA) through an acreage allotment
program. This program was voluntary (unlike its predecessor
prior to 1963) in that farmers who did not
comply with their allotments were not fined or
otherwise penalized except by being denied access to
government price and income support programs.
Economic incentives for complying with program
provisions were nonetheless strong, since participating
producers received price support loans, marketing
certificates redeemable for cash, and payments
for diverting additional acreage from their allotments,
whereas noncomplying producers received
no direct benefits. Similar programs were applied to
feed grains such as corn and barley and, at various
times, to oats and flax.

Under the Agricultural Act of 1970, beginning
with the 1971 crop, the use of acreage allotments was
suspended for wheat and feed grains. Instead, producers
were required to keep a certain percentage of
their total cropland out of production in order to be
eligible for price supports. However, they were free
to plant whatever crops they desired on the remaining
land. Allotments were retained only in the
limited sense that they were used to apportion
domestic marketing certificates, which, to producers,
were worth the difference between the wheat parity
price and the average market price during the first 5
months of the year. A factor here was the provision
which required a producer to plant allotted wheat
acreage to maintain the allotment for certificate purposes.
Compared to earlier programs, increased
substitution was allowed between wheat and feed
grains. In fact, one of the principal aims of abandoning
the old allotment system was to allow the market
to allocate land among crops. The shift in policy was
continued by the Agriculture and Consumer Protection
Act of 1973, legislation written in the wake of a
widely perceived shift from surplus to shortage in
the world wheat market. The intent of this legislation
was to encourage expanded production (ref. 15, p.
19). An innovation of the 1973 act was the introduction
of the target price concept, under which producers
were paid the difference between the target
price and the average market price for the first 5
months of the marketing year. As in the 1970 act,
allotments were retained as the basis for program
payments.
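
The target price mechanism can be illustrated with a short calculation. The Python sketch below is a simplification under stated assumptions: it treats the payment as zero whenever the average market price is at or above the target price, it ignores loan rates and allotment-based payment limits, and the function name and all prices are hypothetical.

```python
def target_price_payment(target_price, monthly_market_prices):
    """Payment per bushel: target price minus the average market price for
    the first 5 months of the marketing year, floored at zero (assumption).
    """
    first_five = monthly_market_prices[:5]
    if len(first_five) < 5:
        raise ValueError("need at least 5 monthly prices")
    average_price = sum(first_five) / 5.0
    return max(0.0, target_price - average_price)


if __name__ == "__main__":
    # Hypothetical prices in dollars per bushel.
    print(target_price_payment(2.05, [1.80, 1.90, 2.00, 1.90, 1.90]))
    # Market well above target: no payment, so the program does not bind.
    print(target_price_payment(2.05, [3.50, 3.60, 3.70, 3.60, 3.50]))
```

The second case corresponds to the situation described earlier for the years after 1973, when market prices were substantially in excess of target prices.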

Candidate variables designed to reflect the
agricultural policy environment during the early
years of the 1963-76 period should obviously include
allotted acreages for wheat and competing crops, as
well as acreage diverted for payment from those
allotments. Price support loan rates, the dollar value
of diversion payments, and the value of wheat
marketing certificates are also relevant. Changes in
acreage allotments and diversion incentives are expected
to have a direct effect on planted acreages.
Since much of the government's price support activity
was conducted through the nonrecourse loan
program, levels of Commodity Credit Corporation
(CCC) grain stocks are another possible explanatory
variable. These same variables (or their equivalents)
might also have a role in explaining ratio adjustments
after 1973, but any model incorporating them
should consider the emergence of a policy
specifically intended to reduce the role of the government
in producers' decisions.

Historical crop-livestock patterns.— The emphasis
so far has been on modeling farmers' responses to
changing economic and governmental policy signals.
It seems reasonable to assume, however, that these
responses are conditioned by historical crop and
livestock production patterns. Farmers may have
difficulty adjusting to changing conditions as fully
and as quickly as they would like to adjust. Annual
adjustments are limited (1) because farmers are committed
to a particular rotation schedule, (2) because
they have invested in specialized equipment while
changing cropping patterns may require additional
machinery investment with unknown payoffs, or
(3) because local conditions make the marketing of a
particular crop more desirable than is readily apparent
from CRD- or state-level data. Furthermore,
farmers may occasionally continue a particular pattern
of crop production from sheer force of habit,
regardless of the current economic signals they may
be receiving. Thus, it was expected that regional
crop/livestock patterns might tend to constrain
rather than lead crop adjustments. In terms of the
ratio project, the primary means suggested for allowing
for historical production patterns was to include
the lagged dependent variable as a possible explanatory
variable.

Physical factors.— Some of the most important
factors determining acreage patterns are physically
based rather than socially oriented. Two obvious examples
are soils and climate. Interregional
differences in normal climatic states are important
factors explaining interregional differences in production
patterns. Together with the factor of differing
soils, interregional climatic differences help to
justify the development of separate models for each
region (CRD or CD) of interest.

However, although explaining interregional
differences, physical factors are normally somewhat
less important causes of year-to-year changes in crop
acreage. Although varying among regions, for all
practical purposes, soils are constant over time. Climate,
on the other hand, does exhibit variation
through time; and annual climatic changes could
conceivably help explain annual changes in crop
ratios. For example, in areas where it is possible to
substitute spring wheat for winter wheat, the SW/SG
ratio might very well be affected by the severity of
winterkill. Likewise, weather-delayed planting dates
could affect production patterns as farmers
substitute crops with a shorter growing season. Considerations
such as soil moisture levels at planting
could also induce producers to substitute one crop
for another.


Data Collection and Limitations

Data  needs  and  collection  problems  differed  be- 
tween Canada  and  the  United  States. 

Canadian data.— Data were available for a much
longer time period in Canada. The existence of data
for all candidate variables allowed CD model coefficients
to be estimated over the 1948-49 to 1976-77
period. Nearly all Canadian agricultural statistics are
gathered on a marketing-year basis. The marketing
year begins on August 1 and ends the following July
31. Thus, 1976-77 indicates the year beginning
August 1, 1976, and ending July 31, 1977. However,
in the 1976-77 crop year in Saskatchewan, crops are
planted in late April and May of 1976.

Data were collected for two types of variables: those
specific to particular regions and those which can be
applied across regions (ref. 2). Region-specific data
included CD- or zone-level acreage estimates for oats,
barley, rye, mixed grains, flax, rapeseed, and wheat.
Nonspecific data included on-farm stocks of wheat at the
beginning of the crop year; total wheat production; prices
for wheat, barley, flax, rapeseed, and rye; CWB exports of
wheat; marketings of wheat by producers to the CWB from the
beginning of the crop year to March 1; the CWB initial
buying price for wheat; and the CWB selling price for wheat
(ref. 2). The CD acreage, yield, and production data were
collected from the Saskatchewan Office of Statistics
(ref. 18). Other data are available from national
publications of Statistics Canada (refs. 19 to 22).

U.S. data.— The most limiting and difficult data base to
construct was the long-term historical information on CRD
agricultural program factors. The construction of CRD-level
agricultural program variables required aggregating
county-level data on each of several program features.
Program details varied from year to year, and the method of
reporting varied by state. Basic data for constructing the
various agricultural program variables were provided by the
national and individual state offices of the USDA
Agricultural Stabilization and Conservation Service (ASCS)
for the 1966-76 period. The official policy is to maintain
national, state, and county program summaries for 10 years
only. In most cases, however, ASCS personnel at each state
office generously provided personal copies of published ASCS
data for 1963 through 1966 (refs. 23 to 26). Nevertheless,
county data were not available for Minnesota in 1964 and
1965 or for Montana in 1963. Because of major differences in
agricultural programs prior to 1963 and the extreme
difficulties or the impossibility of collecting earlier
county-level data, it was decided to limit U.S. data bases
to the post-1962 period.

Another data limitation was the lack of timely CRD-level
price data on confusion crops for each of the four states.
Timely data were particularly lacking for monthly price
series at the CRD level. However, in those states where
CRD-level crop prices were reported, the size of the
differences among monthly CRD prices was generally small
(less than 5 percent). Also, because only state-level data
are published in time to meet operational requirements,
state prices were used in developing CRD models. These
prices are published by the USDA Economics, Statistics, and
Cooperatives Service (ESCS, formerly the Statistical
Reporting Service (SRS)).

State and national data on quarterly grain stocks for 1961
through 1976 were collected from ESCS sources (ref. 27).
Also, CRD data on mean monthly precipitation and mean
monthly temperatures for 1961 through 1976 were tabulated
from National Weather Service (NWS) records.

The  CRD  data  on  acreage,  yield,  and  production 
for  the  individual  confusion  crops  were  generally 
available  from  the  published  data  of  the  ESCS  state 
office.  Also,  CRD  acreage  data  for  the  1976  crop  year 
were  not  available  until  the  final  stages  of  the  study. 

Primarily because of limited historical data, the
preliminary CRD models were estimated using data for
(1) Minnesota, 1966 through 1975; (2) Montana, 1964 through
1975; (3) North Dakota, 1963 through 1975; and (4) South
Dakota, 1963 through 1975. Data for 1976 were added before
developing the models for predicting 1977 ratios. Thus,
final models were developed using 11 observations for
Minnesota, 13 observations for Montana, and 14 observations
each for North and South Dakota.


Model Selection Procedures

Multiple regression analysis was used for parameter
estimation (ref. 28). Several statistical tests were used as
aids in selecting the “best” model from a number of
alternatives. Since the purpose of the ratio modeling effort
was to produce predictions, the “best” model is the one with
the greatest predictive ability. Indicators of a model's
predictive ability include the coefficient of multiple
determination R², the F-test for significance of the overall
regression, the consistency of coefficient signs with
economic theory and other a priori information, the t-value
tests of coefficient significance, the mean square error,
and the Durbin-Watson d-statistic. Unfortunately, none of
these tests provides a completely conclusive indicator of a
model's predictive capability, and the final choice of a
“best” model remains partly a matter of judgment.
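
The indicators listed above can all be computed from a single least-squares fit. The following is a minimal sketch in plain Python, not the software used in the study; the function names (`invert`, `ols_diagnostics`) and the six-point data set are illustrative assumptions.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_diagnostics(x_rows, y):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares and return summary statistics."""
    n = len(y)
    design = [[1.0] + list(row) for row in x_rows]  # prepend the intercept column
    k = len(design[0])                              # number of estimated coefficients
    xtx = [[sum(r[a] * r[b] for r in design) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(design, y)) for a in range(k)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(c * v for c, v in zip(beta, r)) for r, yi in zip(design, y)]
    sse = sum(e * e for e in resid)                 # residual sum of squares
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)        # total sum of squares
    mse = sse / (n - k)                             # mean square error
    return {
        "beta": beta,
        "r2": 1.0 - sse / sst,
        "f": ((sst - sse) / (k - 1)) / mse,         # overall-regression F statistic
        "t": [b / math.sqrt(mse * xtx_inv[a][a]) for a, b in enumerate(beta)],
        "mse": mse,
        "dw": sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, n)) / sse,
    }


# Synthetic illustration only: six observations, one regressor.
result = ols_diagnostics([[1], [2], [3], [4], [5], [6]],
                         [1.1, 1.9, 3.2, 3.9, 5.1, 5.8])
```

With the short series available to the study (11 to 14 observations per state), the degrees of freedom `n - k` in the denominator of the mean square error shrink quickly as regressors are added, which is one reason these statistics are treated only as aids to judgment.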

The general procedure for the ratio study was to specify
several plausible alternative models using data through 1975
and then to evaluate them by the available statistical
tests. This exercise was supplemented by generating
extra-sample predictions for the 1976 crop ratios. This
allowed 1 year, 1976, for comparing predicted and observed
values. The models that showed the greatest promise were
then subjected to further analysis and refinement until the
apparent “best” model emerged.

Because of the large number of models required, the analysis
was begun by developing preliminary state- or province-level
models to use as guides in variable selection for CRD and CD
models. These preliminary studies were intended to narrow
the range of possible independent variables or model
alternatives to be considered. A principal rationale for
employing state- or province-level data was that the
development of individual models tailored to each CRD or CD
would have required more research resources than were
available and might have resulted in a product that was
operationally cumbersome.

Instead, the intention was to produce for each state or
province a general model (for each ratio type) that would
retain the same basic variables for all CRD's or CD's while
allowing the estimates of the coefficients to vary. In the
early stages of the study, this procedure was also necessary
because CRD and CD data were unavailable for analysis.
However, in both countries, modifications were made at the
state/province and CRD/CD level as the researchers tested
and compared model results. Although the procedures used in
the development of the U.S. and Canadian models were
similar, enough differences existed to justify the reporting
of each separately.

Canadian  model  development.— In  preliminary 
work, the ratio of spring wheat to spring grains was
hypothesized to be a linear function of several economic,
technological, and policy variables (ref. 2, pp. 33-37). A
crop acreage ratio response function for Saskatchewan was
estimated from a set of variables consisting of the annual
percentage change in (1) the price of wheat, (2) the price
of rapeseed, (3) the price of rye, (4) the price of barley,
(5) the total production of wheat lagged 1 year, (6) the
total on-farm stocks of wheat, (7) the dummy variable to
account for the effects of the LIFT program instituted in
1970, (8) wheat exports, and (9) the SW/SG confusion crop
ratio lagged 1 year. A more detailed definition of these
variables, the preliminary model parameters, and the
estimated coefficients can be found in reference 2.

From this set of independent variables, a forward-selection
stepwise estimation procedure (found in several computer
multiple-regression packages) was used for further analysis
to select those variables which increased the R², decreased
the variance, and had the theoretically expected sign. The
significance of the t-values of the coefficients was ignored
because a predictive model was desired as the end result.
(The Canadian and U.S. models were developed separately
(refs. 1 and 2). The researcher who developed the Canadian
models argued that “since the purpose of this research is to
generate a ‘predictive’ model and not an ‘explanatory’ model
(where t-values for individual coefficients become highly
relevant), there is no compelling reason to test for the
significance of the coefficients” (ref. 2, p. 35). This
contention is supported by Johnston (ref. 29). More emphasis
should be placed on a significant F-test and the size of the
overall variance.)
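
The selection rule described above (add a variable only if it raises R², lowers the residual variance, and leaves every coefficient with its expected sign) can be sketched as follows. This is an illustrative reading of the procedure, not the regression package used in the study; the `min_gain` threshold, the function names, and the synthetic data are assumptions.

```python
def ols_fit(columns, y):
    """Least-squares fit with an intercept; returns (coefficients, R^2, residual variance)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    aug = [[sum(r[p] * r[q] for r in design) for q in range(k)]
           + [sum(r[p] * yi for r, yi in zip(design, y))] for p in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(k):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[c])]
    beta = [aug[p][k] / aug[p][p] for p in range(k)]
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(design, y))
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return beta, 1.0 - sse / sst, sse / (n - k)


def forward_select(candidates, y, expected_sign, min_gain=0.01):
    """Greedy forward selection with an expected-sign screen on every coefficient."""
    n = len(y)
    y_bar = sum(y) / n
    chosen = []
    best_r2 = 0.0
    best_var = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    while True:
        step = None
        for name in candidates:
            if name in chosen:
                continue
            trial = chosen + [name]
            beta, r2, var = ols_fit([candidates[v] for v in trial], y)
            signs_ok = all(beta[j + 1] * expected_sign[v] > 0
                           for j, v in enumerate(trial))
            improves = r2 - best_r2 > min_gain and var < best_var
            if signs_ok and improves and (step is None or r2 > step[1]):
                step = (name, r2, var)
        if step is None:
            return chosen, best_r2
        chosen.append(step[0])
        best_r2, best_var = step[1], step[2]


# Synthetic illustration: y depends on x1 (+) and x2 (-); x3 is irrelevant.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
x3 = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
y = [1 + 2 * a - 0.5 * b + e for a, b, e in zip(x1, x2, noise)]
chosen, r2 = forward_select({"x1": x1, "x2": x2, "x3": x3}, y,
                            {"x1": 1, "x2": -1, "x3": 1})
```

Note that no t-test appears in the screen, consistent with the stated decision to ignore coefficient significance when the goal is prediction.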

Additional analysis resulted in two important changes: the
removal of the dummy variable and the dropping of four other
variables (the percentage change in the price of wheat,
rapeseed, rye, and the production variable). The dummy
variable was included in the first model to account for the
effects of the LIFT program. Although it was one of the most
significant variables in that model, dummy variables pose
problems in a predictive model. Because it was desired to be
able “to predict the effects of events such as the LIFT
program without having prior information about parameters on
variables representing them ... it was deemed desirable to
test whether a model could be developed which had
satisfactory predictive capabilities without including
policy variables” (ref. 2, p. 35).

Four other variables were deleted from the preliminary model
as a result of additional work done at the zone level. The
forward-selection stepwise estimation procedure was applied
to each of the nine zones (fig. 2), and only those variables
which had the theoretically expected sign for the
coefficients in all nine zones were included in the final
model. As a result of this procedure, the final model
included only four independent variables: wheat exports,
on-farm stocks of wheat, the lagged dependent variable, and
the percentage change in the price of barley.

The final model and a discussion of the relevance and the
predictive ability of the model for Canada are presented in
the section entitled “Results and Discussion.”

U.S.  model  development .—  Preliminary  models  for 
each  of  the  four  states  were  developed  separately  as  a 
guide  for  developing  CRD  models  in  each  state  (ref. 
1). The general procedure used in initial model formulations
involved selecting four or five policy, economic, and dummy
variables; running the regression; and then examining the
computer run. In selecting a particular model formulation,
emphasis was placed on the significance of individual
t-values for regression coefficients, the coefficient of
multiple determination R², the mean square error, and the
Durbin-Watson test statistic.

The more numerous agricultural program changes and the much
more limited data base available in the United States
necessitated a different development procedure than that
followed for Canada. Because a maximum of 13 years (1963
through 1975) of CRD data were available, initial model
formulation was based on the need for frugality in the use
of independent variables to maintain a sufficient number of
degrees of freedom for statistical validity.

One of the consequences of the short data base was the
limited ability to make use of a stepwise selection
procedure for choosing among parameters. Alternative model
possibilities and variable formulations had to be tested
largely without the benefit of this statistical tool. “Curve
fitting,” or selecting a model which fits the sample data
well but has no underlying validity, is always a danger when
using model selection routines such as the stepwise
procedure; this is particularly true when there is a large
set of candidate regressors and a small sample, as in the
U.S. study.

Although several combination variables consisting of ratios
of gross income from the confusion crops or ratios of
confusion crop prices were tested in preliminary model
analysis, the best economic variable appeared to be the
difference between price series on competing crops; for
example, wheat price minus oats price. Using a single price
difference also reduced the number of independent variables
in the model.

The importance of agricultural adjustment programs in
determining confusion crop acreage, the high frequency of
changes in program provisions during the period, and the
varying impact among states due to different farmer
participation rates indicated the necessity for finding a
succinct way to capture the impacts of program changes. The
main task was to specify variables which could separate the
past effects of agricultural adjustment programs from
economic predictor variables. As the programs were generally
not important in planting decisions after 1973, no effort
was made to develop a variable which could predict the
impacts of the programs. (In 1978, it is likely that a
program variable would again be needed.) The limited
information at the CRD level and the limited resources
precluded a search for such predictive variables.

The preliminary model results for the United States were
generally disappointing. Numerous combinations of price and
gross income variables for the confusion crops were tested
in these preliminary models; however, little progress in
model development was made until the 1976 CRD data bases
became available for analysis.

The major problem in the early stages of the study was the
failure to fully recognize the nature of the economic
adjustment that was occurring after 1972. Beginning in 1972,
high export demands caused farm prices to substantially
exceed price support levels for the first time during the
1963-76 period. Prior to this time, the most important
factors determining acreage adjustments in the United States
were policy related. Until the 1976 acreage data became
available, the nature of the structural changes taking place
was not recognized by the researchers. With the availability
of 1976 acreage data, it became obvious that farmers were
rapidly adjusting wheat production to the “free market”
conditions existing since 1973. Their reaction to “free
market” conditions became a major variable in the models
developed to predict 1977 ratios. The nature of the farmers'
response to “free market” prices was not readily apparent
using the 3 years of data from 1973 through 1975. The
adjustment during these 3 years had been unidirectional. The
nature of the adjustment became known as farmers responded
to altered price relationships in the 1976 crop year. This
is in contrast to Canada, where pricing policies remained
fairly consistent during the study period.

Although climatic variables and U.S. grain stock variables
were introduced into several of the models, these variables
did not appear to improve predictive capability. However,
some of the events explained by dummy variables appeared to
be climate related. A more complete discussion of the
alternative models tested can be found in reference 1; the
model forms selected for predicting 1977 ratios are
presented in section 5.2 of that report.


RESULTS  AND  DISCUSSION 

Because the model variables for the SW/GR ratios were
identical to the variables for the SW/SG ratios and because
the model variables for predicting planted-acreage ratios
(in those states where predicted) were identical to the
variables in the ratio models for harvested acreage, only
the final models used to predict the SW/SG, WW/WG, and WW/GR
ratios for harvested acreage are reported. (The model
coefficients and the ratio predictions for 1976 and 1977 for
those results not reported here are available from the
authors as a statistical appendix.) Although the statistical
procedures used to develop and select ratio models were
similar for both the United States and Canada, basic
differences between the two countries in agricultural
policies and other factors caused the model forms for each
country to be different.


Results for Canada

Model form.— The general form of the SW/SG ratio model
provided for each CD, for each zone, and for the province
was as follows.

\[
\frac{SW_i^{t}}{SG_i^{t}} = \beta_0 + \beta_1\, EXP_W^{t-1}
  + \beta_2\, \frac{SW_i^{t-1}}{SG_i^{t-1}}
  + \beta_3\, S_W^{t-1} + \beta_4\, \Delta P_{BAR}^{t-1}
\]

where

$SW_i^{t}$ = acres of spring wheat harvested in year t in
geographic unit i. For example, if t equals the 1977-78 crop
year, which begins August 1, 1977, and ends July 31, 1978,
then that crop is planted in April and May of 1977; and i
can be any of 20 CD's, 9 zones, or the province.

$SG_i^{t}$ = total acres of spring grain confusion crops
(spring wheat, spring rye, oats, barley, mixed grains, and
buckwheat) harvested in year t in geographic unit i

$EXP_W^{t-1}$ = exports of total wheat in bulk (millions of
bushels) from August 1 to March 1 in crop year t - 1

$SW_i^{t-1}/SG_i^{t-1}$ = the previous year's spring
confusion crop ratio; lagged dependent variable. (Following
Nerlove's technique (ref. 30), this amounts to a distributed
lag.)

$S_W^{t-1}$ = the total on-farm stocks of wheat (millions of
bushels) carried into crop year t - 1; where, for predicting
the 1977-78 crop year ratio, the average for $S_W^{t-1}$ is
equal to the July 31, 1976, on-farm storage.

$\Delta P_{BAR}^{t-1}$ = the change in price (dollars per
bushel) of grade 1 Canadian western six-row barley, from
March 1 in crop year t - 2 to March 1 in crop year t - 1.
Calculated by the formula
$(P_{BAR}^{t-1} - P_{BAR}^{t-2}) / P_{BAR}^{t-2}$, where
$P_{BAR}^{t-1}$ is the March 1 CWB export selling price at
Thunder Bay in crop year t - 1.
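
As a worked illustration of the model form above, the sketch below evaluates the province-level equation for one set of inputs. The coefficients are read from the Province row of table VII, and the assignment of each printed column to a variable is an assumption inferred from the signs and magnitudes discussed in the text. The input values are hypothetical and are not data from the study, and the barley term is assumed to be a proportional change.

```python
# Province-level coefficients as read from table VII. The mapping of table
# columns to variables is an assumption based on the expected signs.
PROVINCE = {
    "constant": 0.323,
    "exports": 4.30e-4,         # per million bushels of wheat exported
    "lagged_ratio": 0.505,      # previous year's SW/SG ratio
    "stocks": -1.73e-4,         # per million bushels of on-farm wheat stocks
    "barley_change": -6.28e-2,  # change in the barley price
}


def predict_ratio(coef, exports, lagged_ratio, stocks, barley_change):
    """Evaluate the linear SW/SG ratio model for one geographic unit and year."""
    return (coef["constant"]
            + coef["exports"] * exports
            + coef["lagged_ratio"] * lagged_ratio
            + coef["stocks"] * stocks
            + coef["barley_change"] * barley_change)


# Hypothetical inputs chosen only to show the arithmetic.
example = predict_ratio(PROVINCE, exports=250.0, lagged_ratio=0.740,
                        stocks=300.0, barley_change=-0.05)
print(round(example, 3))
```

The same function applies to any CD or zone by substituting that unit's row of coefficients, since the design keeps the variables fixed and lets only the coefficient estimates vary.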

Table  VII  reports — for  each  CD,  zone,  and 
province — the  ordinary  least-squares  estimates  of 
the  coefficients  for  the  variables,  the  t-values  for  the 
estimates  of  the  coefficient,  and  the  coefficient  of 
multiple determination R² for each equation. Each of the
four independent variables has the theoretically expected
sign in every CD and zone and in the province model. The
signs of the coefficients for the change in the price of
barley and for wheat stocks carry-in are expected to be
negative, as increases in either would be expected to
decrease the economic returns to wheat production. For
example, a rise in the price of barley would be expected to
cause some movement into barley production. When wheat
stocks are high, an inducement exists for the economically
rational producer to shift to another crop. The signs on the
coefficients for the lagged dependent variable
$SW_i^{t-1}/SG_i^{t-1}$ and the annual exports $EXP_W^{t-1}$
were positive as expected. An increase in exports usually
indicates a better price for wheat and a tendency for
farmers to move into wheat production. The lagged dependent
variable indicates a tendency on the part of some farmers to
follow recent production practices, partly because farmers
give less than full credibility to current market signals.

Based  on  an  examination  of  t-values  in  table  VII, 
the  most  significant  variables  were  the  export  and 
the  lagged  dependent  variables,  which  are  generally 
significant  at  the  1-percent  level  of  probability.  The 
least  significant  variable  was  the  annual  percentage 
change  in  the  price  of  barley. 

The total regression for every CD, zone, and province was
significant at the 1-percent level. However, the R² values
were generally low, particularly at the zone and CD levels.
A low R² indicates that the model does not explain a
considerable amount of the variation found in the historical
SW/SG ratio. However, the CD's with the smallest R², CD 4A
and CD 4B, were also the ones with the smallest variation in
the historical SW/SG ratios. The ratio in zone 4 during the
last 10 years ranged between 0.775 and 0.910, compared to
0.312 to 0.656 in zone 9. Thus, although the model explained
little of the total variation in zone 4, there was little
variation to be explained.

Precision  and  accuracy  tests. — Tests  were  made  of 
the  accuracy  of  the  model  developed  to  predict  the 
SW/SG  ratio.  At  the  province  level,  10  single-year 
predictions  from  crop  year  1967-68  to  crop  year 
1976-77  were  made  using  the  “best”  model  and  the 
previous  year’s  ratio  (lagged  ratio).  In  table  VIII, 
these  results  are  compared  to  the  actual  ratios  for 
each  of  the  10  years.  The  predictions  with  the  “best” 
model  were  made  by  using  data  only  up  to  the 
1966-67  crop  year  for  the  first  year’s  prediction 
(1967-68).  Parameter  estimates  were  then  updated 
yearly and a prediction made for each of the following
years.

The “best” model gave a more accurate estimate of the actual
SW/SG ratio than did the lagged ratio in 7 of the 10 years
(table VIII). The “best” model was within 10 percent of the
actual ratio value in 9 out of 10 years. However, the lagged
ratio was also within 10 percent of the actual value in 9 of
the 10 years. Neither the “best” model nor any of the
alternatives was able to readily predict the effects of the
LIFT policy with no prior information on program parameters
(ref. 2).
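
The counts quoted above can be checked directly from the ratios printed in table VIII. The short sketch below recomputes the percent differences from the rounded three-decimal ratios, so the last digit may differ slightly from the published percentages; the variable and function names are illustrative.

```python
# (crop year, actual ratio, lagged ratio, "best" model prediction), from table VIII.
TABLE_VIII = [
    ("1976-77", 0.782, 0.740, 0.803),
    ("1975-76", 0.740, 0.704, 0.744),
    ("1974-75", 0.704, 0.713, 0.714),
    ("1973-74", 0.713, 0.688, 0.743),
    ("1972-73", 0.688, 0.631, 0.727),
    ("1971-72", 0.631, 0.602, 0.614),
    ("1970-71", 0.602, 0.778, 0.722),
    ("1969-70", 0.778, 0.813, 0.808),
    ("1968-69", 0.813, 0.833, 0.796),
    ("1967-68", 0.833, 0.824, 0.858),
]


def pct_diff(estimate, actual):
    """Percent difference of an estimate from the actual ratio."""
    return 100.0 * (estimate - actual) / actual


# Years in which the model's absolute error is smaller than the lagged ratio's.
model_wins = sum(abs(pct_diff(m, a)) < abs(pct_diff(l, a))
                 for _, a, l, m in TABLE_VIII)
# Years in which each predictor falls within 10 percent of the actual ratio.
model_within_10 = sum(abs(pct_diff(m, a)) <= 10.0 for _, a, _, m in TABLE_VIII)
lagged_within_10 = sum(abs(pct_diff(l, a)) <= 10.0 for _, a, l, _ in TABLE_VIII)

print(model_wins, model_within_10, lagged_within_10)  # 7 9 9
```

The single year in which both predictors miss by more than 10 percent is 1970-71, the year of the LIFT program.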

Ten-year tests were not run at the CD level. However, a
paired difference test (see ref. 2 for procedure) was made
between the predicted ratio values of the model for 1976-77
and the predictions based on using last year's SW/SG ratio.
A calculated t-value, based on the absolute values of the
weighted CD paired differences, shows that the predicted
1976 SW/SG ratios have less error associated with them, and
tests indicate they are from a different population than
those predictions based on last year's ratio (table IX).
Therefore, the ratio predictions from the “best” model are
concluded to be statistically more accurate (at the
1-percent level of significance) than the present LACIE
technique of using last year's ratio (ref. 2, p. 24).
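
A paired difference test of this kind can be sketched as below. The exact procedure is given in reference 2 and is not reproduced here; this is a generic paired t-statistic applied to the weighted differences, shown only for six crop districts from table IX. The resulting t-value is therefore an illustration on a subset and is not the statistic reported by the study.

```python
import math
import statistics

# (crop district, model error %, last-year error %, weight %) for six rows of table IX.
ROWS = [
    ("6A", 4.63, 6.41, 8.1),
    ("2B", 0.22, 5.20, 6.7),
    ("3AS", 0.43, 2.57, 6.7),
    ("5A", 1.34, 8.72, 6.6),
    ("3BN", 0.77, 1.97, 6.4),
    ("5B", 0.00, 13.56, 6.4),
]


def weighted_differences(rows):
    """Weight times (model error minus last-year error) for each district."""
    return [w * (d_model - d_last) for _, d_model, d_last, w in rows]


def paired_t(differences):
    """t statistic and degrees of freedom for H0: mean difference equals zero."""
    n = len(differences)
    mean = statistics.mean(differences)
    std_err = statistics.stdev(differences) / math.sqrt(n)
    return mean / std_err, n - 1


x = weighted_differences(ROWS)
t_value, dof = paired_t(x)
```

A negative mean difference indicates that the model's absolute errors are smaller than those of the lagged-ratio predictor; the t-value is then compared with the tabulated critical value for the available degrees of freedom.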

Model predictions for 1977.— Table X shows the 1977
predicted SW/SG ratio for each CD, zone, and province using
the ratio model. In almost all cases, the predicted ratio is
slightly larger than the 1976 ratio. Three of the four
predictor variables are favorable toward increased wheat
acreage in 1977. The price of barley declined from 1976 to
1977, the lagged variable is larger than in the previous
year, and wheat stock carry-in is at a 20-year low. Only the
wheat export level was less favorable for 1977 than for
1976. Data were not yet available to evaluate the model
predictions for 1977.


Table VII.— Saskatchewan: The Models Chosen to Predict the SW/SG Harvested
Acreage Ratios for Each Crop District, Zone, and Province

[Based on ref. 2, table 1]

Ordinary least-squares estimates of the model coefficients (t-values in parentheses)

Geographic  Constant        Wheat stocks        Wheat exports    Barley price        Lagged SW/SG    R²,
unit                                                             change              ratio           percent

CD 1A       0.175 (1.95)b   -8.32E-5 (-0.76)    5.28E-4 (2.53)   -0.103 (-1.55)c     0.643 (4.91)    66.5
CD 1B       .133 (1.72)b    -7.23E-5 (-.64)     6.84E-4 (3.14)   -.108 (-1.57)c      .620 (5.14)     71.4
CD 2A       .401 (3.99)a    -1.38E-4 (-2.54)a   4.00E-4 (4.03)   -3.45E-2 (-1.08)    .463 (3.80)     71.4
CD 2B       .589 (4.66)a    -3.08E-4 (-4.25)a   4.01E-4 (3.90)   -3.68E-2 (-1.08)    .269 (1.93)b    75.4
CD 3AS      .299 (2.99)a    -1.19E-4 (-2.78)a   2.63E-4 (3.32)   -2.81E-2 (-1.09)    .627 (5.35)     73.0
CD 3AN      .407 (4.01)a    -1.39E-4 (-2.36)b   3.88E-4 (3.49)   -4.11E-2 (-1.16)    .462 (3.69)     66.0
CD 3BS      .250 (2.27)b    -4.84E-5 (-1.14)    2.36E-4 (3.02)   -1.53E-2 (-.58)     .665 (4.97)     65.9
CD 3BN      .572 (5.60)a    -2.16E-4 (-5.46)a   2.74E-4 (4.78)   -1.23E-2 (-.65)     .324 (2.89)     83.3
CD 4A       .149 (.94)      -8.17E-6 (-.45)     2.87E-4 (2.71)   -4.88E-2 (-1.27)    .750 (3.81)     45.8
CD 4B       .408 (3.04)a    -1.24E-4 (-2.72)a   1.76E-4 (2.26)b  -2.80E-3 (-1.09)    .533 (3.76)     62.2
CD 5A       .210 (2.76)a    -1.42E-4 (-1.94)b   6.15E-4 (4.69)   -7.45E-2 (-1.75)b   .574 (5.37)     78.3
CD 5B       .169 (2.14)b    -1.49E-4 (-1.37)c   6.94E-4 (3.45)   -1.02E-1 (-1.57)c   .541 (4.20)     68.3
CD 6A       .499 (4.72)a    -3.10E-4 (-4.52)a   3.96E-4 (3.63)   -6.91E-2 (-1.85)b   .367 (3.06)     78.0
CD 6B       .403 (4.30)a    -2.48E-4 (-4.07)a   4.45E-4 (4.69)   -6.33E-2 (-2.01)b   .432 (3.87)     80.4
CD 7A       .490 (4.06)a    -3.57E-4 (-4.21)a   4.14E-4 (3.99)   -7.88E-3 (-.23)     .388 (3.04)     83.8
CD 7B       .337 (3.92)a    -2.27E-4 (-2.83)a   7.22E-4 (5.23)   -4.16E-2 (-.93)     .403 (3.50)     76.0
CD 8A       .188 (2.34)b    -1.41E-4 (-1.54)c   5.21E-4 (3.21)   -9.35E-2 (-1.75)b   .546 (4.15)     66.7
CD 8B       .355 (3.23)a    -2.89E-4 (-3.14)a   4.40E-4 (3.02)   -1.00E-1 (-2.02)b   .440 (3.19)     72.3
CD 9A       .240 (2.54)a    -2.10E-4 (-2.27)b   4.43E-4 (3.02)   -1.20E-1 (-2.47)b   .506 (3.69)     69.7
CD 9B       .237 (2.13)b    -2.59E-4 (-2.07)b   4.92E-4 (2.72)   -9.66E-2 (-1.62)c   .505 (3.28)     68.2
Zone 1      .135 (1.78)b    -7.15E-5 (-.72)     5.17E-4 (3.01)   -1.02E-1 (-1.71)b   .671 (5.83)     74.1
Zone 2      .529 (4.49)a    -2.35E-4 (-3.76)a   4.12E-4 (4.25)   -3.72E-2 (-1.15)    .324 (2.41)b    72.3
Zone 3      .388 (4.01)a    -1.26E-4 (-3.41)a   2.84E-4 (4.32)   -2.14E-2 (-.99)     .518 (4.62)     75.8
Zone 4      .338 (2.24)b    -7.47E-5 (-1.61)c   2.20E-4 (2.62)   -1.55E-2 (-.55)     .571 (3.37)     49.4
Zone 5      .184 (2.39)b    -1.41E-4 (-1.57)c   6.54E-4 (3.96)   -8.89E-2 (-1.67)c   .565 (4.80)     73.5
Zone 6      .460 (4.64)a    -2.84E-4 (-4.42)a   4.20E-4 (4.16)   -6.62E-2 (-1.95)b   .390 (3.40)     79.4
Zone 7      .447 (4.57)a    -3.07E-4 (-4.26)a   5.67E-4 (5.59)   -2.36E-2 (-.70)     .357 (3.08)     82.8
Zone 8      .272 (2.91)a    -2.18E-4 (-2.52)a   4.78E-4 (3.33)   -9.94E-2 (-2.05)b   .496 (3.80)     70.8
Zone 9      .240 (2.37)b    -2.29E-4 (-2.21)b   4.61E-4 (2.91)   -1.11E-1 (-2.11)b   .505 (3.51)     69.2
Province    .323 (3.68)a    -1.73E-4 (-3.04)a   4.30E-4 (4.51)   -6.28E-2 (-1.98)b   .505 (4.51)     77.4

a Significant at the 1-percent level
b Significant at the 5-percent level
c Significant at the 10-percent level


Results  for  the  United  States 

Model  form.—  For  the  United  States,  a substantial 
number  of  alternative  models  were  analyzed  after 
1976  CRD  data  became  available,  but  only  the  "best" 
models  were  reported  (ref.  1).  Because  the  model 
variables  differed  from  state  to  state,  the  results  for 
each  individual  state  are  presented  separately. 
Models are presented for SW/SG, WW/WG, and WW/GR ratios for
each state where an analysis was made.

North  Dakota  CRD  models. — The  North  Dakota 
SW/SG  models  included  the  following  variables. 

SKISGU  = + 0lPW0/  + 02WADi  + 03D71' 

where  SW ' =*  spring  wheat  acreage  (harvested 
or  planted)  for  year  /in  CRD  i of 
state/ 


SGj  j *“  spring  grain  acreage  (harvested  or 
planted)  for  year  t in  CRD  i of 
state/' 

PWOj  ■*  a 3-year  weighted  moving  average 
of  the  difference  between  the 
season  average  spring  wheat  price 
(cents  per  bushel)  and  the  season 
average  oats  price  (cents  per 
bushel)  received  by  farmers  in 
state/,  that  is, 

PWO/  = 0.5  (PW j~l  - PO/,1)  + 
0.3(PWf"2  - Po‘~2)+ 

0.2  (PWr**  - Pof-3) 


where  PW'-1  and  PO'_tare  the 
season  average  price  of  wheat  and 
oats,  respectively,  received  by 
farmers  in  the  marketing  year 
prior  to  planting.  The  marketing 
year  runs  from  June  1 to  May  31; 
consequently,  prices  for  the  com- 
plete marketing  year  r — 1 are  not 


Table  VIII. — A Comparison  of  the  Accuracy  of  Three  Techniques  to  Predict 
the  SW/SG  Confusion  Crop  Ratio  for  the  Province  of  Saskatchewan,  1967-68  to  1976-77 

l Based  on  ref.  7.  table  21 


Year 

Actual 

SW/SG 

ratio 

Lagged 

SW/SG 

ratio 

(a) 

Percent 
difference 
actual  r.t. 
tagged 

Prediction 
based  on 
"best"  model 

Percent  dif- 
ference actual 
is.  prediction 
based  on  model 

1976-77 

0.782 

0.740 

-5.37 

0.803 

2.69 

1975-76 

.740 

.704 

-4.86 

.744 

.54 

1974-75 

.704 

.713 

1.28 

.714 

1.42 

1973-74 

.713 

.688 

-3.51 

.743 

4.21 

1972-73 

.688 

.631 

-8.28 

.727 

5.67 

1971-72 

.631 

.602 

-4.60 

.614 

-2.69 

1970-71 

.602 

.778 

29.24 

.722 

19.93 

1969-70 

.778 

.813 

4.50 

.808 

3.86 

1968-69 

.813 

.833 

2.46 

.796 

-2.09 

1967-68 

.833 

.824 

-1.08 

.858 

3.00 

“Actual  SW/SG  ratio  lagged  I year 
available prior to planting. All
prices were assumed to be zero
prior to 1971.

WAD_{ij}^t = the size of the wheat allotment
(hundreds of acres) minus the
wheat acreage diverted (hundreds
of acres) in CRD i of state j in
year t

D71^t = a dummy variable equal to 1 for
the year 1971 and 0 otherwise
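The 3-year weighted moving average defined for PWO can be written as a short function. This is a minimal sketch: the weights 0.5, 0.3, and 0.2 and the lags of 1, 2, and 3 years come from the definition above, while the function name, the dictionary layout, and the prices in the usage lines are hypothetical values chosen only for illustration.

```python
# Weights applied to the price differences lagged 1, 2, and 3 years.
WEIGHTS = (0.5, 0.3, 0.2)


def weighted_price_difference(wheat, oats, year, weights=WEIGHTS):
    """Weighted moving average of (wheat price - oats price), in cents per bushel.

    wheat and oats map a marketing year to the season average price.
    """
    return sum(
        w * (wheat[year - lag] - oats[year - lag])
        for lag, w in enumerate(weights, start=1)
    )


# Hypothetical season average prices (cents per bushel), not taken from the report.
wheat_prices = {1973: 400.0, 1974: 420.0, 1975: 380.0}
oats_prices = {1973: 150.0, 1974: 160.0, 1975: 140.0}
pwo_1976 = weighted_price_difference(wheat_prices, oats_prices, 1976)
print(pwo_1976)
```

The same function applies to the winter wheat minus rye price variable used in the South Dakota models by passing rye prices in place of oats prices.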


Table XI shows for each CRD the ordinary least-squares
estimates of the coefficients for the variables, the
t-values for the estimates of the coefficients, and the
coefficient of multiple determination for each equation.
Generally, the signs of the coefficients are consistent
with theoretical expectations. As expected, an increase
in the difference between historical wheat prices and
oats prices, PWO_j^t, led farmers to plant wheat on a
larger share of small grain acreage. It should be noted
that PWO_j^t was a 3-year weighted moving average of the
difference between spring wheat prices and oats prices.
The weights are based on a hypothesis about how farmers
"learn" to form expectations about future prices. The
3-year moving average generally provided a more
significant variable than a single-year lagged price


Table IX. — Saskatchewan: Paired Comparison Test of the Difference in
Accuracy Between the "Best" Model and Last Year's Ratio as Predictors
of the 1976 SW/SG Ratio for Each Crop District (a)

[Based on ref. 2, tables 12 and 13]

Column definitions:
d_i^m  = percent difference (absolute value) between the 1976 actual
         SW/SG ratio and the model-predicted 1976 SW/SG ratio
d_i^l  = percent difference (absolute value) between the 1976 actual
         SW/SG ratio and last year's (1975) SW/SG ratio
w_i^76 = weight, percent SW acreage in crop district
x_i^76 = w_i^76 (d_i^m - d_i^l), the difference between model-predicted
         error and last year's error weighted by percent SW acreage (b)

CD      d_i^m    d_i^l    w_i^76    x_i^76

6A       4.63     6.41     8.1      -14.42
2B        .22     5.20     6.7      -33.37
3AS       .43     2.57     6.7      -14.34
5A       1.34     8.72     6.6      -48.71
3BN       .77     1.97     6.4       -7.68
5B        .00    13.56     6.4      -86.78
7A       4.25     2.53     6.1       10.50
6B       3.43     5.51     6.0      -12.48
7B       5.96     2.59     4.9       16.51
2A        .77     3.63     4.7      -13.44
3BS       .75     1.72     4.6       -4.46
9A       5.99    15.85     4.6      -45.36
1A        .65     7.44     4.4      -29.82
8B       5.40    10.94     4.3      -23.82
8A       1.81    16.47     3.8      -55.71
3AN       .79     3.70     3.4       -9.89
1B       3.84     7.53     3.3      -12.18
4B        .75      .97     3.3        -.72
9B      10.58    14.78     3.2      -13.44
4A       3.50     3.62     2.4        -.29

aTest based on the absolute value of the percent error of the two predicted values from the actual SW/SG ratio.

bFor these data, \bar{X} = -20.00, \sum (X_i - \bar{X})^2 = 11 278.99, and S_{\bar{X}} = 5.45. Under the hypothesis of no difference, H_0: \bar{X} = 0 against H_a: \bar{X} \neq 0, with
t = 3.67 for t_{.01} = 2.539, H_0 is rejected and the two sets of predictions are from different populations.
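The summary statistics in footnote b of table IX can be checked with a short paired-difference calculation. The sketch below assumes that each weighted difference is the district weight times (model error minus last year's error) and that the test statistic is the mean difference divided by its standard error. The 3BS and 4B model errors are entered as 0.75, the value implied by the printed weight and weighted difference for those rows. All names in the code are illustrative.

```python
import math

# Table IX rows: (crop district, model error %, last year's error %, weight %).
TABLE_IX = [
    ("6A", 4.63, 6.41, 8.1), ("2B", 0.22, 5.20, 6.7), ("3AS", 0.43, 2.57, 6.7),
    ("5A", 1.34, 8.72, 6.6), ("3BN", 0.77, 1.97, 6.4), ("5B", 0.00, 13.56, 6.4),
    ("7A", 4.25, 2.53, 6.1), ("6B", 3.43, 5.51, 6.0), ("7B", 5.96, 2.59, 4.9),
    ("2A", 0.77, 3.63, 4.7), ("3BS", 0.75, 1.72, 4.6), ("9A", 5.99, 15.85, 4.6),
    ("1A", 0.65, 7.44, 4.4), ("8B", 5.40, 10.94, 4.3), ("8A", 1.81, 16.47, 3.8),
    ("3AN", 0.79, 3.70, 3.4), ("1B", 3.84, 7.53, 3.3), ("4B", 0.75, 0.97, 3.3),
    ("9B", 10.58, 14.78, 3.2), ("4A", 3.50, 3.62, 2.4),
]


def paired_difference_test(rows):
    """Return (mean, sum of squared deviations, standard error, t) of the
    weighted paired differences x_i = w_i * (d_model - d_lagged)."""
    x = [w * (d_model - d_lagged) for _, d_model, d_lagged, w in rows]
    n = len(x)
    mean = sum(x) / n
    ss = sum((v - mean) ** 2 for v in x)
    std_error = math.sqrt(ss / (n - 1) / n)
    return mean, ss, std_error, mean / std_error


mean, ss, std_error, t_value = paired_difference_test(TABLE_IX)
print(f"mean = {mean:.2f}, SS = {ss:.2f}, SE = {std_error:.2f}, t = {t_value:.2f}")
```

Recomputing the differences from the rounded table entries reproduces the footnote values to within rounding. The negative mean indicates that the model errors were, on a weighted basis, smaller than the errors of last year's ratio.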
difference variable. The sign of the policy variable
WAD_{ij}^t was expected to be positive. This variable
represents the net effect of the government wheat
program on wheat acreages. The signs of WAD_{ij}^t
were negative in CRD 50 and CRD 60, but the t-values
of the coefficients were extremely small, indicating
that the variable could possibly be dropped in these
two CRD models. The dummy variable represents a 1-year
increase in spring wheat acreage due to a government
policy shift from a diversion program (1962 through
1970) to a set-aside program (1971 through 1974).
Spring wheat acreage rose substantially in 1971, and
other variables were unable to account for this rise.

The only independent variable affecting changes
in the ratio after 1971 was PWO_j^t. Other variables in
the model serve simply to specify model structure in
the earlier years of the 1973-76 period. The relatively
high R^2 values indicate that the models explain a
large amount of the historical variation in the SW/SG
ratio.

Minnesota CRD models. — The SW/SG model for
harvested acreage in Minnesota differed from the
North Dakota model in the use of a lagged dependent
variable. The model equation was of the form

SW_{ij}^t / SG_{ij}^t = \beta_0 + \beta_1 PWO_j^t + \beta_2 WAD_{ij}^t
                      + \beta_3 (SW_{ij}^{t-1} / SG_{ij}^{t-1})

The estimated coefficients for the CRD models are
shown in table XII.

In most of the Minnesota CRD's, the only significant
t-values for variable coefficients were for the


TABLE  X. — Saskatchewan:  The  1977  Predicted  Spring  Grains  Confusion  Crop  Ratios  (SW/SG) 

for  the  Crop  Districts  and  Province 


[Based on ref. 2, table 14]

Geographic   Actual 1976   Predicted 1977   90-percent confidence   90-percent confidence
unit         SW/SG ratio   SW/SG ratio      limits for the          limits for the
                                            predicted 1977 SW/SG    predicted 1977 SW/SG
                                            ratio (a)               ratio

CD 1A        0.766         0.791            0.719 to 0.879          0.634 to 0.947
CD 1B         .677          .711             .646 to .790            .549 to .873
CD 2A         .909          .910             .827 to 1.000           .843 to .986
CD 2B         .904          .913             .830 to 1.000           .832 to .994
CD 3AS        .935          .941             .855 to 1.000           .879 to 1.000
CD 3AN        .891          .904             .822 to 1.000           .819 to .988
CD 3BS        .932          .932             .847 to 1.000           .869 to .955
CD 3BN        .913          .922             .838 to 1.000           .877 to .968
CD 4A         .885          .880             .800 to .978            .786 to .975
CD 4B         .930          .939             .854 to 1.000           .879 to 1.000
CD 5A         .745          .776             .705 to .862            .675 to .877
CD 5B         .649          .678             .616 to .753            .526 to .831
CD 6A         .843          .889             .808 to .988            .803 to .975
CD 6B         .816          .850             .773 to .944            .776 to .925
CD 7A         .870          .908             .825 to 1.000           .826 to .989
CD 7B         .772          .806             .733 to .896            .700 to .911
CD 8A         .607          .638             .580 to .709            .512 to .764
CD 8B         .704          .759             .690 to .843            .645 to .872
CD 9A         .568          .627             .570 to .697            .512 to .742
CD 9B         .548          .621             .565 to .690            .477 to .764
Province      .782          .812             .738 to .902            .737 to .887

aFor the predicted 1977 SW/SG ratio presented here to be within ±10 percent of actual, the actual 1977 value must fall within this range.
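The range described in footnote a can be derived from the predicted ratio alone. A prediction p is within ±10 percent of the actual value A when |p - A| <= 0.1 A, that is, when p/1.1 <= A <= p/0.9. The sketch below applies that inequality and caps the upper limit at 1.000, since a ratio cannot exceed 1. The cap and the rounding to three decimals are assumptions inferred from the tabulated values, and the function name is illustrative.

```python
def ten_percent_range(predicted, cap=1.0):
    """Range of actual ratios for which `predicted` is within +/-10 percent of actual."""
    lower = predicted / 1.10
    upper = min(predicted / 0.90, cap)
    return round(lower, 3), round(upper, 3)


# Predicted 1977 ratios for CD 1A, CD 2A, and the province as a whole.
for unit, predicted in [("CD 1A", 0.791), ("CD 2A", 0.910), ("Province", 0.812)]:
    low, high = ten_percent_range(predicted)
    print(f"{unit}: {low:.3f} to {high:.3f}")
```

For these three rows the computed limits agree with the tabulated ones.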
price variable. However, except in CRD 30, where
wheat acreage did not exceed 1000 acres, the R^2
values exceeded 0.83, indicating that the models
explained a large part of the historical variation in
SW/SG. When evaluating the results, it should be
noted that almost 90 percent of all spring wheat
acreage is in CRD's 10 and 40. Again, the signs of the
wheat allotment minus the diverted acreage variable
were inconsistent among CRD's. It is worth noting
that the barley allotment used on state data in
preliminary model analysis showed a highly significant
relationship to the SW/SG ratio. However, data


Table XI. — North Dakota: The Models Chosen to Predict the SW/SG
Harvested Acreage Ratios for Each CRD

[Based in part on ref. 1, table 13]

Ordinary least-squares estimates of the model coefficients
(t-values in parentheses)

CRD   Constant (a)     PWO_j^t (a)        WAD_{ij}^t           D71^t             R^2

10    0.635 (15.24)    8.25E-4 (4.50)      6.03E-6 (1.31)      0.183 (2.86)b     78.7
20     .491 (21.73)    7.24E-4 (7.36)      1.06E-5 (3.23)a      .171 (5.06)a     89.2
30     .509 (12.29)    6.48E-4 (3.63)      5.30E-7 (.13)        .145 (2.38)b     81.0
40     .565 (20.86)    7.77E-4 (6.55)      1.45E-5 (3.00)b      .168 (4.09)a     85.7
50     .475 (12.49)    8.86E-4 (5.38)     -1.38E-6 (-.21)       .111 (2.33)b     90.6
60     .405 (9.49)     8.87E-4 (4.80)     -7.81E-6 (-.91)       .115 (1.82)c     90.6
70     .514 (24.28)    9.91E-4 (10.80)     3.03E-5 (9.39)a      .253 (8.09)a     92.4
80     .449 (14.81)    8.45E-4 (6.38)      1.91E-5 (2.71)b      .165 (3.62)a     85.7
90     .337 (9.59)     1.02E-3 (6.67)      5.24E-6 (.75)        .101 (1.95)c     91.9

aSignificant at the 1-percent level.
bSignificant at the 5-percent level.
cSignificant at the 10-percent level.

Table  XII. — Minnesota:  The  Models  Chosen  to  Predict  the  SW/SG  Harvested 
Acreage  Ratios  for  Each  CRD 

[Based on ref. 1, table 16]

CRD

Ordinary least-squares estimates of the model coefficients

R 

(a) 

Constant 

pwd 

SH!j/sc!j 

to 

0.324 

(4.15)® 

1.21E-3 

(2.92)b 

— 6.30E-6  (- 

-0.65) 

-2.07E-;  (-0.07) 

94.1 

20 

7.S1E-2 

<2.28)c 

6.57E-4 

( l.96)c 

1.95E-4 

(.23) 

.118  (.35) 

83.5 

30 

4.49E-2 

(2.49)b 

3.09E-4 

(2.30)c 

2.83E-2 

(1.67) 

- 304  (-.67) 

b61 .8 

40 

.1*3 

(2.88)b 

8.42E-4 

(1.68) 

— 1.48E-5  ( 

-.68) 

.476  (1.59) 

97.4 

50 

-4.3/r  2 

<-2.38)b 

I.0SE-3 

(5.62)® 

2.20E-4 

<3.00)b 

.426  <2.75)b 

98  3 

60 

-8.02E-3 

(~2.00)c 

2.97E-4 

(7.81)® 

1.82E-3 

(3.77)® 

.274  (2.31  )c 

98.6 

70 

— 2.66E-2 

(-.81) 

-3.09E4  ( 

-.74) 

— I.65E-5  ( 

-.07) 

2.86  (3.72)® 

90.6 

80 

-7.91E-2 

(-1.62) 

1.S0E-3 

(3,52)® 

2.71  E-4 

(2.28)c 

.463  (1.58) 

91.5 

90 

-1.60E-2 

(-1.19) 

5.58E-4 

(3.95)® 

2.27E-4 

(2.02)c 

.373  (1.55) 

958 



aSignificant at the 1-percent level.
bSignificant at the 5-percent level.
cSignificant at the 10-percent level.


on barley allotments were not available at the CRD
level (ref. 1).

South Dakota CRD models. — The SW/SG model
equation for harvested acreage was

SW_{ij}^t / SG_{ij}^t = \beta_0 + \beta_1 PWO_j^t + \beta_2 WAD_{ij}^t
                      + \beta_3 D65-70^t + \beta_4 D71^t

where D65-70^t is a dummy variable equal to 1 for the
years 1965 to 1970 and 0 otherwise, and D71^t is a
dummy variable equal to 1 for 1971 and 0 otherwise.
The dummy variable D65-70^t is thought to represent
the effects of a change in government programs
which allowed increased substitution between wheat
and feed-grain plantings (table XIII). The negative
sign on D65-70^t (in most CRD's) indicates a shift out
of wheat acreage during this period. D71^t was
included in part to account for a large 1-year decline in
barley acreage in CRD 10. Although increasing the
R^2 value for this CRD, D71^t had a small and
inconsistent impact in other CRD's and probably should
be dropped from the model. The 3-year moving
average of the difference between wheat and oats
prices was the most significant variable in the model.
The R^2 values ranged from 0.46 in CRD 40 to 0.93 in
CRD 60.


The WW/WG ratio model consisted of the following
equation.

WW_{ij}^t / WG_{ij}^t = \beta_0 + \beta_1 (WW_{ij}^{t-1} / WG_{ij}^{t-1})
                      + \beta_2 PWWR_j^t + \beta_3 D67^t

where  WW_{ij}^t = area in winter wheat (hundreds
of acres) for year t in CRD i of state j

WG_{ij}^t = area in winter wheat plus rye
(hundreds of acres) for year t in
CRD i of state j

PWWR_j^t = a 3-year weighted moving
average of the difference between the state
season average price (cents per bushel)
received by farmers for winter
wheat (PWW_j^t) and that
received for winter rye (PR_j^t);
that is,

PWWR_j^t = 0.5 (PWW_j^{t-1} - PR_j^{t-1})
         + 0.3 (PWW_j^{t-2} - PR_j^{t-2})
         + 0.2 (PWW_j^{t-3} - PR_j^{t-3})


Table XIII. — South Dakota: The Models Chosen to Predict the SW/SG Harvested
Acreage Ratios for Each CRD

[Based on ref. 1, table 19]


CRD 

Ordinary  leasi-sttuores  estimates  of  the  model  coefficients 

R; 

Constant 

pwd: 

H40?. 

*V 

D6S- 7(1 

D7l' 

10 

0.471 

(11.89)* 

9.38E4 

(4,89)® 

7.15E-S 

(4.22)* 

— 5.58E-2  (—1.39) 

0.127 

(2.53)b 

*74.2 

20 

.377 

(9,94)* 

1.16E-3 

(6.38)* 

2S6E-5 

(4,06)* 

— 5.07E-2  (—1.39) 

4.20E-2 

(.91) 

®83J 

30 

173 

(4.69)* 

I.I0E-3 

(6.20)* 

3.00E-5 

(1.24) 

-3.45E-2  (-.82) 

1.96E-3 

(04) 

“90.9 

40 

190 

(3  56)* 

5 79E-4 

(2  26)c 

6.33E-5 

(1  23) 

— J.93E-2  (-.73) 

5.51E-2 

(.84) 

46.3 

50 

.235 

(4.1 1)* 

I.28E-3 

(4.66)* 

6.25E-5 

(2.I4)C 

- 3.55E-2  (-.58) 

: I0E-2 

(.30) 

*75.8 

60 

4 4SE-2 

(6,07)* 

2 89E-4 

(7.67)* 

5.09E-S 

(1.32) 

- 1.62E-2  (—1-30) 

- I.23E-2  (- 

-1.10) 

*92.9 

70 

637E-2 

(2,30)b 

1 05E-3 

(6.87)* 

3.67E-6 

(.59) 

2.30E-2  (.65) 

-7.80E.2(- 

-1.55) 

*88.6 

80 

6.27E-2 

(104) 

1 I2E-3 

(3.851* 

4.I8E-5 

(1.10) 

-2.07E-2  (-.35) 

- 1 38E-2  ( 

-.18) 

*76.1 

90 

5.36E-2 

(7.10)* 

I.I6E-4 

(2.77)b 

182E-6 

(1.07) 

— 2.70E-3  (.28) 

-2.14E-2  (- 

-1.56) 

b63.9 

aSignificant at the 1-percent level.
bSignificant at the 5-percent level.
cSignificant at the 10-percent level.
Prices are assumed to be zero
before 1971.

D67^t = a dummy variable equal to 1 in
1967 and 0 otherwise

The model coefficients for harvested acreage are
shown in table XIV. The signs of the coefficients are
all positive as theoretically expected. Most of the
winter rye acreage is in CRD's 20 and 30. The high
R^2 for these two CRD's indicates that a large amount
of the historical variation was explained by the
models. The economic variable PWWR_j^t had a
highly significant t-value in these CRD's.

A model for the harvested acreage ratio of winter
wheat to total grains (WW/GR) was also developed
for South Dakota. The equation follows.

WW_{ij}^t / GR_{ij}^t = \beta_0 + \beta_1 (WW_{ij}^{t-1} / GR_{ij}^{t-1})
                      + \beta_2 PWW_j^t + \beta_3 D67^t

where  PWW_j^t = a 3-year weighted moving
average of the state season
average winter wheat price (cents
per bushel) received by farmers;
that is,

PWW_j^t = 0.5 PWW_j^{t-2} + 0.3 PWW_j^{t-3} + 0.2 PWW_j^{t-4}

Prices were assumed to be zero
prior to the 1971-72 marketing
year.

The model coefficients for harvested acreage
ratios are shown in table XV. Although the signs of
the model coefficients were all theoretically correct,
the statistical results were inconsistent among the
CRD's. While the R^2 value for CRD 30 (a CRD with
little wheat acreage) showed that the model
explained 90 percent of the historical variation in the
WW/GR ratio, the R^2 value for CRD 50 (a CRD with
one of the largest winter wheat acreages) showed
that the model explained only 35 percent of the
historical variation in the WW/GR ratio. The
reasons for this poor performance were not
understood and need further investigation.

Montana CRD models. — The model equations for
predicting the SW/SG ratios were of the following
form.

SW_{ij}^t / SG_{ij}^t = \beta_0 + \beta_1 PSWB_j^{t-1} + \beta_2 D69-71^t
                      + \beta_3 WAD_{ij}^t

where

PSWB_j^{t-1} = the spring wheat price (cents per
bushel) minus the barley price (cents
per bushel) lagged 1 year. Prices are
season average prices received by
farmers and are assumed to be zero
before 1972.

D69-71^t = a dummy variable equal to -1 for the
year 1969, 1 for 1971, and 0 otherwise

The coefficients for the seven Montana CRD
SW/SG models for harvested acreage are shown in
table XVI. Of Montana's spring wheat acreage in
1976, 58 percent was grown in CRD 30 and another
31 percent was grown in CRD 20 (ref. 5). Except in
CRD 70, the signs of the model coefficients are all
theoretically correct and their t-values are mostly
significant. The negative sign on WAD_{ij}^t in CRD 70
is possibly related to the fact that spring wheat
acreage accounts for about 20 percent of all wheat
acreage in Montana. Winter wheat is a major crop in
Montana, and, in all CRD's, harvested winter wheat
acreages were more than twice the spring wheat
acreages harvested in 1976. Perhaps a basic problem
with the SW/SG models is a lack of a variable
accounting for winter wheat planting decisions. Also,
spring wheat plantings may depend partly on the
amount of winterkill in winter wheat. Data to
estimate winterkill were not available. The R^2 values,
which are typically smaller in Montana than in other
states, also indicate that an important independent
variable may be missing from the models.

Because Montana winter rye acreage was not
reported by CRD, the winter-wheat/total-grain model
was the only winter wheat model developed. The
model equation for WW/GR follows.

WW_{ij}^t / GR_{ij}^t = \beta_0 + \beta_1 PWW_j^{t-2} + \beta_2 WAD_{ij}^t
                      + \beta_3 D67-68^t

where

PWW_j^{t-2} = the season average winter wheat price
(cents per bushel) received by farmers,
lagged 2 years. Since t represents
the marketing year which begins when
the crop is harvested and the crop is
planted the previous fall, the price
must be lagged two periods.

D67-68^t = a dummy variable equal to 1 for 1967
and 1968 and 0 otherwise

The CRD model coefficients for WW/GR ratios
for harvested acreage are shown in table XVII.
Model coefficients have the correct sign in all
CRD's. However, the R^2 value for CRD 20, where
about one-half of Montana's winter wheat is grown,
was disappointing. The model explained only 57
percent of the historical variation in the WW/GR ratio.

Accuracy test of 1976 CRD predictions. — A paired
difference test, similar to the one run on the
Canadian CD data and described in the section entitled
"Precision and Accuracy Tests," was made between
the CRD predictions based on the model for 1976
and those based on last year's ratio. The test results
for the 1976 CRD-harvested acreages are shown in
table XVIII.

The results show that the ratio values predicted by
the model were closer to the actual 1976 ratio values
than were the lagged (1975) ratio values at the
10-percent (or higher) significance level in North
and South Dakota, but not in Minnesota and
Montana. Nevertheless, in the latter two states, the model
predictions were generally closer to the actual value
than were the lagged values.

Model predictions for 1977. — The 1977 ratio
predictions, the associated variance of estimates, and the
associated variance of prediction are shown in table
XIX. In contrast to the Canadian models, which
predicted that the ratio of SW/SG acreage would
increase in 1977 from the 1976 values, the U.S. CRD
models generally predicted a decline in the 1977
SW/SG ratios from the 1976 values. A major reason
for a decline in the SW/SG ratios was a reduction in
the 1976-77 price of wheat from the previous year. It
should be noted that the only new information
needed to make 1977 predictions was the estimated
season average prices received by farmers for small
grains.

A large jump in the lagged dependent variable in
1976 caused an unrealistic prediction for 1977 in
Minnesota CRD 70 (table XIX). The SW/SG ratio
increased from 0.161 in 1975 (and less than 0.12 in
earlier years) to 0.406 in 1976. CRD 70 accounted for
5.4 percent of the total Minnesota spring wheat
acreage in 1976, up from 2.2 percent in 1975.
Consequently, the rather large predicted ratio for CRD 70


Table XIV. — South Dakota: The Models Chosen to Predict the WW/WG Harvested
Acreage Ratios for Each CRD

[Based on ref. 1, table 21]

Ordinary least-squares estimates of the model coefficients
(t-values in parentheses)

CRD   Constant            WW/WG lagged      D67^t              PWWR_j^t           R^2

10     0.435 (2.51)b      0.503 (2.58)b     0.124 (3.33)a      4.32E-4 (2.54)b    68.3a
20     5.28E-2 (1.09)      .612 (3.92)a      .216 (3.10)a      1.57E-3 (5.43)a    88.4a
30    -2.31E-2 (-1.35)    1.146 (4.90)a     6.67E-2 (1.73)     1.25E-3 (3.34)a    96.1a
40      .587 (2.79)b       .406 (1.90)c     7.92E-3 (.94)      2.36E-5 (.57)      36.1
50      .380 (1.92)c       .477 (1.84)c      .146 (2.31)b      8.52E-4 (2.70)b    70.1a
60     1.29E-2 (.51)       .840 (4.31)a      .132 (2.78)b      1.87E-3 (4.28)a    96.0a
70      .680 (5.72)a       .297 (2.33)b     1.85E-2 (1.11)     5.90E-5 (.70)      47.5c
80      .512 (1.94)c       .460 (1.64)      3.48E-2 (1.27)     1.72E-4 (1.19)     40.9
90      .440 (3.13)b       .454 (2.42)b     8.63E-2 (1.78)     5.07E-4 (1.61)     71.5a

aSignificant at the 1-percent level.
bSignificant at the 5-percent level.
cSignificant at the 10-percent level.
will not have a significant impact on state results.
Nevertheless, in this case, the 1976 ratio would
undoubtedly provide a better predictor in CRD 70 than
the model value.

All other models provided estimates that appear
reasonable. The accuracy of all models can be tested
better when 1977 CRD crop acreage estimates
become available.


CONCLUSIONS

The results for both the United States and Canada
show that econometric models can provide estimates
of confusion crop ratios that are more accurate than
historical ratios. Whether these models can support
the LACIE 90/90 accuracy criterion is uncertain. In
the United States, experimenting with additional


Table XV. — South Dakota: The Models Chosen to Predict the WW/GR Harvested
Acreage Ratios for Each CRD


[Based on ref. 1, table 23]


CKO 

unmory  mnr*lfwrn  rifmmffS  of  fnt  moan  COtJJICRtnfS 

Coamm 

oei* 

wajj 

10 

J.94E-2  (169) 

O.Jdl  (1J4)C  6.94E-2 

(2.19)* 

1.96E4 

OS6>* 

9«6.1 

20 

2.0SE-2  (2.65)b 

-2.17E-2  (-.09)  I.JIE-2 

(102) 

6.7IE-S 

005)9 

*493 

20 

-6J9E-J  (-1,99)' 

2690  (J.0*)b  7.07E-J 

(1.14) 

-2.10E4 

(-0S) 

*126 

40 

.2*6  (1.36) 

.509  (134)  .117 

turn 

4.24E4 

«4i»9 

•"’S3 

$0 

ISO  (1.47) 

-.102  (-35)  5J1E-2 

(.53) 

J.I2E-4 

(2.24)9 

J5.J 

CO 

-2.27E-3  (-235)® 

1610  <4,9S>»  5.J2E-J 

(234)b 

9.9JE4 

(2.01)' 

*90.1 

70 

342  (J3J)* 

•49J  tZfpf  114 

(2.97)9 

2,1  JE4 

(2.94)9 

*743 

•0 

.577  (2.1 l>* 

-.154  (-39)  .115 

(167) 

235E4 

(131) 

JI.7 

90 

USE-2  (2.13)* 

JSI  (1.41)  1.25E-2 

<J.I)>9 

I.J2E-5 

(139)* 

C5.7,J 

aSignificant at the 1-percent level.
bSignificant at the 5-percent level.
cSignificant at the 10-percent level.

Table XVI. — Montana: The Models Chosen to Predict the SW/SG Harvested
Acreage Ratios for Each CRD

IBostdenrtf.  1.  table  2V 

cud 

Ortmary  leatt-w*m  til  mom  of  the  moOri  coe/fkients 

a-* 

Constant 

Kwi; ' 

WA 

°U 

<•) 

10 

0.IU  (7.20) 

4.15E-4  <4.13)*  462E-2 

(2.70)9 

I30E4 

(506)* 

*74.5 

20 

JSI  (I.JJ) 

6.J0E4  (233)9  174 

<J.7S>* 

S.41E4 

(139) 

967.0 

JO 

672  (21.75) 

6.I6E4  <3.11)9  .102 

(J.01)9 

5.70E4 

(1.79) 

964.4 

JO 

.195  (•'  45) 

2.42E4  (.96)  I0« 

<M6>' 

2.69E-6 

(31) 

4J.5 

70 

.205  <73  r' 

— I.45E-S  (-  0S)  9.41E-2 

0.05)9 

— 2.16E-S 

(-.50) 

962.7 

so 

6.47E-2  0,79) 

4.IJE4  (J.7J)»  S04E-2 

(269)9 

J.17E-5 

(J.70)* 

965.4 

90 

.3*2  (1264) 

7.S6E-4  <J3S)»  .119 

<J59>* 

7.72E-5 

(4.7J)* 

*73.4 

aSignificant at the 1-percent level.
bSignificant at the 5-percent level.
cSignificant at the 10-percent level.
model formulations could provide improved models
in some CRD's, particularly for winter wheat.
Improved models may also be possible for the Canadian
CD's.


The more aggregate province/state models
outperformed individual CD/CRD models. This result was
expected, partly because acreage statistics are based
on sampling procedures and the sampling precision


Table  XVII.— Montana:  The  Models  Chosen  to  Predict  the  WW/GR  Harvested 
Acreage  Ratios  for  Each  CRD 

/Bated  on  ret.  I,  table  U/ 


CKO 


Ordinary  leati-tquare*  ettlmaiet  of  the  model  coeflktents 


Constant 

tat 


Dfi7-n# 


•4 


R- 


10 

0 301 

(Hit) 

2.J6E-J 

(0.32) 

-J7JE2 

(-1.53) 

2.21E-4 

(4.4?)* 

*85.1 

20 

266 

(3911 

7.66E-4 

<29l)b 

7.50E-2 

(.92) 

1.26E-5 

(l.S9)c 

bS7.7 

JO 

.114 

(6.12) 

1.S3E-4 

(2,5J)b 

4.17E-2 

(t.U)c 

205E-6 

(100) 

hit 

to 

.415 

(9.23) 

4.S2E-4 

<2.79)b 

S.S4E-2 

(1.01) 

5.12E-5 

<3.521* 

*74.6 

70 

J9S 

(22.72) 

1.2SE-4 

( 1 .*6)s 

-1.01E-2 

(-.47) 

1.64E-4 

(5.55)* 

*171 

to 

544 

(15.46) 

5.47E-4 

(4.05 1* 

9.3IE-2 

(2.17)« 

2.72E-5 

(1  41) 

*75.6 

so 

.412 

(12.65) 

1.67E4 

(1.34) 

57IE-2 

(1  46) 

26JE-5 

(1.31) 

C5I,2 

aSignificant at the 1-percent level.
bSignificant at the 5-percent level.
cSignificant at the 10-percent level.


Table XVIII. — Results of the Paired Comparison Tests of the Difference in Accuracy Between
the "Best" Model and Last Year's Ratio as Predictors of the 1976 Harvested Acreage Ratios
for CRD's by State and Ratio (a)


State 

Ratio 

Somber 
of  CRD‘\ 

Mean  value  ol 
weighted  paired 
dittrremei.  X 

m 

Standard  error 
ot  paired 
differemet. 

H 

Cakutated 
t- value 

North  Dakota 

sw/so 

9 

262 

120 

C21S 

Minnesota 

SW/SG 

9 

921 

53.0 

1.75 

South  Dakota 

SW/SG 

9 

79.6 

31.6 

d2S2 

WW/WG 

9 

369 

15.5 

d23S 

WW/GR 

9 

73.6 

11.5 

e39S 

Monona 

SW/SG 

7 

11.0 

150 

1 20 

WW/GR 

7 

115 

144 

1.21 

aDifferences are based on the absolute values of the percentage error in using (1) the ratio model to predict the actual ratio and (2) last year's ratio to predict the actual.

bThe weighted paired differences are formed from the difference between the model-predicted ratio value and the actual ratio value for CRD i, state j, and ratio k; the difference between last year's (1975) actual ratio value and the actual ratio value for CRD i, state j, and ratio k; and the proportion of harvested wheat acreage in CRD i of state j.

cIndicates model prediction is better than the lagged (last year's) ratio at the 10-percent level of significance.
dIndicates model prediction is better than the lagged (last year's) ratio at the 5-percent level of significance.
eIndicates model prediction is better than the lagged (last year's) ratio at the 1-percent level of significance.
declines from the province/state to the CD/CRD
level. Also, CD and CRD data were not always
available for predictor (independent) variables such
as prices. More aggregate province/state-level
observations had to be substituted for the desired
CD/CRD data which were not available. Declining
sampling precision and the need to substitute
province/state data for CD/CRD data introduced
measurement error into the CD/CRD models. When
the independent variables are subject to
measurement errors, ordinary least-squares techniques give
estimates of the model coefficients that can be both
biased and inconsistent.
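The effect of measurement error in an independent variable can be illustrated with a small simulation. This sketch is not part of the original study: it generates synthetic data with a true slope of 1.0, adds random error to the regressor, and compares the ordinary least-squares slope fitted on the true and on the error-laden values. The sample size, noise levels, and seed are arbitrary assumptions.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x for a single regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=1.0, measurement_sd=1.0, seed=0):
    """Return (slope using true x, slope using x observed with error)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * x + rng.gauss(0.0, 0.1) for x in x_true]
    x_observed = [x + rng.gauss(0.0, measurement_sd) for x in x_true]
    return ols_slope(x_true, y), ols_slope(x_observed, y)


clean_slope, noisy_slope = simulate()
print(f"slope with true x: {clean_slope:.3f}, with mismeasured x: {noisy_slope:.3f}")
```

With equal variances for the true regressor and the measurement error, the fitted slope is pulled toward roughly half its true value, and a larger sample does not remove the bias. That is the sense in which the estimates are both biased and inconsistent.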

To minimize operational problems, the same
variables were used in all CD or CRD models of a
province or state. Using the same set of variables in
all CD's/CRD's of a province/state may introduce
equation errors in at least some of the CD's/CRD's.


Table  XIX.— Predicted  Confusion  Crop  Harvested  Acreage  Ratios  for  1977 
LACIE Sampling Design

A. H. Feiveson,a R. S. Chhikara,b and C. R. Hallum a


INTRODUCTION 

The sampling design used in LACIE consisted of
two major components, one for wheat acreage
estimation and one for wheat yield prediction. The
acreage design was basically a classical survey for
which the sampling unit was a 5- by 6-nautical-mile
segment; however, there were complications caused
by measurement errors and loss of data. Yield was
predicted by sampling meteorological data from
weather stations within a region and then using those
data as input to a previously fitted regression
equation. Most of the discussion in this paper refers to the
acreage sampling design, since there was
considerably more freedom in planning for the collection of
Landsat data (used for acreage estimation) than
there was for the collection of meteorological data, a
situation in which one was forced to make use of
what was currently available. Wheat production was
not estimated directly; instead, it was computed by
multiplying yield and acreage estimates (see the
paper by Chhikara and Feiveson entitled "Large-Area
Aggregation and Mean-Squared Prediction
Error Estimation for LACIE Yield and Production
Forecasts").


ACREAGE  ESTIMATION  SAMPLING  DESIGN 


Determination of Sampling
Units and Frame

All  information  on  current-year  wheat  acreage 
was  obtained  through  Landsat  imagery  of  a number 
of  5-  by  6-nautical-mile  segments.  These  segments 


aNASA Johnson Space Center, Houston, Texas.
bLockheed Electronics Company, Houston, Texas.


were  the  basic  sample  units  for  acreage  estimation 
and  were  distributed  throughout  the  wheat  growing 
regions  of  LACIE  countries. 

Because of various data base engineering
constraints, a maximum of 4800 sample segments could
be processed within a crop year, regardless of the size
of the individual segment. Given the maximum
sample size of 4800, the physical dimensions of 5 by 6
nautical miles for sample segments were decided on
as large enough for Classification and Mensuration
Subsystem (CAMS) analysts to obtain wheat acreage
estimates and small enough not to tax computer and
manpower resources. Throughout this paper, the
term "sample segment" refers to 5- by 6-nautical-mile
segments actually in the LACIE sample,
whereas "segment" refers to any 5- by 6-nautical-mile
area whether or not in the sample.

The LACIE sampling frame was constructed by
first covering the wheat growing regions of a country
with a large grid of 5- by 6-nautical-mile segments and
then excluding those segments which appeared to
have less than 5 percent agriculture, as determined
by an examination of previous years' Landsat
imagery. The remaining segments constituted the frame
from which the actual sample segments were chosen.
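The frame construction described above amounts to a filter over a grid of segments. The following sketch assumes that each grid segment carries an estimated agricultural fraction taken from earlier imagery; the `Segment` class, its field names, and the example values are hypothetical and serve only to illustrate the 5-percent exclusion rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One 5- by 6-nautical-mile grid cell (hypothetical representation)."""
    row: int
    col: int
    ag_fraction: float  # estimated share of the segment in agriculture, 0 to 1


def build_frame(grid, min_ag_fraction=0.05):
    """Keep segments with at least 5 percent agriculture; exclude the rest."""
    return [seg for seg in grid if seg.ag_fraction >= min_ag_fraction]


# Hypothetical grid of four segments.
grid = [
    Segment(0, 0, 0.40),
    Segment(0, 1, 0.02),   # excluded: less than 5 percent agriculture
    Segment(1, 0, 0.05),
    Segment(1, 1, 0.00),   # excluded
]
frame = build_frame(grid)
print([(seg.row, seg.col) for seg in frame])
```

Sample segments would then be drawn only from `frame`, never from the excluded cells.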


Allocation of Samples to Countries

In the early years of LACIE, it was decided to
allocate the maximum 4800 sample segments to 8 major
wheat producing countries in proportion to their
most recent wheat production statistics. Two types
of sampling strategies were used in LACIE: one for
countries with historical wheat data on a detailed
level (D) and one for countries with published
historical data only for fairly large political
subdivisions (N). In table I, the eight LACIE countries,
their smallest political subdivision (SPD) for which
published historical data exist, and the number of
samples in the initial allocation are listed.
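An allocation in proportion to production can be sketched as follows. The text states only that the 4800 segments were allocated in proportion to recent wheat production; the largest-remainder rounding used here to make the integer counts add up to the total is an assumption, and the country labels and production figures are hypothetical rather than the values of table I.

```python
def allocate_segments(total, production):
    """Split `total` sample segments among countries in proportion to production.

    Fractional quotas are rounded down, and the leftover segments go to the
    countries with the largest fractional remainders (an assumed rounding rule).
    """
    grand_total = sum(production.values())
    quotas = {name: total * value / grand_total for name, value in production.items()}
    allocation = {name: int(quota) for name, quota in quotas.items()}
    leftover = total - sum(allocation.values())
    by_remainder = sorted(quotas, key=lambda name: quotas[name] - allocation[name], reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


# Hypothetical production figures (any consistent unit).
production = {"Country A": 50.0, "Country B": 30.0, "Country C": 20.0}
allocation = allocate_segments(4800, production)
print(allocation)
```

Whatever rounding rule is chosen, the counts should sum to the 4800-segment ceiling imposed by the processing constraint.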
Definition of Strata

In level N countries, sample segments are allocated
at random within strata which are approximately the
intersection of SPD's with the sampling frame. More
precisely, for a given SPD the corresponding stratum
consists of all 5- by 6-nautical-mile segments which
are in the sampling frame and the center points of
which lie in the SPD (fig. 1). As a consequence, the
agricultural area in each SPD (and hence the whole
country) is approximated by the collection of 5- by
6-nautical-mile segments from which samples are
drawn. At the country level, the error in the
approximation is negligible; however, when SPD's
are small, adjustments must be made to the wheat
acreage estimate for a stratum to obtain a more
precise estimate for the corresponding SPD (see the
paper by Chhikara and Feiveson entitled "LACIE
Large-Area Acreage Estimation").
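
The center-point rule for building strata can be sketched as follows. This is a minimal sketch under stated assumptions: SPD boundaries are represented as simple polygons on a flat local grid in nautical miles, and the function names, polygon coordinates, and segment centers are all hypothetical. Points that fall exactly on a boundary are not treated specially here.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order around the boundary.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def build_strata(frame_centers, spd_polygons):
    """Map each SPD name to the frame segments whose centers lie inside it."""
    strata = {name: [] for name in spd_polygons}
    for center in frame_centers:
        for name, polygon in spd_polygons.items():
            if point_in_polygon(center[0], center[1], polygon):
                strata[name].append(center)
                break
    return strata

# Two made-up rectangular SPD's side by side and four segment centers.
spd_polygons = {
    "west": [(0, 0), (12, 0), (12, 10), (0, 10)],
    "east": [(12, 0), (24, 0), (24, 10), (12, 10)],
}
frame_centers = [(3, 2.5), (9, 7.5), (15, 2.5), (21, 7.5)]
strata = build_strata(frame_centers, spd_polygons)
```

Because a segment is assigned by its center point alone, a segment that straddles a boundary belongs wholly to one stratum, which is why the stratum only approximates the SPD.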

In level D countries, each SPD is also approximated
by the collection of 5- by 6-nautical-mile segments
which lie in the sampling frame and the center
points of which lie in the SPD; however, the
collection in this case is called a "substratum"
rather than a "stratum" because, in some cases, no
sample segment is selected from it. To distinguish
between an SPD and its approximating collection of
segments, the latter will be called a pseudo-SPD
(PSPD). Strata in level D countries are defined to be
the union of PSPD's which are contained in the next
higher political subdivision of the country. For
example, in the United States, where an SPD is a
county and the next higher political subdivision is a
crop reporting district (CRD) (within a state), the stratum
consists of the collection of all "pseudo" counties the
corresponding counties of which lie within that CRD.


Allocation and Selection of Segments
in Strata/Substrata

In the first 2 years of LACIE, sample sizes for
countries were fixed as shown in table I. Since little
or nothing was known about the accuracy of yield
predictions at that time, it was decided to allocate
samples to strata (level N countries) or substrata
(level D countries) so as to minimize the best a priori
estimate of the variance of the country wheat acreage
estimate.

It is well known (cf. Chhikara and Feiveson,
"Large-Area Aggregation and Mean-Squared
Prediction Error Estimation for LACIE Yield and
Production Forecasts") that if one estimates a
population total by stratified sampling over $L$ strata
with a total sample size of $n$, the variance (ignoring
the finite population correction) of the estimate is
minimized if $n_k$, the sample size for the $k$th
stratum, is proportional to $N_k S_k$, where $N_k$ is the
total number of segments in the $k$th stratum from
which $n_k$ samples were selected at random, and
$S_k$ is the standard deviation of the segment
"characteristics" (in this case, wheat acreages) within
the $k$th stratum. This fact was used in LACIE to
obtain allocations to strata in nondetailed countries,
where $N_k$ was the number of segments comprising
the $k$th stratum and $S_k^2$ was assumed proportional
to the "binomial" variance $P_k(1 - P_k)$, where $P_k$
was the historical proportion of wheat in the SPD
corresponding to the $k$th stratum. Note that it was
not necessary to know the constant of
proportionality; i.e., if $S_k \propto P_k^{1/2}(1 - P_k)^{1/2}$,
the optimal sample size for the $k$th stratum would
be given by

$$\tilde{n}_k = \frac{n\,N_k P_k^{1/2}(1 - P_k)^{1/2}}{\sum_{i=1}^{L} N_i P_i^{1/2}(1 - P_i)^{1/2}} \qquad (1)$$

except that, in general, $\tilde{n}_k$ would not be an
integer. For Phases I and II of LACIE, $n_k$ was taken
to be the nearest integer to $\tilde{n}_k$. In level N
countries, this rounding could be done with little
effect since the value of $\tilde{n}_k$ tended to be rather
large (between 10 and 50). Once $\tilde{n}_k$ was
computed, $n_k$ sample segments were selected at
random from the $N_k$ segments comprising the $k$th
stratum.

In level D countries, an attempt to use the preceding
technique in substrata would produce many values
of $\tilde{n}_k$ of less than 1 or between 1 and 2. As a
result, the sample sizes $\tilde{n}_k$ as computed in
equation (1) were used to categorize substrata into
three groups: Group I, $\tilde{n}_k > 1.0$; Group II,
$0.1 \le \tilde{n}_k \le 1.0$; and Group III, $\tilde{n}_k < 0.1$.

The substrata in Group I received $n_k$ sample
segments, selected at random, where $n_k$ was
$\tilde{n}_k$ rounded to the nearest integer. All Group II
substrata within a stratum were called a "Group II
collection." Each entire collection received an
allocation of segments equal to the rounded total of
$\tilde{n}_k$ within the collection. For example, in the
United States, if there were three Group II
pseudocounties (substrata) in a pseudo-CRD
(stratum) with respective $\tilde{n}_k$ values of 0.7, 0.6,
and 0.5, then the collection of three pseudocounties
would receive a total of 2 (rounded value of 0.7 + 0.6
+ 0.5) sample segments. Once the sample size, say $m$,
had been determined for a Group II collection
consisting of $M$ substrata, the sample segments were
chosen with a two-stage sampling scheme where, in
the first stage, $m$ substrata were selected at random
with probabilities proportional to their historical
wheat acreage; then, in the second stage, one sample
segment was selected at random within each of the $m$
chosen substrata (note that $m \le M$).

The Group III substrata were those that would
hypothetically receive less than a tenth of a sample
segment in the optimal allocation and thus were not
sampled at all. Their wheat acreage was instead
estimated by first computing a historical ratio of their
wheat acreage to that of neighboring Group I and
Group II substrata and then applying that ratio to the
current-year estimate for the neighboring Group I
and II substrata. (For details, see the paper by
Chhikara and Feiveson entitled "Large-Area
Aggregation and Mean-Squared Prediction Error
Estimation for LACIE Yield and Production
Forecasts.")

For Phase III of LACIE, some modifications were
made to the allocation procedure. Instead of
assuming that within-stratum wheat variances were
proportional to the binomial $P(1 - P)$, where $P$ is the
historical proportion of wheat in the stratum, it was
decided that a better approximation would be to
assume that the wheat variance is proportional to the
small-grains* variance, which could be directly esti-
mated from a regression model using Landsat imag-
ery from recent years. The advantage of this pro-
cedure lies in the ability of analysts to examine a
Landsat full-frame color image and to obtain crude
estimates of small-grains (but not wheat alone) pro-
portions for all 5- by 6-nautical-mile segments within
the area covered by the image.
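
The Phase I and II allocation of equation (1), together with the Group I/II/III thresholds used for substrata in level D countries, can be sketched as below. The function names and the example numbers are illustrative assumptions, not LACIE software. Rounding is done with `floor(x + 0.5)` because Python's built-in `round` rounds exact halves to the nearest even integer.

```python
import math

def optimal_allocation(n, sizes, wheat_props):
    """Equation (1): unrounded sizes proportional to N_k * sqrt(P_k * (1 - P_k))."""
    weights = [N * math.sqrt(P * (1.0 - P)) for N, P in zip(sizes, wheat_props)]
    total = sum(weights)
    return [n * w / total for w in weights]

def classify_group(n_tilde):
    """Grouping of a substratum by its unrounded sample size."""
    if n_tilde > 1.0:
        return "I"    # sampled on its own
    if n_tilde >= 0.1:
        return "II"   # pooled with other Group II substrata in the stratum
    return "III"      # not sampled; estimated by a historical ratio

def group_ii_collection_size(n_tildes):
    """Segments given to a Group II collection: rounded total of its unrounded sizes."""
    return math.floor(sum(n_tildes) + 0.5)

# Two equal strata share the sample equally.
equal_split = optimal_allocation(10, [100, 100], [0.5, 0.5])

# The worked example from the text: 0.7 + 0.6 + 0.5 rounds to 2 segments.
collection_size = group_ii_collection_size([0.7, 0.6, 0.5])
```

Only relative weights enter equation (1), so the unknown constant of proportionality between the standard deviation and the binomial term cancels, as the text notes.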

It  is  not  feasible  to  use  this  capability  to  estimate 
small-grains  variances  for  all  strata/substrata  because 
it  is  a very  time-consuming  process  and  also  because 
appropriate  data  acquisition  dates  may  not  exist  in  all 
areas.  It  is,  however,  possible  to  use  full-frame  imag- 
ery to  estimate  the  proportion  of  agriculture  for  ev- 
ery segment  in  the  sampling  frame  and  then  estab- 
lish a regression  model  approximately  expressing  the 
small-grains  within-stratum  variance  as  a function  of 
the  agriculture  variance,  proportion  of  agriculture, 
and  historical  proportion  of  small  grains  in  the 
stratum.  Since  all  the  preceding  determinations 
employ  data  not  from  the  current  year  but  from  re- 
cent years,  there  is  an  implicit  assumption  that  with- 
in-stratum small-grains  variances  are  about  the  same 
from  year  to  year,  at  least  for  the  purposes  of  sample 
allocation. 

The regression model, also known as the "magic
formula," was developed as follows. Let $a_i$ and $g_i$ be
the respective agriculture and small-grains propor-
tions in the $i$th segment of a stratum/substratum.
Then, if $r_i = g_i/a_i$, one can write

$$g_i = a_i r_i \qquad (2)$$

Within a region where cropping practices are
about the same, it is not unreasonable to expect that
$a_i$ and $r_i$ are independent; i.e., knowledge about the
amount of agriculture in a segment provides almost
no information about the ratio of small grains to
agriculture in that segment. (Note that for all seg-
ments in the sampling frame, $a_i \ge 0.05$; therefore, $r_i$
is always well defined.) Under independence, one
can write

$$\mathrm{Var}(g_i) = E(a_i^2)\,\mathrm{Var}(r_i) + [E(r_i)]^2\,\mathrm{Var}(a_i) \qquad (3)$$


*The term "small grains" refers to the combined crops of
wheat, barley, oats, and rye.


Note that $E(a_i^2)$ and $\mathrm{Var}(a_i)$ are directly estimable
from knowledge of $a_i$ for all segments. The quantity
$E(r_i)$, although not known precisely, can be approxi-
mated by $\bar{r} = \bar{g}/\bar{a}$, where $\bar{g}$ is the historical small-
grains proportion in the stratum and $\bar{a} = E(a_i)$.
Although $\mathrm{Var}(r_i)$ is unknown, it is assumed that it
can be approximated by $c\,\bar{r}(1 - \bar{r})$, where $c$ is a
coefficient between 0 and 1. Under all the preceding
assumptions, equation (3) can be written

$$\sigma_g^2 = \alpha_2\,[c\,\bar{r}(1 - \bar{r})] + \bar{r}^2 \sigma_a^2 \qquad (4)$$

where $\sigma_g^2 = \mathrm{Var}(g_i)$, $\alpha_2 = E(a_i^2)$, and
$\sigma_a^2 = \mathrm{Var}(a_i)$.

In the United States, $c$ was estimated by (1) using
full-frame imagery to estimate $g_i$ directly and thus
compute $\sigma_g^2$ for all segments within 40 selected
counties (i.e., substrata) and (2) regressing the quan-
tities $(\sigma_g^2 - \bar{r}^2 \sigma_a^2)/\alpha_2$ against $\bar{r}(1 - \bar{r})$ over the 40
counties to obtain $c$. It was found that this procedure
gave a better fit than that obtained by regressing $\sigma_g^2$
against $\bar{g}(1 - \bar{g})$.

Once $c$ was determined, $\alpha_2$, $\sigma_a^2$, and $\bar{r}$ were computed
for all substrata and $\sigma_g^2$ was estimated using equa-
tion (4). Finally, the sample sizes $\tilde{n}_k$ were computed
using the estimate of $\sigma_g^2$ instead of $P_k(1 - P_k)$ in
equation (1). In other countries, a similar procedure
is used to estimate $\sigma_g^2$; however, where strata are
much larger than U.S. counties, the assumptions
which led to equation (4) are more likely to be false.
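
The use of equation (4) can be illustrated with a short Python sketch. The function names and numbers are hypothetical. The text does not say whether the regression for the coefficient c included an intercept; the sketch assumes a least-squares fit through the origin because equation (4) has no constant term.

```python
def small_grains_variance(ag_props, g_bar, c):
    """Equation (4): c * r * (1 - r) * E(a^2) + r^2 * Var(a), with r = g_bar / mean(a).

    ag_props: agriculture proportions for every segment in the stratum.
    g_bar:    historical small-grains proportion in the stratum.
    c:        coefficient between 0 and 1.
    """
    m = len(ag_props)
    a_bar = sum(ag_props) / m
    alpha2 = sum(a * a for a in ag_props) / m  # E(a^2)
    var_a = alpha2 - a_bar ** 2                # Var(a), population form
    r_bar = g_bar / a_bar
    return c * r_bar * (1.0 - r_bar) * alpha2 + r_bar ** 2 * var_a

def estimate_c(xs, ys):
    """Least-squares slope through the origin for y = c * x (assumed form).

    xs: values of r * (1 - r), one per county.
    ys: values of (Var(g) - r^2 * Var(a)) / E(a^2), one per county.
    """
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Made-up stratum: four segments, historical small-grains proportion 0.25.
sigma_g2 = small_grains_variance([0.2, 0.4, 0.6, 0.8], g_bar=0.25, c=0.4)
```

For these example inputs the mean agriculture proportion is 0.5, so the ratio is 0.5 and the two terms of equation (4) are 0.03 and 0.0125.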

For level D countries in Phase III, the definition
of Group III was changed to be the set $S$ of all
substrata such that (1) the total historical wheat
acreage for the substrata in $S$ was approximately 2.5
percent of the country's historical wheat acreage and
(2) if $S_1$ and $S_2$ were substrata such that $S_1 \in S$ and
$S_2 \notin S$, then $S_2$ had (historically) more wheat than $S_1$.
The values of $\tilde{n}_k$ in equation (1) were then computed
only for the substrata remaining after the elimination
of those designated as Group III.
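
One way to build such a set is to take substrata in ascending order of historical wheat acreage until the 2.5 percent share is used up. The sketch below does this; the function name, the "do not exceed the share" stopping rule (the text says only "approximately"), and the acreage figures are assumptions made for illustration.

```python
def phase3_group_iii(historical_wheat, share=0.025):
    """Return the smallest-wheat substrata whose combined acreage stays within `share`.

    historical_wheat: dict mapping substratum name to historical wheat acreage.
    """
    total = sum(historical_wheat.values())
    chosen, running = [], 0.0
    # Ascending order ensures every excluded substratum has at least as
    # much wheat as any substratum placed in Group III.
    for name, acres in sorted(historical_wheat.items(), key=lambda kv: kv[1]):
        if running + acres > share * total:
            break
        chosen.append(name)
        running += acres
    return chosen

# Made-up acreages totaling 1000; the three smallest add up to 23.
wheat = {"A": 5, "B": 10, "C": 8, "D": 377, "E": 600}
group_iii = phase3_group_iii(wheat)
```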

Another modification in Phase III was that the
values of $\tilde{n}_k$ were "probabilistically" rounded to in-
tegers ($n_k$) in the sense that if $\tilde{n}_k = m + r$, where $m$ is
an integer and $0 < r < 1$, then $n_k$ was randomly set
equal to $m$ (with probability $1 - r$) or $m + 1$ (with
probability $r$). This revised rounding procedure made
the total sample size much closer to $n$ than did the
old procedure.
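
Probabilistic rounding as described here takes only a few lines. The sketch below is illustrative: the function name and the optional `rng` argument are assumptions, and Python's standard `random` module stands in for whatever random number source was actually used.

```python
import math
import random

def probabilistic_round(x, rng=random):
    """Round x = m + r down to m with probability 1 - r, or up to m + 1 with probability r."""
    m = math.floor(x)
    r = x - m
    # rng.random() is uniform on [0, 1), so an exact integer is never rounded up.
    return m + 1 if rng.random() < r else m

# Averaging many rounded values of 2.3 with a seeded generator.
rng = random.Random(0)
draws = [probabilistic_round(2.3, rng) for _ in range(20000)]
mean_draw = sum(draws) / len(draws)
```

The expected value of the rounded size equals the unrounded size, so the rounding errors tend to cancel across strata instead of accumulating, which is consistent with the total staying close to the intended sample size.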

Finally, in Phase III, rather than allocate 4800
sample segments to 8 countries in proportion to their
production, it was suggested that LACIE should esti-
mate for each country the sample size needed to
satisfy a given accuracy criterion and use these sam-
ple sizes as long as the total was less than 4800. This
procedure was followed in the United States and the
U.S.S.R. by specifying a desired coefficient of varia-
tion (CV) for the production estimate of each coun-
try, then calculating the sample size necessary to
achieve the CV, given errors due to (1) sampling, (2)
classification, (3) yield prediction, and (4) loss of
data. Using the best available a priori estimates of the
magnitude of the errors, one can approximate the
variance of the production estimate as a function of
the total sample size $n$ using the optimal allocation
strategy. The equation can then be solved for $n$. The
resulting expression is given by


where $n$ = the total number of sample segments
allocated to the area of interest

$N_{jk}$ = the total number of agriculture segments in
the $k$th substratum/stratum in the $j$th yield stratum
(see footnote 2)

$\sigma_{jk}^2$ = estimate of segment-to-segment variance of
the estimated small-grains area within the $k$th
substratum/stratum of the $j$th yield stratum, which
is the sum of sampling and classification
components. The sampling component comes from
equation (4), whereas the classification variance is
an estimate obtained from previous testing of
classification procedures.

$\bar{Y}_j$ = average yield of the $j$th yield stratum over
the most recent 2 to 3 years (Obtained from the
LACIE yield models if available; otherwise,
obtained from historical information. If neither of
these is available, $\bar{Y}_j$ may be obtained from soil
characteristic maps overlaid on Landsat imagery of
the area of interest.)

$T_j$ = estimate of the standard deviation of the yield
estimate in the $j$th yield stratum

$L$ = the total number of yield strata in the area of
interest

$L_j$ = the total number of substrata/strata in the $j$th
yield stratum

$CV(P)$ = preassigned value (see footnote 3) of the CV
of the production estimate

$P$ = estimate of the total wheat production in the
country/area of interest based on historical data

$A_j$ = estimate of wheat area in the $j$th yield stratum
based on historical information

$\delta$ = a conservative lower-bound estimate of the
expected sample acquisition rate (determined from
previous experience with loss of segments due to
cloud cover or other factors)


2A yield stratum is an area for which wheat yield is assumed
to be constant. In current usage, a yield stratum is a union of
acreage strata.

After determining the total number $n$ of segments
to be allocated to the area of interest, the optimal
sample size $n_{jk}$ to be associated with the $k$th
substratum/stratum in the $j$th yield stratum is
defined by


[Equation illegible in the source; only the limits $j = 1$ and $k = 1$ of a double sum survive.]


where  [P]  denotes  probabilistic  rounding  to  the  in- 
teger either  above  or  below. 


3The choice for the preassigned value for the production
coefficient of variation is dependent on the desired accuracy of
the production estimate for the area of interest. In this case, ac-
curacy is measured with respect to the probability space $(\Omega, \mathcal{B}, P)$
where $P$ operates on $\mathcal{B}$, the $\sigma$-field of Lebesgue measurable sub-
sets of $\Omega$, the set of percentage deviations from true production.
For example, if a country estimate is to be made, the goal is to ob-
tain a country production estimate which is within 10 percent of
the actual production with a probability of at least 0.9,
Using these revised procedures, the allocation for
Phase III was 601 segments for the United States and
1111 for the U.S.S.R. In the United States, wheat area
was estimated for a total of 855 counties; 288 were in
Group I; 164 were in Group II; and 403 were in
Group III.

YIELD SAMPLING DESIGN

In  LACIE,  wheat  yield  was  not  measured  or  esti- 
mated on  a point  or  segment  basis.  The  average  yield 
for  a large  area  called  a “yield  stratum”  was  esti- 
mated by  collecting  meteorological  information  from 
existing  weather  stations  scattered  throughout  the 
stratum  and  using  the  average  weather  as  input  to  a 
yield  prediction  regression  equation.  These  yield 
strata  were  typically  larger  than  acreage  strata, 
especially  in  level  D countries.  For  example,  in  the 


United  States,  yield  strata  were  about  the  size  of 
states  and  were  delineated  such  that  they  were 
relatively  homogeneous  with  respect  to  climate,  soil 
type,  and  other  factors  which  could  affect  wheat 
yield. (For details, see the paper by McCrary et al.
entitled  “Operation  of  the  Yield  Estimation  Sub- 
system.”) 


PRODUCTION 

Wheat  production  was  not  sampled  or  estimated 
directly but was computed by multiplying yield esti-
mates by  acreage  estimates.  (For  details,  see  the 
paper  by  Chhikara  and  Feiveson  entitled  “Large- 
Area  Aggregation  and  Mean-Squared  Prediction  Er- 
ror Estimation  for  LACIE  Yield  and  Production 
Forecasts.”) 
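
As a simple illustration of the multiplication, the sketch below sums acreage times yield over yield strata. The summation over strata, the function name, and the figures are assumptions; the actual aggregation and its error estimate are described in the paper cited above.

```python
def production_estimate(acreage_by_stratum, yield_by_stratum):
    """Sum of acreage times yield over yield strata.

    Both arguments are dicts keyed by yield stratum name, for example
    acres and bushels per acre.
    """
    return sum(
        acreage_by_stratum[name] * yield_by_stratum[name]
        for name in acreage_by_stratum
    )

# Made-up figures for two yield strata.
production = production_estimate(
    {"north": 100.0, "south": 50.0},
    {"north": 30.0, "south": 20.0},
)
```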
LACIE Area Sampling Frame and Sample Selection


C. J. Liszcz*


INTRODUCTION 

Early  in  the  design  phase  of  LACIE,  it  was 
decided  not  to  use  a completely  enumerated  area  for 
developing  the  sampling  frame  because  much 
wasteful  (totally  nonagricultural)  information  would 
be  included.  The  alternative  was  to  delineate  all  the 
agricultural  and  nonagricultural  areas  and  to  locate 
the  sample  segments  in  the  agricultural  areas.  An 
agricultural  area  was  defined  on  full-frame  Landsat 
color-infrared  (CIR)  imagery  as  an  area  having  dis- 
cernible field  patterns.  For  those  sections  of  a coun- 
try for  which  there  was  no  imagery,  operational 
navigation  charts  (ONC's)  were  used,  and  only  those 
areas  that  were  definitely  nonagricultural  (e.g., 
mountains,  deserts,  tundras,  and  lakes)  were  elimi- 
nated as  potential  sample  segments.  In  other  parts  of 
the  LACIE  countries,  some  imagery  was  unusable 
because  of  cloud  or  snow  cover;  both  these  situations 
required  the  use  of  ONC’s. 

In  later  LACIE  phases,  most  of  the  agricultural 
and  nonagricultural  delineations  were  accomplished 
using  full-frame  CIR  imagery.  This  procedure 
necessitated  moving  some  segments  from  what  was 
potentially  agricultural  on  the  ONC’s  to  “real” 
agricultural  as  interpreted  on  the  CIR  imagery. 

This agricultural area, composed of 5- by 6-
nautical-mile segments for each country's smallest
political unit, was the area's sampling frame. The
set of these segments within each country's
smallest political unit was called the pseudocounty. This
terminology was used for all countries even though
the smallest political units of some, such as the
U.S.S.R. and the People's Republic of China
(P.R.C.), were very large and were oblasts, states,
etc., instead of counties.

Constraints  were  imposed  on  sample  segment 
locations  because  of  the  computer  hardware  avail- 
able for  selecting  or  stripping  the  sample  segment 
data  from  full-frame  data.  This  limited  the  number 
of segments that could be obtained per frame in the
east-west  and  north-south  directions. 

An  additional  constraint  was  imposed,  by  the 
gathering,  sorting,  and  stripping  agency,  to  verify  the 
actual  acquired  segment  location  and  the  registration 
of  subsequent  acquisitions  of  that  segment.  The 
center  point  of  a segment’s  first  acquisition  was 
specified  as  being  within  2 nautical  miles  in  the 
north-south  direction  and  within  3 nautical  miles  in 
the  east-west  direction.  The  location  and  registration 
of  subsequent  acquisitions  of  the  same  segment  were 
specified  as  being  within  one  picture  element  (pixel) 
of  the  first  acquisition.  To  guarantee  these  specifica- 
tions, a rectangle  of  data  (approximately  10  by  12.5 
nautical  miles)  enclosing  each  sample  segment  had 
to  be  stripped  from  the  Landsat  full-frame  image 
(100  by  100  nautical  miles).  To  ensure  there  was  no 
overlap  (a  hardware  requirement),  the  center  points 
of  all  sample  segments  had  to  be  separated  by  a 
minimum  of  10  nautical  miles  north-south  and  12 
nautical  miles  east-west. 
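
The minimum-separation requirement can be checked with a few lines of Python. This is one reading of the constraint, offered as an assumption: two center points conflict only when they are closer than 10 nautical miles north-south and also closer than 12 nautical miles east-west, because only then would the two stripped rectangles overlap. Coordinates are treated as offsets in nautical miles on a flat local grid, and the function names are hypothetical.

```python
MIN_NS_NM = 10.0  # minimum north-south separation of center points
MIN_EW_NM = 12.0  # minimum east-west separation of center points

def too_close(center_a, center_b):
    """True if the rectangles stripped around the two centers would overlap.

    Each center is a (north, east) pair in nautical miles.
    """
    d_ns = abs(center_a[0] - center_b[0])
    d_ew = abs(center_a[1] - center_b[1])
    return d_ns < MIN_NS_NM and d_ew < MIN_EW_NM

def can_place(candidate, placed):
    """True if the candidate center conflicts with none of the placed centers."""
    return not any(too_close(candidate, other) for other in placed)
```

In a small political unit containing only a handful of frame segments, every remaining candidate can fail `can_place` once one or two segments are located, which is the situation described in the next paragraph.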

In some instances, primarily in those countries
with  small  political  units,  this  minimum-distance 
constraint  made  it  impossible  to  place  the  allocated 
number  of  sample  segments  in  the  political  area. 
Fewer  segments  were  thus  placed  in  those  countries. 

The  procedures  for  the  agricultural  delineation 
and  sample  segment  location  in  a LACIE  country  are 
discussed  in  this  paper.  These  procedures  also  con- 
tain the  description  of  required  materials  and  the 
definition  of  terms  peculiar  to  LACIE. 


DETERMINATION  OF  AREA 
SAMPLING FRAME

Transparent  acetate  overlays  containing  agri- 
cultural boundaries  within  each  of  the  LACIE  coun- 
tries were  prepared  and  registered  to  ONC's.  Full- 
frame  Landsat  CIR  images  of  the  same  scale  as  the 
ONC’s  (1:1  million)  were  used  to  identify  agri- 
cultural boundaries  based  on  discernible  agricultural 
field  patterns. 
Preliminary Steps

The  following  preliminary  steps  were  taken  in  the 
preparation  of  the  base  map  overlay. 

1.  ONC  base  maps  (1:1  million  scale)  were  ob- 
tained through  the  NASA  LACIE  Physical  Data 
Library  (LPDL).  All  ONC’s  obtained  had  to  be  of 
the  same  series  and  publishing  date  to  simplify  the 
registration  of  the  completed  overlays. 

2.  The  availability  of  ONC  base  physical  feature 
overlays  from  the  Aeronautical  Chart  and  Informa- 
tion Command  (ACIC)  was  checked.  When  avail- 
able, these  base  overlays  should  have  matched  the 
series  and  date  of  the  ONC  map  used  in  the 
agricultural and nonagricultural delineations. It was
desirable  that  these  physical  feature  overlays  be  in 
some  color  other  than  black  because  of  the  black 
lines  already  on  the  ONC. 

3. The Landsat CIR imagery available from the
LPDL and the Classification and Mensuration Sub-
system (CAMS) was researched. Landsat CIR imag-
ery of 9 by 9 inches was used to construct the
agricultural and nonagricultural overlays. The regional
analyst determined what imagery was available to
complete the agricultural and nonagricultural task. A
request  was  made  to  the  NASA  Systems  and 
Facilities  Branch  to  acquire  imagery  that  was  not 
already  in  the  LACIE  system.  Specific  requirements 
such  as  location  and  dates  of  usable  imagery,  max- 
imum cloud  cover,  and  image  quality  accompanied 
the  request. 

4.  After  the  in-house  CIR  imagery  had  been  ac- 
cumulated, it  was  reviewed  for  the  following 
qualities. 

a.  Scene  number. 

b. Date of imagery (to determine usability
with respect to agriculture).

c. Percentage of cloud cover and areas covered.
If  the  cloud  coverage  was  mostly  over  a lake  or  city 
and  not  over  an  agricultural  area  (even  if  the  percent- 
age was  high),  the  imagery  could  still  be  usable. 

d.  Seasonal  coverage  (with  respect  to  agricul- 
tural area). 

e.  Image  quality. 

f.  Multidate  coverage  (to  enhance  interpreta- 
tion). 

After  the  review,  the  imagery  was  logged  and  filed 
on an ONC basis. The review parameters for each
CIR  image  were  noted  in  the  log. 


Construction of an Overlay of the
Base Map Physical Features

The  purpose  of  constructing  an  overlay  of  the  base 
map  physical  features  (e.g.,  streams,  rivers,  and 
lakes)  was  to  register  the  Landsat  CIR  imagery  to  the 
overlay  used  in  constructing  the  agricultural  and 
nonagricultural  delineations.  The  procedure  used 
was  as  follows. 

1. Control coordinates were marked on an ONC
base map overlay. These marks were usually made in
each of the four corners and at the upper and lower
center points of the ONC. The marks were made
with a straightedge, and the appropriate geographical
coordinates were printed near each mark.

2. All major physical features (streams, rivers,
lakes)  were  delineated  or  traced  on  the  overlay  with 
blue  ink.  Minor  features  were  delineated  only  if  ma- 
jor features  were  lacking,  so  that  the  CIR  imagery 
could  be  registered  to  the  overlay. 

(NOTE: If ONC base physical feature overlays
were available from the ACIC, steps 1 and 2 were
omitted.)

3.  The  base  map  overlay  was  titled  and  the  date  of 
completion  was  noted.  The  title  on  the  overlay  in- 
cluded the  country,  ONC  map  number,  map  scale, 
and  map  series  and  date  numbers. 


Agricultural  and  Nonagricultural 
Delineation Overlay

The  agricultural  and  nonagricultural  delineation 
overlay  was  constructed  by  registering  the  agricul- 
tural and  nonagricultural  overlay  to  the  overlay  of 
the  base  map's  physical  features.  This  was  ac- 
complished as  follows. 

1. The Landsat CIR imagery was registered with
the  overlay  by  alining  the  imagery  to  the  overlay  of 
the  base  map's  physical  features. 

2.  The  agricultural  and  nonagricultural  areas  were 
delineated  by  outlining  the  field  and/or  nonfield  pat- 
terns, enclosing  all  constructed  lines,  and  marking  an 
appropriate symbol (A for agricultural or N for
nonagricultural) within each constructed area.

3.  All  cloud  cover  regions,  regions  in  which  snow 
cover  prohibited  agricultural  and  nonagricultural 
delineation,  or  areas  for  which  imagery  was  not 
available  were  marked. 
4. Common coordinate marks were matched to
all  adjoining  ONC  overlays. 

5.  All  agricultural  and  nonagricultural  delineation 
lines were matched between adjoining ONC's.


Rationale for Delineation of Agricultural
and Nonagricultural Areas

The  following  criteria  were  used  in  delineating 
areas  as  either  agricultural  or  nonagricultural  in  the 
construction  of  the  overlays. 

1.  All  recognizable  field  patterns  were  classified 
as  agricultural. 

2. All other areas were classified as
nonagricultural.

3.  All  contiguous  nonagricultural  areas  greater 
than  or  equal  to  4 square  miles  in  size  were  delin- 
eated. For  India  and  the  P.R.C.,  where  agricultural 
areas  are  present  but  the  field  sizes  are  too  small  to  be 
viewed  on  the  imagery,  agricultural  was  defined  as  all 
areas  not  meeting  the  nonagricultural  criterion.  The 
nonagricultural  criterion  was  defined  as  including 
only obvious areas such as mountains, deserts,
forests,  flood  plains,  or  other  physical  geomorphic 
phenomena  visible  in  the  imagery.  If  there  was  any 
question  about  whether  the  area  being  viewed  was 
agricultural  or  nonagricultural,  it  was  called 
agricultural.

4.  All  imagery  used  was  recorded. 

A log  was  maintained  of  each  image  used  to  make 
the  agricultural  and  nonagricultural  delineations.  The 
log  contained  the  scene  number  (orbit  and  row  num- 
ber); date  of  image  scene;  image  quality  (poor,  fair, 
or  good);  cloud  cover  percentage;  and  other  com- 
ments, including  snow  cover,  unusual  features  or 
field  patterns,  and  rationale  for  decisions  made  in  in- 
terpretation. This  log  reflected  the  imagery  used  on 
an  ONC  basis  and  showed  all  LACIE  imagery  used 
up  to  the  agricultural  and  nonagricultural  delineation 
completion  date.  The  completed  overlay  was  indexed 
and  distributed. 

If  a determination  had  to  be  made  about  whether  a 
particular  area  was  agricultural  or  nonagricultural 
(other  than  just  described),  the  delineation  was 
based  on  the  regional  geographic  knowledge  of  the 
area(s)  in  question.  The  rationale  used  in  determin- 
ing the  delineation  was  outlined  in  the  documenta- 
tion that  accompanied  each  individual  agricultural 


and  nonagricultural  product.  The  CAMS  senior 
country  analysts  were  consulted  on  this  matter. 


SAMPLE SEGMENT SELECTION

The  equipment  or  material  required  for  sample 
segment  selection  included  the  following. 

1.  ONC's  of  the  country. 

2. A dot matrix, to the scale of the ONC's, spaced
6 nautical miles in the east-west direction and 5
nautical miles in the north-south direction.

3. An overlay of 100 by 100 nautical miles, to the
scale of the ONC's, with a center point surrounded
by a rectangle of 5 by 6 nautical miles, surrounded by
a rectangle of 10 by 12 nautical miles. The boundaries
of this larger rectangle were drawn out to the edges of
the Landsat scene of 100 by 100 nautical miles
(frame boundaries). Lines were drawn running
north-south, 10 nautical miles inside both the east
and west boundaries of the square of 100 by 100
nautical miles, producing a rectangle 80 nautical
miles in the east-west direction and 100 nautical
miles in the north-south direction.

4.  Lists  of  random  numbers  covering  the  ranges 
of 1 to 10, 1 to 20, 1 to 30, etc., up to 1 to 300.

5.  A data  form  indicating  zone,  stratum,  and 
substratum.  If  there  were  more  than  one  segment  in 
the  stratum/substratum,  the  segment  that  was 
selected  was  indicated.  Space  was  provided  on  the 
form to enter the number of segments (rectangles of
5 by  6 nautical  miles)  in  the  stratum/substratum,  the 
number  of  segments  in  nonagricultural  areas  of  the 
stratum/substratum,  the  number  of  segments  in  the 
agricultural  areas  of  the  stratum/substratum,  the 
number  of  the  Landsat  track  passing  through  the 
stratum/substratum,  the  latitude  and  longitude  of  the 
center  of  the  selected  sample  segment,  and  the  spring 
and  winter  wheat  sample  segment. 

The  selection  procedure  consisted  of  the  following 
steps. 

1.  The  dot  grid  was  fastened  to  an  ONC. 

2.  A stratum/substratum  was  selected  from  the 
data  form  and  located  on  the  ONC.  Its  boundaries 
were  then  drawn  on  the  ONC  overlay. 

3.  The  number  of  segments  in  the  stratum/ 
substratum  was  counted  (a  segment  was  in  if  its 
center point was in the stratum/substratum) and
entered on the data form. If the center point of a
segment fell exactly on a boundary, a coin was flipped.
If  it  came  up  heads,  the  segment  was  placed  in  the 
stratum/substratum  being  worked;  if  it  came  up  tails, 
the  segment  was  placed  in  the  adjacent 
stratum/substratum. 

4.  The  number  of  segments  that  fell  entirely  in 
nonagricultural  areas  was  counted  and  entered  on  the 
form. 

5.  The  number  in  step  4 was  subtracted  from  the 
number  in  step  3,  and  the  difference  was  recorded  on 
the  data  form. 

6. Starting with the most northwestern segment,
all segments determined in step 5 were numbered.

7.  The  lowest  random  number  table  that  included 
the  last  number  entered  in  step  6 was  selected. 

8.  The  first  unused  random  number  from  the  ta- 
ble  was  used  to  determine  which  segment  was  the 
sample  segment. 
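
Steps 6 to 8 of the selection procedure can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the numbering order (north to south, then west to east) is an assumption about what "starting with the most northwestern segment" means, and skipping table entries larger than the segment count is also an assumption, since the text does not say how such entries were handled.

```python
from typing import Dict, Iterator, List, Tuple

Center = Tuple[float, float]  # (latitude, longitude) in degrees, north/east positive


def number_segments(centers: List[Center]) -> Dict[int, Center]:
    """Step 6: number the agricultural segments 1..n.

    Assumption: ordering is north to south, then west to east.
    """
    ordered = sorted(centers, key=lambda c: (-c[0], c[1]))
    return {i + 1: c for i, c in enumerate(ordered)}


def table_upper_bound(n_segments: int) -> int:
    """Step 7: smallest table range (1-10, 1-20, ..., 1-300) covering n_segments."""
    if not 1 <= n_segments <= 300:
        raise ValueError("segment count must be between 1 and 300")
    return 10 * ((n_segments + 9) // 10)


def draw_segment(n_segments: int, table: Iterator[int]) -> int:
    """Step 8: take the first unused random number from the table.

    Assumption: entries greater than n_segments are skipped.
    """
    for value in table:
        if 1 <= value <= n_segments:
            return value
    raise RuntimeError("random number table exhausted")
```

Passing the table as an iterator keeps track of which entries are already used, so a rejected sample segment can be redrawn by calling `draw_segment` again on the same iterator.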

As  mentioned  in  the  introduction,  not  all  sample 
segments  were  located  because  of  certain  engineering 
constraints.  These  constraints  and  changes  in  the 
location  of  a sample  segment  were  determined  as  de- 
scribed in  the  following  flow-diagram  procedures. 

1.  Place  the  center  point  of  the  overlay  of  100  by 
100  nautical  miles  on  the  center  point  of  the  sample 
segment. 

2.  Is  there  another  sample  segment  within  the 
rectangle  of  10  by  12  nautical  miles? 

a.  Yes.  Discard  newest  sample  segment  and 
return  to  step  8 of  the  selection  procedure. 

b.  No.  Proceed. 

3.  Are  there  more  than  four  sample  segments  be- 
tween the  extended  boundaries  of  the  rectangle  of  10 
by  12  nautical  miles  running  in  the  east-west 
direction? 

a.  Yes.  Discard  newest  sample  segment  and 
return  to  step  8 of  the  selection  procedure. 

b.  No.  Proceed. 

4. If any other sample segments fall within the
east-west boundaries of the rectangle of 10 by 12
nautical miles, move the center point of the overlay
to the center points of these segments to determine
whether the new sample segment causes more than
four sample segments to be within these boundaries.

a. Yes. Discard newest sample segment and
return to step 8 of the selection procedure.

b. No. Proceed.

5.  With  the  center  point  of  the  overlay  back  on 
the  center  point  of  the  newest  sample  segment,  are 
there  more  than  eight  sample  segments  between  the 
extended  boundaries  of  the  rectangle  of  10  by  12 
nautical  miles  running  in  the  north-south  direction? 

a.  Yes.  Discard  newest  sample  segment  and 
return  to  step  8 of  the  selection  procedure. 

b.  No.  Proceed. 

6. If any other sample segments fall within the
north-south extended boundaries of the rectangle of
10 by 12 nautical miles, move the center point of the
overlay  to  the  center  points  of  these  segments  to 
determine  whether  the  new  sample  segment  causes 
more  than  eight  sample  segments  to  be  within  these 
boundaries. 

a.  Yes.  Discard  newest  sample  segment  and 
return  to  step  8 of  the  selection  procedure. 

b.  No.  Proceed. 

7.  Determine  whether  the  segment  is  a spring  or 
winter  wheat  segment,  based  on  whichever  com- 
prises more  than  30  percent  of  the  total  wheat  area  in 
the  stratum/substratum. 

8.  Record  data  for  this  sample  segment  as  re- 
quired on  data  forms. 

(NOTE: Latitude and longitude are recorded to
degrees and minutes only.)

9.  Return  to  step  2 of  the  selection  procedure  and 
continue  until  the  number  of  allocated  segments  for 
a country  is  selected  and  located. 
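
The spacing checks in steps 2 to 6 of the flow-diagram procedure can be sketched as below. Several details are assumptions made only for illustration: segment centers are given as planar (east, north) coordinates in nautical miles, the rectangle of 10 by 12 nautical miles is taken as 12 east-west by 10 north-south, the extended bands run 50 nautical miles either side of the overlay center, and the counts include the candidate segment itself. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (east, north) in nautical miles; planar approximation


def in_box(center: Point, p: Point, half_w: float, half_h: float) -> bool:
    """True if p lies within a box of the given half-sizes centered on center."""
    return abs(p[0] - center[0]) <= half_w and abs(p[1] - center[1]) <= half_h


def accept(candidate: Point, existing: List[Point],
           rect: Tuple[float, float] = (12.0, 10.0),  # assumed E-W by N-S size
           frame: float = 100.0, max_ew: int = 4, max_ns: int = 8) -> bool:
    """Return True if the candidate passes the checks of steps 2 to 6."""
    hw, hh, hf = rect[0] / 2, rect[1] / 2, frame / 2
    # Step 2: no other sample segment inside the 10-by-12 rectangle.
    if any(in_box(candidate, p, hw, hh) for p in existing):
        return False
    everyone = existing + [candidate]

    def ew_count(c: Point) -> int:  # segments in the east-west band through c
        return sum(in_box(c, p, hf, hh) for p in everyone)

    def ns_count(c: Point) -> int:  # segments in the north-south band through c
        return sum(in_box(c, p, hw, hf) for p in everyone)

    # Steps 3 and 4: check the band at the candidate and at each neighbor in it.
    ew_centers = [candidate] + [p for p in existing if in_box(candidate, p, hf, hh)]
    if any(ew_count(c) > max_ew for c in ew_centers):
        return False
    # Steps 5 and 6: same check for the north-south band.
    ns_centers = [candidate] + [p for p in existing if in_box(candidate, p, hw, hf)]
    if any(ns_count(c) > max_ns for c in ns_centers):
        return False
    return True
```

A rejected candidate corresponds to the "discard newest sample segment and return to step 8" branches; the caller would then draw the next unused random number.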

LACIE Large-Area Acreage Estimation

R. S. Chhikara^a and A. H. Feiveson^b


INTRODUCTION 

This paper describes the procedure for estimating
wheat acreage for a large area, given estimates for the
sample segments. A segment wheat acreage estimate
is obtained by multiplying its small-grains acreage
estimate as computed by the Classification and
Mensuration Subsystem (CAMS) by the best available
ratio of wheat to small-grains acreages obtained using
historical data. The CAMS approach for estimating
segment small-grains acreages is described in the
symposium paper by Heydorn et al. entitled
"Classification and Mensuration of LACIE Segments,"
and the econometric models used in predicting
the ratio of wheat to small-grains acreages are
given in the paper by Umberger et al. entitled
"Econometric Models for Predicting Confusion
Crop Ratios."

In  the  United  States  and  in  other  countries  with 
detailed  historical  data,  sample  allocation  was  made 
at  the  substratum  level.  As  a result,  the  acreage 
estimation  in  countries  with  detailed  historical  data 
requires  one  level  of  aggregation  more  than  it  would 
in  other  countries.  The  estimation  procedure  de- 
scribed in  this  paper  is  for  the  United  States,  but  it  is 
equally  applicable  to  other  LACIE  countries  with 
detailed  historical  data.  Also  described  are  the  essen- 
tial features  of  the  estimation  procedure  for  the  re- 
maining LACIE  countries. 

The U.S. counties correspond to substrata and
were grouped into three categories for sample
allocation. Those in Group I were allocated at least one
segment each, two-stage probability-proportional-to-size
(PPS) sampling was used in Group II counties, and
no sample segments were allocated to counties in
Group III. In the United States, a stratum (crop
reporting district (CRD)) corresponds to a collection
of counties, a zone corresponds to a state, and a
region corresponds to a collection of states (see the
paper by Feiveson et al. entitled "LACIE Sampling
Design" for details).

Wheat  acreage  estimates  are  made  for  each 
stratum,  zone,  and  region  in  a LACIE  country. 
However,  no  estimate  is  made  for  a zone  unless  it 
contains  at  least  three  segments  satisfactorily  pro- 
cessed by  CAMS.  A segment  wheat  acreage  estimate 
may  not  be  made  in  the  following  cases  of  non- 
response. 

1.  The  sample  segment  was  obscured  by  cloud 
cover. 

2.  Landsat  data  quality  was  insufficient  to  permit 
processing. 

3.  Landsat  data  acquisition  was  not  properly 
registered  with  the  reference  Landsat  image. 

4.  The  acquisition  and/or  processing  procedures 
failed  to  provide  an  acceptable  estimate. 
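
As a small illustration of the segment-level rule and the nonresponse cases above, the sketch below multiplies a CAMS small-grains acreage by a wheat-to-small-grains ratio and applies the three-segment requirement for a zone. The function names are hypothetical, and representing a nonresponse segment by `None` is an assumption made for the example.

```python
from typing import List, Optional


def segment_wheat_acreage(small_grains_acres: Optional[float],
                          wheat_ratio: float) -> Optional[float]:
    """Segment wheat acreage = small-grains acreage x wheat/small-grains ratio.

    A nonresponse segment (cloud cover, poor data, misregistration,
    failed processing) is represented here by None.
    """
    if small_grains_acres is None:
        return None
    return small_grains_acres * wheat_ratio


def zone_is_estimable(segment_estimates: List[Optional[float]],
                      minimum: int = 3) -> bool:
    """A zone estimate requires at least three satisfactorily processed segments."""
    usable = sum(e is not None for e in segment_estimates)
    return usable >= minimum
```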


ACREAGE  ESTIMATION 

A CRD  (stratum)  acreage  estimate  consists  of 
three  components. 

1.  An  acreage  estimate  is  made  for  the  Group  I 
counties  (substrata)  for  which  segment  data  exist.  (A 
Group  I county  is  treated  as  a Group  III  county  if  it 
does  not  have  at  least  one  segment  with  an  accepta- 
ble wheat  proportion  estimate.) 

2.  An  acreage  estimate  is  made  for  the  entire  set 
of  Group  II  counties  in  the  CRD  if  there  is  at  least 
one  segment  with  an  acceptable  wheat  proportion 
estimate  in  this  set  of  counties.  (Otherwise,  the 
Group  II  counties  are  all  treated  as  Group  III  coun- 
ties.) 

3. An acreage estimate is made for the Group III
counties,  including  the  Group  I and  II  counties  being 
treated  as  Group  III  counties. 

The  wheat  acreage  estimates  for  these  three  com- 
ponents are  obtained  as  follows. 
Group I Substrata Estimation

Group I counties are treated as strata, and a
stratified random sampling estimator is used to
estimate their wheat acreages. The estimate for the
collection of Group I counties in the jth CRD is given
by

\[ A_{1j} = \sum_{k=1}^{L_{1j}} A_{1jk} \tag{1} \]

where $L_{1j}$ = number of Group I counties in the
jth CRD

$A_{1jk}$ = wheat acreage estimate for the kth
Group I county in the jth CRD

The county (substratum) estimate is obtained by

\[ A_{1jk} = R_{1jk} \, \frac{N_{1jk}}{n_{1jk}} \sum_{i=1}^{n_{1jk}} A_{1jki} \tag{2} \]

where $R_{1jk}$ = ratio of the true kth substratum area
to its gross pseudosubstratum
(GPC) area (see the paper by Liszcz
entitled "LACIE Area Sampling
Frame and Sample Selection" for
the definition of GPC)

$N_{1jk}$ = number of segments (after exclusion
of nonagricultural segments) in the
kth substratum of the jth stratum

$n_{1jk}$ = number of sample segments for
which estimates are available in the
kth substratum of the jth stratum

$A_{1jki}$ = estimated wheat acreage for the ith
sample segment in the kth
substratum of the jth stratum


Group II Substrata Estimation

A PPS estimator is used to estimate the wheat
acreages for the Group II collection of substrata. The
wheat acreage estimate for the Group II collection of
substrata in the jth stratum is given by

\[ A_{2j} = \sum_{k=1}^{M_{2j}} \frac{R_{2jk} \, N_{2jk} \, A_{2jk}}{\pi_{2jk}} \tag{3} \]

where $M_{2j}$ = number of sample segments for
which acreage estimates are available
in the Group II substrata of the jth
stratum

$R_{2jk}$ = ratio of the true kth Group II
substratum area to its GPC area

$N_{2jk}$ = number of segments (after exclusion
of nonagricultural segments) in the
kth Group II substratum of the jth
stratum

$A_{2jk}$ = wheat acreage estimate of the sample
segment belonging to the kth
substratum in the jth stratum (There
is only one segment allocated in each
selected Group II substratum.)

$\pi_{2jk}$ = probability of selection for the kth
Group II substratum of the jth
stratum, given by

\[ \pi_{2jk} = \frac{M_{2j} \, W_{2jk}}{\sum_{l=1}^{L_{2j}} W_{2jl}} \]

where $W_{2jk}$ = wheat acreage harvested in the
primary epoch year in the kth Group II
substratum of the jth stratum

$L_{2j}$ = number of Group II substrata in the
jth stratum
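
The Group I and Group II component estimators can be sketched as follows. The exact algebraic forms are assumptions inferred from the symbol definitions: a stratified expansion estimator (area ratio times segments per sampled segment times the sum of segment acreages) for Group I counties, and a PPS sum of expanded segment acreages divided by selection probabilities for Group II. In particular, taking the selection probability proportional to primary-epoch-year wheat acreage and scaled by the number of usable sampled substrata is an assumption. All function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def group1_county_estimate(R: float, N: int, segment_acres: Sequence[float]) -> float:
    """County estimate: R * (N / n) * sum of segment wheat acreages (assumed form)."""
    n = len(segment_acres)
    if n == 0:
        raise ValueError("county has no usable segment; treat it as Group III")
    return R * N / n * sum(segment_acres)


def group1_estimate(counties: Iterable[Tuple[float, int, Sequence[float]]]) -> float:
    """Sum of county estimates over the Group I counties of one stratum."""
    return sum(group1_county_estimate(R, N, acres) for R, N, acres in counties)


def selection_probabilities(hist_acres: Sequence[float], m: int) -> List[float]:
    """Assumed PPS probabilities: m * W_k / sum(W) over all Group II substrata."""
    total = sum(hist_acres)
    return [m * w / total for w in hist_acres]


def group2_estimate(sampled: Iterable[Tuple[float, int, float, float]]) -> float:
    """PPS estimate: sum of R * N * A / pi over the sampled Group II substrata."""
    return sum(R * N * A / pi for R, N, A, pi in sampled)
```

For example, a county with area ratio 1.0, 10 agricultural segments, and two usable segments of 100 and 200 acres of wheat would be expanded to 1500 acres under these assumptions.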


Group  III  Substrata  Estimation 

The wheat acreage estimate for the collection of
Group III substrata is obtained by means of a ratio
estimator. Depending on the number of segments in
a stratum for which estimates are available, three
categories of Group III acreage estimates are
possible. Categories 1, 2, and 3 correspond respectively to
three or more segments, one or two segments, and no
segments having estimates available. The ratio used
for the Group III estimator is the historical wheat
acreage for the Group III counties divided by the
historical wheat acreage for the combined Group I
and II counties.

For category 1 estimates (three or more usable
segments in the stratum), the ratio is based on
historical acreages only within the stratum. The
acreage estimate for the Group III substrata in the jth
stratum is given by

\[ A_{3j} = \frac{W_{3j}}{W_{1j} + W_{2j}} \, (A_{1j} + A_{2j}) \]

where $A_{1j}$ and $A_{2j}$ are given by equations (1) and (3)
and $W_{1j}$, $W_{2j}$, and $W_{3j}$ are the historical wheat
acreages for the Group I, II, and III substrata in the
stratum, respectively.

For category 2 and 3 estimates (fewer than three
usable segments in the stratum), the ratio is based on
acreages in the zone containing the stratum for
which the estimate is being made, and the acreage
estimate for the Group III substrata in the jth
stratum is obtained by

\[ A_{3j} = \frac{W_{3j}}{W_{1.} + W_{2.}} \, (A_{1.} + A_{2.}) \]

where a dot (.) in a subscript denotes the summation
over all the Group I or Group II substrata, whichever
is the case, in the zone. The reason for differentiating
between categories 2 and 3 will become evident in
the section dealing with stratum variance estimation.
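
The Group III ratio estimator and the choice between stratum-level and zone-level inputs can be sketched as below. This is a sketch under the reading that the Group III estimate is the historical Group III acreage, divided by the combined historical Group I and II acreage, times the current Group I plus Group II estimate. The function names and argument layout are hypothetical.

```python
from typing import Tuple

# (A1, A2, W1, W2): current Group I and II estimates and their historical acreages
Inputs = Tuple[float, float, float, float]


def group3_estimate(a1: float, a2: float, w1: float, w2: float, w3: float) -> float:
    """Ratio estimator: W3 / (W1 + W2) * (A1 + A2)."""
    return w3 / (w1 + w2) * (a1 + a2)


def group3_for_stratum(n_usable: int, w3_stratum: float,
                       stratum: Inputs, zone: Inputs) -> float:
    """Category 1 (three or more usable segments) uses stratum-level inputs;
    categories 2 and 3 use zone-level sums of the same quantities."""
    a1, a2, w1, w2 = stratum if n_usable >= 3 else zone
    return group3_estimate(a1, a2, w1, w2, w3_stratum)
```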


Stratum Estimation

In the United States and in other LACIE countries
with detailed historical data, the wheat acreage
estimate of each stratum is computed as the sum of
the Group I, II, and III component estimates which
comprise the stratum, as follows.

\[ A_j = A_{1j} + A_{2j} + A_{3j} \tag{6} \]

In the U.S.S.R. and in other LACIE countries
without detailed historical data, the wheat acreage
estimate of a stratum is given by

\[ A_j = R_j \, \frac{N_j}{n_j} \sum_{k=1}^{n_j} A_{jk} \]

where $R_j$ = ratio of the actual area to the pseudo
gross area for the jth stratum

$N_j$ = number of segments (excluding the
nonagricultural segments) in the jth
stratum

$n_j$ = number of sample segments for
which estimates are available in the
jth stratum

$A_{jk}$ = wheat acreage estimate for the kth
segment in the jth stratum

It is required to have $n_j \ge 3$; otherwise, no acreage
estimate for the stratum is made.


Higher Level Estimations

The wheat acreage estimate for a zone, a region, or
a country is obtained by adding estimates for the
strata included in the zone, region, or country. The
acreage estimate at the zone level is obtained by

\[ A_{rz} = \sum_{j=1}^{S} A_{rzj} \tag{9} \]

where $S$ = number of strata in the zth zone, rth
region of the country

$A_{rzj}$ = acreage estimate of the jth stratum, zth
zone, rth region of the country

The acreage estimate at the region level is
obtained by

\[ A_r = \sum_{z=1}^{R} A_{rz} \]

where $R$ is the number of zones in the rth region of
the country.

The acreage estimate at the country level is

\[ A = \sum_{r=1}^{K} A_r \]

where $K$ is the number of regions in the country.
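
Because the higher level estimates are plain sums of stratum estimates, the aggregation can be sketched with a nested mapping. The data layout (region, then zone, then stratum) and the function name are hypothetical choices made for the example.

```python
from typing import Dict, Tuple

Country = Dict[str, Dict[str, Dict[str, float]]]  # region -> zone -> stratum -> acres


def aggregate(country: Country) -> Tuple[Dict[Tuple[str, str], float],
                                         Dict[str, float], float]:
    """Return (zone totals, region totals, country total) by simple addition."""
    zone_totals = {
        (region, zone): sum(strata.values())
        for region, zones in country.items()
        for zone, strata in zones.items()
    }
    region_totals = {
        region: sum(v for (r, _), v in zone_totals.items() if r == region)
        for region in country
    }
    return zone_totals, region_totals, sum(region_totals.values())
```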
ACREAGE VARIANCE ESTIMATION

In countries with detailed historical data, the
problem of acreage variance estimation involves several
complexities resulting from the use of a two-stage
PPS sampling scheme for the Group II substrata and
the availability of only one sample segment per
substratum in most cases. The estimation procedure
in such countries consists of a series of steps to be
described in the subsections to follow. On the other
hand, it is fairly straightforward to estimate the
variance for countries that lack historical data; in this
case, no direct variance estimate is attempted for a
stratum containing fewer than three processed
segments, and all strata belong to the Group I category.


Group I Substrata Variance Estimation

The variance estimate for the Group I substrata
acreage estimate is obtained using the variance
formula for stratified random sampling. For the jth
stratum, it is computed by

\[ V_{1j} = \sum_{k=1}^{L_{1j}} \frac{R_{1jk}^2 \, N_{1jk}^2 \, S_{1jk}^2}{n_{1jk}} \tag{12} \]

where $L_{1j}$, $R_{1jk}$, $N_{1jk}$, and $n_{1jk}$ are as defined
previously and $S_{1jk}^2$ is the within-substratum variance
estimate to be computed according to the procedure
described in a succeeding section. The finite
population correction, $1 - n_{1jk}/N_{1jk}$, is negligible; hence, it is
not considered in equation (12).


Group II Substrata Variance Estimation

The variance of the estimate for a collection of
Group II substrata consists of a within-substratum
variance component and a between-substrata
variance component. The first component can be
estimated in a manner similar to the Group I case,
but estimation of the second component requires
additional historical acreages for all Group II substrata
in each stratum.^1 Using the Hartley-Rao PPS
sampling approach described in reference 1, the following
variance estimate for the Group II substrata in
the jth stratum is obtained.

where $L_{2j}$, $R_{2jk}$, and $\pi_{2jk}$ are as defined
previously and $S_{2jk}^2$ is the within-substratum variance
estimate to be computed according to the procedure
in the succeeding subsection. The $Y_{2jk}$'s are the
Group II substrata historical wheat acreages during a
year other than the primary epoch year; i.e.,

$Y_{2jk}$ = wheat acreage harvested in the secondary
epoch year in the kth Group II
substratum in the jth stratum

$Y_{2jl}$ = wheat acreage harvested in the secondary
epoch year in the lth Group II
substratum in the jth stratum

$\pi_{2jkl}$ = probability of having the pair of Group II
substrata k and l selected in the sample.
For $l \ne k$, $\pi_{2jkl}$ is determined according
to the procedure given by Hartley and
Rao (ref. 1) and is computed using the
following formula.

where $M_{2j}$ is as defined previously.

^1 H. O. Hartley and D. A. Lamb: Formulas for Variance
Estimation in Proposed LACIE Sampling Plan. Technical Report
to NASA, 1975.


Within-Substratum Variance Estimation

Often, there is only one sample segment in a
substratum; therefore, no direct estimate of the
within-substratum variance is possible. If variances for
substrata are assumed to be equal, substrata are
collapsed to form a new stratum and its sample variance
provides an estimate of the within-substratum
variance. But this technique generally leads to an
overestimation of the variance and hence provides a
biased estimate of the variance. Different methods of
collapsing strata have been suggested by Hansen et
al. (ref. 2), Cochran (ref. 3), and Seth (ref. 4).
Another  variance  estimate  is  possible  using  the 
method  of  Hartley  et  al.  (ref.  5),  who  suggest  the 
regression  approach  and  show  that  their  technique 
may  lead  to  smaller  bias  in  variance  estimation  as 
compared  to  the  collapsed  strata  technique.  But,  as  is 
pointed  out  by  Hartley  et  al.,  this  technique  could 
lead  to  negative  variance  estimates,  particularly  if  the 
concomitant  variables  are  not  well  correlated  with 
the  stratum  means.  In  an  earlier  study  (ref.  6),  ap- 
plication of  the  collapsed  strata  and  Hartley-Rao- 
Kiefer  (ref.  5)  techniques  for  variance  estimation  did 
not  seem  to  provide  satisfactory  results;  the  former 
led  to  overestimation  and  the  latter  to  negative  esti- 
mates. However,  when  the  Hartley-Rao-Kiefer  ap- 
proach was  combined  with  the  collapsed  strata  ap- 
proach, where  first  substrata  were  grouped  into 
groups  of  substrata  as  homogeneous  as  possible  and 
then  a separate  regression  was  performed  for  each 
group  of  substrata,  the  empirical  results  were  more 
satisfactory.  Therefore,  the  method  combining  the 


two approaches was adopted for use in LACIE. It is
based  on  the  assumption  that  the  historical  county 
proportions  are  well  correlated  with  the  CAMS  esti- 
mates of  segment  proportions.  The  method  consists 
of  (1)  forming  homogeneous  groups  of  substrata  in  a 
zone  with  respect  to  a priori  estimates  of  within- 
substratum  variability,  (2)  performing  regression  of 
the  CAMS  segment  wheat  proportion  estimates  on 
the  substratum  historical  wheat  proportions,  and  (3) 
taking  the  residual  mean  squared  error  (MSE)  as  an 
estimate  of  the  within-substratum  variance  for  each 
substratum  in  a group. 
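
Step (2) and step (3) of the method can be sketched as an ordinary least-squares fit of segment wheat proportions on historical substratum proportions, with the residual mean squared error scaled to acreage units. The sketch assumes a simple linear regression with two fitted coefficients (hence the divisor of the number of segments minus 2) and a scaling by the square of the segment acreage; the function name is hypothetical.

```python
from typing import Sequence


def residual_mse(x: Sequence[float], p: Sequence[float],
                 segment_acreage: float) -> float:
    """Residual MSE of regressing p (CAMS segment proportions) on x
    (historical substratum proportions), scaled by segment_acreage**2."""
    d = len(x)
    if d != len(p) or d <= 2:
        raise ValueError("need more than two paired observations")
    x_mean = sum(x) / d
    p_mean = sum(p) / d
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("historical proportions must not all be equal")
    sxy = sum((xi - x_mean) * (pi - p_mean) for xi, pi in zip(x, p))
    slope = sxy / sxx
    intercept = p_mean - slope * x_mean
    sse = sum((pi - (intercept + slope * xi)) ** 2 for xi, pi in zip(x, p))
    return segment_acreage ** 2 * sse / (d - 2)
```

The returned value would then be assigned as the within-substratum variance to every substratum whose segments belong to the collection being fitted.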

Segments within a zone are grouped into
collections according to the corresponding a priori
within-substratum standard deviations $\sigma_k$ used in the
original allocation (see the paper by Feiveson et al.).
These  collections  of  substrata  are  required  to  be  as 
homogeneous  as  possible,  and  each  must  have  an 
adequate  number  of  segments  to  allow  a reliable  esti- 
mate of  the  variance.  The  following  conditions 
should  be  satisfied  in  forming  the  collections. 

1.  No  collection  should  contain  less  than  eight 
segments  unless  there  are  less  than  eight  segments 
available  for  the  zone.  This  constraint  is  to  ensure  an 
adequate  number  of  degrees  of  freedom  for  obtaining 
a reliable  linear  regression  equation. 

2.  All  segments  in  the  same  substratum  shall  be 
in  the  same  collection.  It  is  necessary  to  use  every 
available  segment  in  a substratum  for  estimating  its 
variance. 

3. The number of collections c shall be given
initially by


c = 1 provided NS < 16
c = 2 provided 16 ≤ NS < 24
c = 3 provided NS ≥ 24


where  NS  is  the  number  of  available  segments  in  the 
zone.  If  conditions  1 and  2 cannot  be  satisfied  when 
NS  is  greater  than  or  equal  to  16,  reduce  the  value  of  c 
by  1.  This  condition  is  imposed  to  keep  down  the 
number  of  collections  to  avoid  an  unnecessarily  fine 
grouping  of  substrata. 

4. If c is greater than 1, the collections should
maximize the ratio of the between-collection
variance to the within-collection variance of the
$\sigma_{rj}$. Let $F$ denote this ratio,
where $d_r$ is the number of segments in the rth
collection and $\sigma_{rj}$ is the a priori small-grains standard deviation
(see the paper by Feiveson et al. and ref. 7)
associated with the substratum containing the jth segment
in the rth collection (note that segments in the same
substratum have duplicate $\sigma_{rj}$ values).

Then, the partitioning of the $\sum d_r$ segments into
collections should be such that $F$ is maximized,
subject to the constraints specified in conditions 1 and 2.
If $F$ is less than 1, c is reduced and $F$ is recomputed.
This requirement is to form as many homogeneous
collections of substrata as possible.

Let $\hat{p}_{rij}$ be the wheat proportion estimate for the
ith segment of the jth substratum of the rth
collection, and let $x_{rj}$ be the epoch year (historical) wheat
proportion for the jth substratum. It is assumed that
the segment estimate can be modeled as

\[ \hat{p}_{rij} = \alpha_r + \beta_r x_{rj} + \epsilon_{rij} \tag{16} \]

where $E(\epsilon_{rij}) = 0$ and $\mathrm{Var}(\epsilon_{rij}) = \sigma_r^2$. Then, for
each of the c collections, a regression of $\hat{p}_{rij}$ on the $x_{rj}$
is performed. Let $S_r^2$ denote the residual mean
squared error from the rth regression; i.e.,

\[ S_r^2 = (\text{segment acreage})^2 \sum \frac{(\hat{p}_{rij} - \tilde{p}_{rij})^2}{d_r - 2} \]

where the sum is over the $d_r$ segments in the rth collection and
$\tilde{p}_{rij} = a_r + b_r x_{rj}$ (the predicted value using the
regression equation).

Since by constraint 2, every substratum is associated
with one and only one collection, the variance $S_r^2$ is
assigned to every substratum having segments in the
rth collection (whether the substratum is Group I or
II); i.e., $S_{1jk}^2$ (or $S_{2jk}^2$) $= S_r^2$ if the segments in the kth
substratum of the jth stratum belong to the rth
collection.


Stratum Variance Estimation

For countries with detailed historical data, the
stratum acreage variance estimation depends on the
category of its Group III substrata.

In a stratum having at least three sample segments
processed, where ratioing is done only within the
stratum, the variance for the jth stratum acreage
estimate given in equation (6) is easily seen to be

\[ V_j = \left( 1 + \frac{W_{3j}}{W_{1j} + W_{2j}} \right)^2 (V_{1j} + V_{2j}) \tag{19} \]

and hence can be estimated by equation (19), where
estimates of $V_{1j}$ and $V_{2j}$ replace the actual variances
(see eqs. (12) and (13)) and where $W_{1j}$, $W_{2j}$, and $W_{3j}$
are as defined previously.

If the historical acreage ratio $W_{3j}/(W_{1j} + W_{2j})$ is
different from that of the current year, it will
introduce bias into the estimate; hence, $V_j$ in equation
(19) provides a biased estimate of the stratum
variance. However, because the historical acreage
ratios are not expected to show any significant
year-to-year variability, equation (19) and the others
below are regarded as providing unbiased variance
estimates.
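
The Group I variance and the category 1 stratum variance can be sketched numerically as below. The forms used are assumptions consistent with the surrounding description: the Group I variance is taken as the stratified-sampling sum of (R N)^2 S^2 / n without the finite population correction, and the stratum variance as the squared factor (1 + W3 / (W1 + W2)) times the sum of the Group I and Group II variances, which follows when the two component estimates are independent. The function names are hypothetical.

```python
from typing import Iterable, Tuple


def group1_variance(counties: Iterable[Tuple[float, int, float, int]]) -> float:
    """Sum over counties of (R * N)**2 * S2 / n (assumed form, no fpc).

    Each county is given as (R, N, S2, n), with S2 the within-substratum
    variance in squared acres and n the number of usable segments.
    """
    return sum((R * N) ** 2 * S2 / n for R, N, S2, n in counties)


def stratum_variance_category1(v1: float, v2: float,
                               w1: float, w2: float, w3: float) -> float:
    """(1 + W3 / (W1 + W2))**2 * (V1 + V2) for a stratum with three or more
    processed segments, where ratioing is done within the stratum."""
    factor = 1.0 + w3 / (w1 + w2)
    return factor ** 2 * (v1 + v2)
```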

If the jth stratum has at least one but fewer than
three segments processed, the stratum acreage
estimate given by equation (6) can be written as

\[ A_j = (A_{1j} + A_{2j}) \left( 1 + \frac{W_{3j}}{W_{1.} + W_{2.}} \right) + \frac{W_{3j}}{W_{1.} + W_{2.}} \, (A_{1.(j)} + A_{2.(j)}) \tag{20} \]

where $A_{1.(j)} = A_{1.} - A_{1j}$

$A_{2.(j)} = A_{2.} - A_{2j}$

Since the two terms on the right side of equation
(20) have independent acreage estimates, the
variance of $A_j$ can be easily obtained. With variance
components replaced by their estimates, the stratum
acreage variance estimate is given by

\[ V_j = \left( 1 + \frac{W_{3j}}{W_{1.} + W_{2.}} \right)^2 (V_{1j} + V_{2j}) + \left( \frac{W_{3j}}{W_{1.} + W_{2.}} \right)^2 \sum_{i \in S,\, i \ne j} (V_{1i} + V_{2i}) \tag{21} \]

where $S$ is the set of indices i, such that the ith
stratum in the zone has at least one processed
segment.

If the jth stratum has no segment processed, the
whole stratum is in the Group III category, and the
variance estimate for the stratum acreage estimate is
given by

\[ V_j = \left( \frac{W_{3j}}{W_{1.} + W_{2.}} \right)^2 \sum_{i \in S} (V_{1i} + V_{2i}) \tag{22} \]

If the country lacks detailed historical data, it has
large strata, and the within-stratum variance estimate
is directly computed by

\[ S_j^2 = \frac{1}{n_j - 1} \sum_{k=1}^{n_j} (A_{jk} - \bar{A}_j)^2 \tag{23} \]

where

\[ \bar{A}_j = \frac{1}{n_j} \sum_{k=1}^{n_j} A_{jk} \]

Hence, the stratum acreage variance estimate is
obtained by

\[ V_j = \frac{R_j^2 \, N_j^2 \, S_j^2}{n_j} \tag{24} \]

where $R_j$ and $N_j$ are as defined previously.

Variance Aggregation to the Zone,
Region, and Country Levels

For a zone in a country lacking detailed historical
data, the stratum acreage estimates are
independently obtained. Hence, the variance estimate is
obtained by aggregating the stratum variance estimates
in the zone,

\[ V_z = \sum_{j=1}^{S} V_{zj} \tag{25} \]

where $V_{zj}$ is the variance estimate for the jth stratum
in the zth zone.

In countries with detailed historical data, stratum
acreage estimates in a zone are correlated unless all
Group III ratio estimation is done only within the
stratum. In general, the zone acreage estimate given
by equation (9) is of the form

\[ A_z = \sum_{j \in L} \left( 1 + \frac{W_{3j}}{W_{1j} + W_{2j}} + \frac{\sum_{k \in M} W_{3k} + \sum_{k \in N} W_{3k}}{\sum_{k \in S} (W_{1k} + W_{2k})} \right) (A_{1j} + A_{2j}) + \left( 1 + \frac{\sum_{k \in M} W_{3k} + \sum_{k \in N} W_{3k}}{\sum_{k \in S} (W_{1k} + W_{2k})} \right) \sum_{j \in M} (A_{1j} + A_{2j}) \tag{26} \]

where $S$ is the set of indices associated with strata
having at least one processed segment, $M$ is the set
of indices associated with strata that have at least one
but fewer than three segments processed, $L$ is the set
of indices associated with strata that have at least
three segments processed, and $N$ is the set of
indices associated with strata that have no processed
segments. Since $L$ and $M$ are disjoint sets, making
the estimates in the two terms on the right side of
equation (26) uncorrelated, an estimate of the
variance of the zone acreage estimate is obtained by

\[ V_z = \sum_{j \in L} \left( 1 + \frac{W_{3j}}{W_{1j} + W_{2j}} + \frac{\sum_{k \in M} W_{3k} + \sum_{k \in N} W_{3k}}{\sum_{k \in S} (W_{1k} + W_{2k})} \right)^2 (V_{1j} + V_{2j}) + \left( 1 + \frac{\sum_{k \in M} W_{3k} + \sum_{k \in N} W_{3k}}{\sum_{k \in S} (W_{1k} + W_{2k})} \right)^2 \sum_{j \in M} (V_{1j} + V_{2j}) \tag{27} \]

No variance estimation is required for zones that do
not contain at least three processed sample segments.

Since zone acreage estimates are obtained
independently, the acreage variance estimates at both the
regional and country levels are computed by adding
the zone acreage variance estimates. This procedure
holds for all LACIE countries. Specifically, the
variance estimate at the region level is obtained by

\[ V_r = \sum_{z=1}^{R} V_z \tag{28} \]

and the variance estimate at the country level is
obtained by

\[ V = \sum_{r=1}^{K} V_r \tag{29} \]


MIXED WHEAT AREA AND VARIANCE
ESTIMATION


Winter or Spring Wheat Estimation

In a mixed wheat area, separate aggregations are
performed for estimating the spring and winter
wheat acreage estimates as well as their variance
estimates at the stratum and higher levels. In each case,
the estimation procedure is the same as that
described in the two preceding sections for each
aggregation level. Data from sample segments
designated as winter or spring wheat segments and the
historical substratum winter or spring wheat acreages
are used to estimate winter or spring wheat acreages
and their associated variance estimates. However,
inputs of $\pi_{2jk}$ in estimating the Group II substrata
acreages are based on the historical total wheat area
and are the same in both cases, because the Group II
substrata for sample segment allocation were
selected with probabilities that were determined
from the historical total wheat area for the collection
of Group II substrata.


Total Wheat Estimation

The total wheat area estimate in a mixed wheat
area is computed by adding the winter wheat and the
spring wheat acreage estimates for the area of
interest; i.e., if $A_w$ and $A_s$ denote the winter and spring
wheat acreage estimates, respectively, the total wheat
acreage estimate $A_t$ is given by

\[ A_t = A_w + A_s \tag{30} \]

This is done at the stratum and higher levels.

The two estimates $A_w$ and $A_s$ are correlated for
overlapping winter and spring wheat areas in a zone.
Thus, the variance of $A_t$ is given by

\[ \mathrm{Var}(A_t) = \mathrm{Var}(A_w) + \mathrm{Var}(A_s) + 2\,\mathrm{Cov}(A_w, A_s) \tag{31} \]

where the covariance $\mathrm{Cov}(A_w, A_s)$ can be expected to be
negative. If so,

\[ \mathrm{Var}(A_t) < \mathrm{Var}(A_w) + \mathrm{Var}(A_s) \]

Accordingly, the variance estimate for the total
wheat is biased if obtained by adding the variance
estimates for the winter and spring wheat area
estimates. Instead, if it is obtained by way of estimating
the total wheat area directly, a better variance
estimate is expected. The latter procedure is followed in
LACIE for the total wheat acreage variance
estimation in the mixed wheat areas. The procedure is the
same as described in the third section and is
applicable to each level of aggregation.
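
A short numerical sketch of the point made by equation (31): with a negative covariance, simply adding the winter and spring variances overstates the variance of the total. The numbers and the function name are illustrative only.

```python
def total_wheat_variance(var_winter: float, var_spring: float,
                         covariance: float) -> float:
    """Var(At) = Var(Aw) + Var(As) + 2 Cov(Aw, As)."""
    return var_winter + var_spring + 2.0 * covariance


# Illustrative values: a negative covariance lowers the total variance
# below the simple sum of the two component variances.
naive_sum = 4.0 + 9.0
with_covariance = total_wheat_variance(4.0, 9.0, -2.0)
```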
Large-Area Aggregation and Mean-Squared
Prediction Error Estimation for LACIE Yield and
Production Forecasts

R. S. Chhikara^a and A. H. Feiveson^b


INTRODUCTION 

In LACIE, large-area wheat acreage estimates are
made from the Landsat data acquired according to
the sampling design described by Feiveson et al. in
the paper entitled "LACIE Sampling Design." The
acreage estimation procedure is described for a
segment (sampling unit) by Heydorn and Bizzell in the
paper entitled "Methods for Segment Wheat Area
Estimation" and for large areas by Chhikara and
Feiveson in the paper entitled "LACIE Large Area
Acreage Estimation." Yield is predicted by
establishing a relationship between historical yield and
weather data (detailed in the paper by Strommen et
al. entitled "Development of LACIE CCEA-I
Weather/Wheat Yield Models"). Though weather
also influences crop acreages in an area, and
harvested wheat acreage and its yield per acre may thus
be correlated, these are estimated independently in
LACIE.

The terminology used in this paper is
basically the same as that employed by Chhikara and
Feiveson in the paper entitled “LACIE Large Area
Acreage Estimation”; hence, terms that have
appeared in that paper are not defined again here.

The yield stratification does not necessarily
coincide with the stratification used for the acreage
estimation. A yield stratum generally consists of
several acreage strata and sometimes crosses zone
boundaries. For example, the Panhandle yield model
covers some crop reporting districts (CRD's) from
Oklahoma and some CRD's from Texas. No
prediction is attempted for yield, and hence for production,
below the yield stratum level. It is necessary to
define the stratification by which wheat production
can be estimated efficiently from the given acreage
and yield estimates. Consequently, pseudozones are
created in a zone if it is covered by more than one
yield stratum. A pseudozone is obtained from the
intersection of a yield stratum with a zone (described
in fig. 1).


ᵃLockheed Electronics Company, Houston, Texas.

An estimate¹ of the production in a pseudozone is
obtained by the product of its area estimate and its
yield prediction, and these estimates are aggregated
to estimate zone and higher level productions. In
mixed areas, this is done for each crop type (winter
and spring wheat). The total wheat production is
estimated by adding the two crop-type production
estimates for a zone, a region, and a country.

FIGURE 1. Determination of a pseudozone. (Legend: acreage stratum boundary, pseudozone boundary, zone boundary.)

¹Estimates refer to forecasts when these are made prior to
crop harvest time.
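The aggregation just described can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not LACIE software: the `Pseudozone` class, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pseudozone:
    acreage: float     # area estimate for the pseudozone
    yield_pred: float  # yield prediction (production per unit area)


def zone_production(pseudozones: List[Pseudozone]) -> float:
    # Sum of (area estimate x yield prediction) over the pseudozones of a zone.
    return sum(pz.acreage * pz.yield_pred for pz in pseudozones)


def region_production(zones: List[List[Pseudozone]]) -> float:
    # A regional estimate is the sum of its zone estimates.
    return sum(zone_production(zone) for zone in zones)


def country_production(regions: List[List[List[Pseudozone]]]) -> float:
    # A country estimate is the sum of its regional estimates.
    return sum(region_production(region) for region in regions)


# Hypothetical inputs: one region made of two zones.
zone_a = [Pseudozone(1000.0, 30.0), Pseudozone(500.0, 20.0)]
zone_b = [Pseudozone(2000.0, 25.0)]
country = [[zone_a, zone_b]]

print(zone_production(zone_a))      # 40000.0
print(country_production(country))  # 90000.0
```

In mixed areas the same functions would be applied once per crop type and the two totals added.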


PRODUCTION  ESTIMATION 

In  this  section,  the  aggregation  formulas  are  listed 
for  production  estimation  of  a crop  type. 


Zone Estimate

Suppose a zone $r$ consists of $H$ pseudozones, $C_1, C_2,
\ldots, C_H$, with acreage estimates $A_{r1}, \ldots, A_{rH}$
and yield predictions $Y_{r1}, \ldots, Y_{rH}$, respectively.
Then, the zone production is estimated by

$$P_r = \sum_{i=1}^{H} A_{ri} Y_{ri} \qquad (1)$$


Region Estimate

Suppose that a region consists of $R$ zones with
production estimates $P_1, P_2, \ldots, P_R$. Then, the
regional production estimate is obtained by

$$P = \sum_{r=1}^{R} P_r \qquad (2)$$


Country Estimate

If a country consists of $K$ regions with production
estimates $P_1, P_2, \ldots, P_K$, then the country
production estimate is given by

$$P = \sum_{k=1}^{K} P_k \qquad (3)$$


MEAN-SQUARED PREDICTION
ERROR ESTIMATION


Zone Prediction Error Estimate

The yield prediction error is estimated for a yield
stratum and is available as a standard output from
the yield prediction algorithm described by
Strommen et al. Each pseudozone of a yield stratum is
assigned the yield prediction error as estimated for
the yield stratum. On the other hand, the acreage
estimation error (variance) needs to be estimated for
a pseudozone. Since stratum acreage estimates in a
zone can be correlated, it is necessary to estimate the
covariances between different pairs of pseudozone
acreage estimates because these estimates can also be
correlated.

Each pseudozone acreage estimate is obtained by
summing acreage estimates for strata comprising the
pseudozone; i.e.,

$$A_{ri} = \sum_{j} A_{rij} \qquad (4)$$

where $A_{rij}$ is the acreage estimate for the $j$th stratum
in the $i$th pseudozone. Thus, the variance of $A_{ri}$ is
given by

$$V_{ri} = \sum_{j} V_j + \sum_{j \neq k} V_{jk} \qquad (5)$$
where $V_j$ is the acreage variance estimate for the $j$th
stratum (obtained as described in the paper by
Chhikara and Feiveson) and $V_{jk}$ is the estimated
covariance between acreage estimates for the $j$th and
$k$th strata in the zone. The covariance between
acreage estimates for a pair of strata depends
on how the estimates are obtained for the strata.
This, in turn, depends on how the Group III
substrata acreage for each stratum is estimated.
When data from at least three segments are available
for a stratum, the estimate for its Group III substrata
is based on the data from the stratum alone.
Otherwise, all the available data from the corresponding
zone are used. Accordingly,

$$V_{jk} = 0 \qquad (6)$$

when acreage estimates for the Group III substrata of
the $j$th and $k$th strata are obtained from data
available within each stratum and thus provide
independent stratum acreage estimates. In other cases, it is
obtained as follows.

When the estimate for the Group III substrata of
the $j$th stratum is based on data only from the
stratum, whereas it is obtained for the $k$th stratum
using the data available for the zone in which the
stratum lies,

[Equation (7) is not legible in the source.]

When zone data are used to obtain acreage estimates
for the Group III substrata of the $j$th and $k$th strata,
the following three cases arise.

1. When each of the $j$th and $k$th strata has fewer
than three but at least one segment available,

[Equation (8) is not legible in the source.]

2. When the $j$th stratum has fewer than three but at
least one segment processed and the $k$th stratum has
no segment processed,

[Equation (9) is not legible in the source.]

3. When each of the $j$th and $k$th strata has no
segment processed,

[Equation (10) is not legible in the source.]
The quantities $V_{1j}$, $V_{2j}$, $W_{1j}$, $W_{2j}$, and $W_{3j}$, and the
set $\{S\}$ are defined as follows.

1. $V_{1j}$ and $V_{2j}$ are acreage variance estimates for
the Group I and Group II substrata, respectively, in
the $j$th stratum.

2. $W_{1j}$, $W_{2j}$, and $W_{3j}$ are the historical wheat
acreages for Group I, Group II, and Group III
substrata, respectively, in the $j$th stratum.

3. $\{S\}$ is the set of strata, each of which has at
least one segment processed and the wheat
proportion estimated.

For computation of $V_{1j}$ and $V_{2j}$, and for an
understanding of the categories of strata, refer to the paper
by Chhikara and Feiveson on acreage estimation.
The formulas given in equations (6) through (10) are
fairly straightforward and are easily obtained by
considering different possible pairs of strata in a zone.

Acreage estimates and yield predictions are
independently made up to the pseudozone level.
Although some correlation between crop acreages
and their yields per acre within a pseudozone is
possible, it is assumed that a pseudozone acreage
estimate and its yield prediction are uncorrelated. Also,
both the acreage estimate and the yield prediction are
assumed to be unbiased. Then, the squared
prediction error for a pseudozone production estimate
follows from the formula for the variance of the product
of two random variables (ref. 1, p. 12); and an
unbiased estimate of it is obtained by

$$S^2 = V Y^2 + U A^2 - U V \qquad (11)$$

where $V$ is given by equation (5), $A$ is the acreage
estimate, $Y$ is the yield prediction, and $U$ is the yield
squared prediction error for the pseudozone.
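As a worked illustration of the pseudozone formula, the following sketch evaluates the unbiased estimate for the product of an acreage estimate and a yield prediction that are assumed unbiased and uncorrelated, as stated above. The function name and the numerical inputs are hypothetical.

```python
def production_mse(A: float, Y: float, V: float, U: float) -> float:
    """Estimated squared prediction error of the production estimate A * Y.

    A: acreage estimate            V: variance estimate of A
    Y: yield prediction            U: squared prediction error estimate of Y
    The -U*V term makes the estimate unbiased when A and Y are
    uncorrelated and unbiased, and V and U are unbiased.
    """
    return V * Y**2 + U * A**2 - U * V


# Hypothetical pseudozone: 1000 acres at 30 units per acre.
s2 = production_mse(A=1000.0, Y=30.0, V=2500.0, U=4.0)
print(s2)  # 6240000.0
```

With these inputs the yield term ($U A^2$ = 4 000 000) dominates the acreage term ($V Y^2$ = 2 250 000), while the correction term ($UV$ = 10 000) is comparatively small.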

If wheat production estimates for pseudozones in
a zone are uncorrelated, the zone production
variance is given by the sum of variances for the
pseudozones. However, correlation between acreage
estimates for strata in a zone will probably result in
some dependence between pseudozone acreage
estimates. The following formula for estimating the
mean-squared error for a zone production estimate
accounts for such dependence; pseudozone yield
predictions for a zone are made independently. As such,
an estimate of the mean-squared error of the
production estimate $P_r$ for the $r$th zone is obtained by
where $U_{ri}$ is the estimated mean-squared prediction
error of the yield for the $i$th pseudozone, $V_{ri}$ is the
area variance estimate for the $i$th pseudozone, and
other quantities are as defined earlier. Again, it is
fairly straightforward to derive equation (12) from
equations (11), (5), and (1).
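The display for equation (12) does not survive in this copy. One form consistent with the derivation the text describes is sketched below; it is a reconstruction under the stated assumptions (unbiased estimates, yield predictions independent of each other and uncorrelated with the acreage estimates), not a transcription. The symbol $V_{r,ii'}$, denoting the estimated covariance between the acreage estimates of pseudozones $i$ and $i'$, is introduced here only for the sketch.

```latex
S_r^2 \;=\; \sum_{i=1}^{H} \left( V_{ri}\,Y_{ri}^2 + U_{ri}\,A_{ri}^2 - U_{ri}\,V_{ri} \right)
\;+\; \sum_{i \neq i'} Y_{ri}\,Y_{ri'}\,V_{r,ii'},
\qquad
V_{r,ii'} \;=\; \sum_{j \in i} \; \sum_{k \in i'} V_{jk}
```

The first sum applies the pseudozone formula to each pseudozone. The second follows because, for independent unbiased yield predictions, the covariance between two pseudozone production estimates equals the product of the two expected yields times the covariance of the two acreage estimates.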


Region and Country Prediction
Error Estimates

If yield strata in a region cross zone boundaries,
the production estimates for the zones in the region
will be correlated. For example, such is the case in
the U.S. Great Plains, where the Texas Panhandle
winter wheat yield model covers CRD's from both
Oklahoma and Texas and where the Red River
spring wheat yield model covers two eastern CRD's
of North Dakota and three western CRD's of
Minnesota. Thus, accounting for both variance and
covariance terms for the zones, one derives an
estimate of the mean-squared error of $P$, the regional
production estimate, which is given by
where $S_r^2$ is the estimated mean-squared error of
the production estimate for the $r$th zone, and $S_{rr'} = 0$
if the $r$th and $r'$th zones have no yield stratum in
common. Otherwise,

where $U_{rk}$ is the mean-squared prediction error for
the $k$th yield stratum estimate commonly applicable
to the area estimate $A_{rk}$ for the $r$th zone and the area
estimate $A_{r'k}$ for the $r'$th zone, and $C$ is the number of
yield strata common to both the $r$th and $r'$th zones.
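The displays for equations (13) and (14) are also missing from this copy. Forms consistent with the definitions just given are sketched below. They are reconstructed under the assumption that acreage estimates in different zones are uncorrelated, so that a shared yield prediction is the only source of dependence between two zone production estimates. The upper limit $R$ (the number of zones in the region) is notation assumed for this sketch.

```latex
S^2 \;=\; \sum_{r=1}^{R} S_r^2 \;+\; \sum_{r \neq r'} S_{rr'}
\qquad (13)

S_{rr'} \;=\; \sum_{k=1}^{C} U_{rk}\, A_{rk}\, A_{r'k}
\qquad (14)
```

Each term of (14) estimates the covariance between $A_{rk} Y_k$ and $A_{r'k} Y_k$, which, for uncorrelated unbiased acreage estimates, is the product of the two expected acreages and the mean-squared error of the common yield prediction $Y_k$.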

The estimate of the mean-squared prediction error
of $P$, the country production estimate, is given by

$$S^2 = \sum_{k=1}^{K} S_k^2 \qquad (15)$$

This computation of $S^2$ from the regional mean-squared
error estimates $S_k^2$ is made assuming that the
regional production estimates are uncorrelated. This
assumption certainly holds with regard to the
estimation procedure. However, it is possible for the
regional productions to be correlated because of
weather and economic conditions.


ZONE  AND  REGIONAL  YIELD  AND 
ITS  PREDICTION  ERROR  ESTIMATION 

When there is a single yield model in a zone, the
yield prediction and its mean-squared prediction
error are obtained as described in the paper by
Strommen et al. In case of more than one yield model in a
zone, these parameters (i.e., yield and prediction
error) are to be estimated for the zone as well as for the
higher levels.

The weighted average yield given by
production/acreage is used to determine the combined yield
per acre for a higher level. Consequently, an estimate
of the yield for level $s$ is obtained by

$$Y_s = P_s / A_s \qquad (16)$$

where $P_s$ is the production estimate and $A_s$ is the area
estimate for the level (zone, region, or country).

The exact formula for the prediction error of $Y_s$
is not tractable because both $P_s$ and $A_s$ are random
variables. Only an approximation for the variance of
this ratio is considered here. However, a good
approximation can be achieved because of the large-sample
property of the acreage estimate $A_s$, its low
coefficient of variation, and the fact that $A_s > 0$.

Using the first-order approximation given in
reference 1 (Theorem 5.3), an estimate of the
mean-squared prediction error of the ratio estimate $Y_s$
is obtained by

where $S_s^2$ is the estimated mean-squared prediction
error of $P_s$, the production estimate; $V_s$ is the
estimated variance of $A_s$, the area estimate; $Y_i$ is the
yield estimate for the $i$th pseudozone; and $V_i$ is the
estimated variance of the acreage estimate for the $i$th
pseudozone.
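The display for equation (17) is not present in this copy. The sketch below shows one standard first-order (delta-method) approximation for the mean-squared error of a ratio, written in terms of the quantities listed above. It is an assumption-laden illustration, not the authors' formula: the covariance between production and acreage is taken to be the sum of $Y_i V_i$ over pseudozones, and covariances between different pseudozone acreage estimates are ignored for simplicity. The function name and inputs are hypothetical.

```python
from typing import List, Tuple

# Each pseudozone is (A, Y, V, U): acreage estimate, yield prediction,
# acreage variance estimate, yield squared prediction error estimate.
PseudozoneInput = Tuple[float, float, float, float]


def ratio_yield_and_mse(pseudozones: List[PseudozoneInput]) -> Tuple[float, float]:
    P = sum(A * Y for A, Y, V, U in pseudozones)        # production estimate
    A_s = sum(A for A, Y, V, U in pseudozones)          # area estimate
    # Production MSE as a sum of pseudozone terms (cross terms omitted here).
    S2 = sum(V * Y**2 + U * A**2 - U * V for A, Y, V, U in pseudozones)
    V_s = sum(V for A, Y, V, U in pseudozones)          # area variance
    cov_PA = sum(Y * V for A, Y, V, U in pseudozones)   # assumed Cov(P, A)
    Y_s = P / A_s                                       # ratio yield estimate
    # First-order approximation to the MSE of the ratio P / A.
    mse = (S2 + Y_s**2 * V_s - 2.0 * Y_s * cov_PA) / A_s**2
    return Y_s, mse


example = [(1000.0, 30.0, 2500.0, 4.0), (500.0, 20.0, 900.0, 9.0)]
yield_est, yield_mse = ratio_yield_and_mse(example)
print(round(yield_est, 4), round(yield_mse, 4))
```

The approximation relies on the area estimate being strictly positive with a small coefficient of variation, which is the condition the text cites.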


PREDICTION  ERROR  ESTIMATION  FOR 
MIXED  WHEAT  AREAS 

The mean-squared error estimation problem,
mainly for the zone and higher levels, is discussed in
this section. Three crop-type pseudozones (pure
winter, pure spring, and mixed wheat) are possible in
a zone of mixed wheat. The yield predictions and
their mean-squared error estimates are available
separately for the pure winter and pure spring
pseudozones. On the other hand, a weighted average
of the two yield predictions, one for spring wheat and
another for winter wheat, would provide a combined
yield prediction for a mixed wheat pseudozone. The
two weights correspond to the two crop-type
acreages. However, it is proposed to use acreage
figures different from the LACIE estimated acreages so
that the assumption of independence between
LACIE acreage estimates and yield estimates is not
violated; hence, the formulas given in the two
preceding sections are applicable. To avoid within-year
dependence, the use of historical acreages is
suggested. This method, of course, may cause a certain
amount of bias. Thus, for a mixed wheat
pseudozone, a combined yield prediction and its
mean-squared error estimate, $Y_{ri}$ and $U_{ri}$,
respectively, are obtained as follows.
where $a_w$ and $a_s$ are the historical (primary epoch)
year harvested winter wheat and spring wheat
acreages, respectively, in the pseudozone; and where
$U_w$ and $U_s$ are the mean-squared prediction errors
of the winter wheat and spring wheat yield estimates,
respectively, for the pseudozone.

By obtaining inputs for pseudozone yields as
described in equations (18) and (19), the mean-squared
prediction error estimates are computed for
production and yield estimates using equations (12) through
(15) and (17).
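The displays for equations (18) and (19) are missing from this copy. The sketch below shows the acreage-weighted average that the text describes, with fixed historical acreages as weights. The error formula additionally assumes that the winter and spring yield predictions are independent; that assumption, the function names, and the numbers are illustrative and not taken from the source.

```python
def combined_yield(y_w: float, y_s: float, a_w: float, a_s: float) -> float:
    # Weighted average of winter and spring yield predictions,
    # weighted by historical winter (a_w) and spring (a_s) acreages.
    return (a_w * y_w + a_s * y_s) / (a_w + a_s)


def combined_yield_mse(u_w: float, u_s: float, a_w: float, a_s: float) -> float:
    # Mean-squared error of a fixed-weight average of two predictions
    # that are assumed independent: squared weights times each error.
    total = a_w + a_s
    return (a_w**2 * u_w + a_s**2 * u_s) / total**2


# Hypothetical mixed wheat pseudozone.
y = combined_yield(y_w=30.0, y_s=20.0, a_w=300.0, a_s=100.0)
u = combined_yield_mse(u_w=4.0, u_s=9.0, a_w=300.0, a_s=100.0)
print(y, u)  # 27.5 2.8125
```

Because the weights are historical constants rather than current LACIE acreage estimates, the combined yield stays independent of the current acreage estimate, which is the reason the text gives for this choice.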
INTRODUCTION

A goal in LACIE has been to estimate wheat
acreage to a given accuracy using Landsat rather than
ground-enumerated data. In this approach, Landsat
data are classified into wheat/nonwheat classes in
each of a number of randomly allocated areal
segments. The acreage classified as wheat is then
measured in each segment and these segment
estimates are aggregated to obtain the country estimate.
For  the  method  to  work,  a limited  amount  of  manual 
interpretation  is  required  for  each  segment.  From 
this  interpretation,  spectral  samples  of  the  crop  types 
of  interest  are  obtained  and  used  to  estimate 
classification  parameter  values.  These  parameter 
values  specify  the  classification  rule  from  a given 
family  of  possible  rules.  The  process  of  selecting  and 
labeling  samples  for  estimating  classification 
parameters  is  commonly  referred  to  as  “training”  a 
classifier. 

A procedure for manually training a classifier and
a method of machine classification were tested in the
first two phases of LACIE. These tests revealed a
number of shortcomings; consequently, the
approach was redesigned for Phase III.

The theory of the classification methods and the
functional steps in the manual training process used
in the three phases of LACIE are discussed in this
paper. In addition, the major problems that arose in
using the earlier approach are discussed to reveal the
motivation that led to the design for the third
LACIE phase.

A problem with both designs was that wheat could
not be separated from the other small grains.
Although studies now in progress suggest that such a
separation is possible if Landsat acquisitions at
certain critical times in the wheat crop calendar are
available, conclusive results are not yet available.
Since wheat estimates were obtained by ratioing
methods from small-grains estimates, the class of
interest in the following sections will be referred to as
small grains.


PHASE  I AND  II  CLASSIFICATION 
AND  MENSURATION  APPROACH 

The basic steps that were used in Phases I and II to
estimate the small-grains area in a segment are
illustrated in figure 1. That estimate was the result of
both manual and machine-processing operations.
The manual operations were required to train a
classifier; once trained, it classified each pixel
(except for pixels designated as nonagricultural or under
cloud cover) as either small grains or
non-small-grains. Those pixels classified as small grains were
then totaled to obtain a segment acreage estimate.


FIGURE 1. Processing flow in CAMS.
Aside from certain screening operations, the
starting point of this method was the analyst labeling
operation. The analyst was required to choose
examples of small-grains and non-small-grains fields that
would result in good estimates of the probability
density of each of these two major classes.¹ To obtain a
good estimate of a class density, two fundamentally
different selection processes were required. First, the
classification model, as will be discussed later,
assumes that a given class can be represented by a
sum of normally distributed densities. Thus, the
analyst would first specify the number of such
distributions to be used in the model and then select
example fields from which to estimate the means and
covariances of each of these distributions. After the
analyst selected and labeled these example fields, the
fields were approximated by polygons and the
vertices of these polygons inserted into the computer.
At this point, the segment was ready to be machine
processed.

Certain areas within an agricultural scene that are
not agricultural areas can appear spectrally very
similar to a small-grains area on a given date. A common
example of this is grassland. Since these areas are
generally easy for the analyst to spot, they are
excluded in the labeling process by again bounding
them by polygons and labeling them “designated
other” or “DO.” These DO areas are not classified
but are automatically included in the
non-small-grains count.

As mentioned previously, certain screening
operations are performed. These operations include
checking the imagery to ensure that it is, in the opinion of
the analyst, a processable image. Two factors
considered in making this decision are image data
quality and the crop phenology at the given
acquisition time. If possible, it is desirable to process an
image taken within the wheat phenology intervals of
planting to emergence, emergence to jointing,
jointing to heading, and heading to harvest. Another
screening operation is performed to exclude clouded
areas from processing.² This is done by enclosing
these areas with polygons and labeling them as
“designated unidentifiable” or “DU” areas.


¹The estimation of a probability density for a class is done by
estimating parameters (means and covariances) of normal
density functions. This is accomplished in the classifier training
process discussed in the introduction.

²Images are automatically screened as part of the registration
process at the NASA Goddard Space Flight Center to ensure that
a given segment has less than 10-percent clouded areas. A second
screening is done at the NASA Johnson Space Center to
doublecheck the automatic process.

Following the labeling operation, the entire
segment, less the DO and DU areas, is classified into
small-grains and non-small-grains areas. The
underlying model on which these classification
decisions are based assumes that each major class (i.e.,
the small-grains and non-small-grains classes) can
be described by a mixture of multivariate normal
densities. That is, if $f_k$ denotes the class density of
the $k$th class, then

$$f_k(x) = \sum_{i=1}^{m_k} p_{ki}\, N(x;\, \mu_{ki}, \Sigma_{ki}) \qquad (1)$$

where $N(x; \mu_{ki}, \Sigma_{ki})$ is the $i$th component normal
density function for the $k$th class. The parameters $p_{ki}$,
$i = 1, 2, \ldots, m_k$, are generally assumed to be $1/m_k$, and
the parameters $m_k$ (the number of normal clusters),
$\mu_{ki}$ (the mean vector), and $\Sigma_{ki}$ (the covariance matrix)
are estimated from the training field data.

Once the densities $f_1$ and $f_2$ are estimated (the
estimates are denoted by $\hat{f}_1$ and $\hat{f}_2$, respectively), the
classification rule for classifying a given pixel $x$ is as
follows: “Decide $x$ is from class 1 if $\pi_1 \hat{f}_1(x) >
\pi_2 \hat{f}_2(x)$ and from class 2 if $\pi_1 \hat{f}_1(x) \le \pi_2 \hat{f}_2(x)$.” The
weights $\pi_1$ and $\pi_2$ are nonnegative and add to 1.
Their purpose is to estimate the prior probability that
a pixel is from class 1 or 2; however, because this
prior information is generally not known, the
weights are usually taken to be 1/2.
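The decision rule can be illustrated with a short sketch. To keep it self-contained, each normal component is given a diagonal covariance matrix, which is a simplification of the full covariance matrices described in the text; the component parameters and pixel values are hypothetical, and thresholding of outlier pixels is not included.

```python
import math
from typing import List, Sequence, Tuple

# A component is (mean vector, per-channel variances).
Component = Tuple[Sequence[float], Sequence[float]]


def normal_density(x: Sequence[float], mean: Sequence[float],
                   var: Sequence[float]) -> float:
    # Multivariate normal density with diagonal covariance (assumption).
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * math.log(2.0 * math.pi * vi) - (xi - mi) ** 2 / (2.0 * vi)
    return math.exp(log_p)


def class_density(x: Sequence[float], components: List[Component]) -> float:
    # Mixture of normal components with equal weights 1/m.
    return sum(normal_density(x, m, v) for m, v in components) / len(components)


def classify(x: Sequence[float], class1: List[Component], class2: List[Component],
             pi1: float = 0.5, pi2: float = 0.5) -> int:
    # Decide class 1 if pi1*f1(x) > pi2*f2(x); otherwise decide class 2.
    return 1 if pi1 * class_density(x, class1) > pi2 * class_density(x, class2) else 2


# Hypothetical two-channel training results.
small_grains = [((30.0, 60.0), (16.0, 25.0)), ((40.0, 70.0), (9.0, 16.0))]
non_small_grains = [((60.0, 30.0), (25.0, 25.0))]

print(classify((32.0, 62.0), small_grains, non_small_grains))  # 1
print(classify((58.0, 33.0), small_grains, non_small_grains))  # 2
```

With equal prior weights the rule reduces to picking the class whose estimated density is larger at the pixel.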

Certain outlier pixels (pixels within an extremely
small probability density) are thresholded in the
classification process. These thresholded pixels are
subsequently counted as non-small-grains. (The
details of thresholding are similar to those discussed
in the section on classification.)

The percentage of small grains in a segment was
then estimated to be

$$\hat{p} = \frac{N_{SG}}{22\,932 - N_{DU}} \times 100$$

where $N_{SG}$ is the number of pixels classified as small
grains and $N_{DU}$ is the number of pixels in the DU
areas. (Note that there are 22 932 pixels in a LACIE
segment.)
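A minimal sketch of this computation follows. The function name and the pixel counts in the example are hypothetical; only the segment size of 22 932 pixels comes from the text.

```python
SEGMENT_PIXELS = 22932  # pixels in a LACIE segment, as stated in the text


def small_grains_percentage(n_small_grains: int, n_du: int,
                            segment_pixels: int = SEGMENT_PIXELS) -> float:
    # DU (designated unidentifiable) pixels are removed from the base;
    # DO and thresholded pixels remain in the base as non-small-grains.
    base = segment_pixels - n_du
    if base <= 0:
        raise ValueError("no classifiable pixels remain in the segment")
    return 100.0 * n_small_grains / base


# Hypothetical counts: 5000 small-grains pixels, 2932 pixels under cloud.
print(small_grains_percentage(5000, 2932))  # 25.0
```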
Before the estimated proportion of small grains in
a segment was passed to the Crop Assessment
Subsystem, it was checked by the analyst. Basically, this
check was made by comparing the segment imagery
to a classification map. The classification map is
another piece of imagery on the same scale as the
segment imagery with all areas classified as small
grains appearing clear and all other areas opaque (see
fig. 6). If the analyst believed that areas were not
classified as he thought they should be, the segment
was reworked. Fundamentally, the rework strategy
attempted to correct field-labeling errors and to
select new fields that were samples from crop
spectral distributions not originally sampled in the
training process.

In  labeling  a given  field,  the  analyst  would  ex- 
amine a sequence  of  imagery  ranging  over  times 
which  included  the  major  crop  phenological  stages. 
For  the  most  part,  however,  machine  classifications 
were  done  on  only  single-pass  Landsat  data.  Some 
multitemporal  classification  (in  which  each  image 
pixel  is  described  as  a vector  of  Landsat  data  from 
more  than  one  acquisition)  was  done,  but,  in  general, 
such  attempts  proved  to  be  too  difficult  to  execute 
for  all  segments.  A type  of  machine  processing  called 
“no-significant-change  processing"  was  done;  it  at- 
tempted to  select  a single  acquisition  that  would  pro- 
vide a good  estimate.  In  this  process,  the  Landsat  im- 
age from  a new  acquisition  would  be  visually  com- 
pared with  the  class  map  from  the  last  classification 
for  the  given  segment.  If  the  analyst  decided  that 
there  was  poor  agreement  between  the  two  images, 
then  the  new  image  would  be  classified.  On  the  other 
hand,  if  through  this  visual  comparison  it  was 
decided  that  the  previous  classification  was  still  valid 
(i.e.,  no  change),  no  new  classification  would  be 
made. 

Instances arose where very little wheat could be
found in a segment. In those instances, the segment
was hand-counted and no machine processing was
done.


PROBLEMS  WITH  THE  PHASE  I AND  II 
APPROACH 

The Phase I and II technique used by the analyst is
probably better described as an art than as a
well-defined procedure. Basically, the analyst was
required to interpret color imagery and ancillary
numerical data to make decisions related to complex
statistical questions, such as “How many normal
distributions will fit the data?” or “How many fields
should be sampled to determine an accurate estimate
for a given distribution?” Given enough time and the
ability to execute a trial-and-error process, a good
analyst can obtain a reasonably good answer for a
segment. But in a highly automated environment, as
LACIE was required to be in order to make estimates
at regularly scheduled intervals, such an approach
can be inefficient. Moreover, because of the
subjective nature of the process and the varying degrees of
talent among the numerous analysts who
participated in LACIE, the results can contain considerable
variance and bias. Some of the specific pitfalls that
were inferred from the LACIE results are discussed
in the following sections.


Selecting  and  Labeling  Fields 

For  the  machine  classification  process  to  work 
properly,  the  underlying  assumptions  of  the  model 
given in equation (1) should be satisfied. This model
assumes  that  the  data  can  be  statistically  described 
by  a sum  of  normal  densities.  Thus,  in  his  selection 
of  fields,  the  analyst  must  in  effect  decide  how  many 
normal  densities  are  likely  to  be  present.  If  the  deci- 
sion is  too  high,  a large  number  of  parameters  have 
to  be  estimated  (i.e.,  several  means  and  covariance 
matrices);  this  implies  that  the  number  of  fields  that 
need  to  be  labeled  to  obtain  good  estimates  is  very 
large.³ If the decision is too low, it is likely to lead to a
poor  fit  of  the  model  no  matter  how  well  the 
parameters  are  estimated.  Even  if  a correct  decision 
is  made  on  the  number  of  distributions,  a decision 
must  be  made  as  to  which  fields  are  samples  from  a 
given  distribution.  This  decision  is  especially 
difficult  when  the  given  distribution  has  a relatively 
large  variance,  because  samples  must  then  be  drawn 
from  the  extremities  (tails)  of  that  distribution  as 
well  as  from  the  center  (around  the  mean).  Samples 
drawn  only  around  the  mean  will  lead  to  gross  under- 
estimates of  distribution  variances.  It  is  more  natural 
for  an  analyst  to  label  a sample  far  from  a given  dis- 
tribution mean  as  being  an  observation  sampled 
from  a different  distribution;  i.e.,  one  whose  mean  is 
closer  to  that  observation. 
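The size of this underestimate can be illustrated numerically. The sketch below builds a deterministic stand-in for a sample from a standard normal distribution (evenly spaced quantiles) and compares the variance of the whole sample with the variance of only those values near the mean. The distribution, the sample size, and the cutoff of 0.6745 (roughly the middle half of the distribution) are choices made for the illustration only.

```python
from statistics import NormalDist, pvariance

dist = NormalDist(mu=0.0, sigma=1.0)

# Evenly spaced quantiles stand in for a well-spread random sample.
n = 999
sample = [dist.inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

# Keep only observations close to the mean, as an analyst might if
# outlying fields were attributed to some other distribution.
central = [x for x in sample if abs(x) < 0.6745]

full_var = pvariance(sample)
central_var = pvariance(central)
print(round(full_var, 3), round(central_var, 3))
```

The variance computed from the central values alone is a small fraction of the variance of the full sample, even though about half of the observations were retained.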


³Because the within-field variance is generally much larger
than the between-field variance, just choosing large fields is in
general not the answer. Hence, to estimate a given parameter,
several fields are needed.
Efficiency

The need to reclassify a segment, perhaps several
times, to improve the estimate can be a
time-consuming process. The amount of rework was not
always proportional to the size of the area
misclassified. Even a small area classified incorrectly
could cause a problem.

Other examples of inefficiency existed in part
because of the mechanics of the experiment
implementation. A particular example was the
requirement to select fields by approximating them with
polygons. This procedure led to long delays in
preparing computer cards because batch processing
rather than interactive processing was used.


Small-Field Versus Large-Field Processing

The  Phase  I and  II  procedure  was  more  adaptable 
to  areas  that  contained  large  agricultural  fields.  In 
areas  where  the  fields  were  small  (e.g.,  the  spring 
wheat  strip-fallow  fields  in  North  Dakota),  these 
procedures  proved  to  be  difficult  to  implement.  In 
general,  as  the  size  of  the  fields  decreases,  the  re- 
quired number  of  training  fields  increases.  It  became 
increasingly difficult to determine the number of
distributions in the model and to choose an appropriate
number  of  samples  from  each  distribution. 


Multitemporal  Processing 

It was well known at the beginning of LACIE that
much  of  the  information  that  distinguishes  one  crop 
from  another  can  be  obtained  from  spectral  observa- 
tions over  time.  In  fact,  the  analysts  worked  on  the 
premise  that,  by  knowing  the  crop  calendar,  it  should 
be  possible  to  relate  spectral  changes  to  phenological 
crop  changes  and  thereby  identify  that  crop.  Unfor- 
tunately, attempts  to  machine  classify  multitemporal 
data  were  largely  unsuccessful.  The  underlying 
reason  for  this  failure  is  probably  related  to  the  prob- 
lem of  selecting  and  labeling  sample  fields.  To  effect 
a multitemporal  classification,  an  analyst  not  only 
must  make  decisions  of  the  type  discussed  in  the  sec- 
tion on  selecting  and  labeling  for  each  image  but  also 
must  account,  again  by  sampling,  for  the  possible  ad- 
ditional classes  and  covariance  terms  that  arise  from 
the  multivariate  nature  of  the  problem.  Thus,  the 
difficulty  in  choosing  good  training  fields  is  greater 
in  such  multitemporal  applications. 


INTRODUCTION  TO  PROCEDURE  1 

Motivated by the problems experienced with the
Phase I and II design, a second approach, called
Procedure 1, was designed and adopted in Phase III. This
design proved to be a significant improvement in
terms of both estimation accuracy and efficient use
of analyst abilities. More data could be processed
with greater accuracy using the same manual
resources. A key feature in this improvement was
that the analyst was freed to concentrate on the
labeling function. Machine processing was used to reduce
the variance of an analyst-derived area estimate and
to improve labeling accuracy. The classification of a
segment was treated as a stratification of that
segment into “probably small-grains” and
“non-small-grains” strata. Through the use of a poststratified
estimation method, the variance of a simple
randomly allocated analyst estimate was reduced.
Moreover, the ability to cross-check between
machine classification and analyst labeling of the
same areas and the introduction of analyst labeling
aids were elements of the design aimed at improving
analyst labeling accuracy.

The analysis of a given segment in Procedure 1 can be described in terms of four interrelated operations, which will be called labeling, classification, area estimation, and evaluation. These operations generally follow the sequence illustrated diagrammatically in figure 2. Labeling refers to all manual
functions  that  result  in  the  assignment  of  a label  to 
certain  specified  pixels  within  the  Landsat  segment 
image.  The  purpose  of  labeling  is  threefold:  (1)  to 
provide  observations  from  small-grains  and  non- 
small-grains classes  that  are  needed  to  estimate  cer- 
tain classifier  parameter  values,  (2)  to  provide  obser- 
vations for  a stratified  area  estimate  of  small  grains, 
and  (3)  to  provide  observations  for  testing  the 
quality  of  the  segment  estimates.  The  classification 
operation  sorts  each  pixel  in  a segment  into  one  of 
two  possible  classes.  The  result  is  a class  map,  which 
is  subsequently  treated  as  a stratification  of  the  seg- 
ment area  into  two  (not  necessarily  connected) 
regions.  Within  the  limits  of  classification  error,  the 
first  region  contains  pixels  primarily  of  the  first  class 
and  the  second  region  contains  pixels  primarily  of 
the  second  class.  Given  this  stratification,  area 
estimation  is  performed.  This  is  a stratified  area  esti- 
mate using  a second  set  of  labeled  dots  (indepen- 
dently selected  from  the  set  used  to  estimate 
classification  parameters)  allocated  within  the  strata. 
Finally, the purpose of the evaluation operation is to provide a quality check on the segment estimate and to develop rework strategies if required.

FIGURE 2. — Processing flow in Procedure 1.

Before  an  acquisition  of  a segment  is  analyzed,  it 
is  screened.  Two  types  of  screening  are  used.  The 
first  is  a manual  screening  intended  mainly  to  check 
data  quality,  including  haze  distortion  and  missing 
pixel  data.  The  second  is  an  automatic  screening  in- 
tended to  select  from  several  possible  acquisitions 
the  four  acquisitions  that  are  likely  to  give  the  least 
classification  error.  (The  details  of  the  acquisition 
selection  process  are  presented  in  the  section  on 
multitemporal  estimation.) 


Labeling 

Of  the  22  932  pixels  in  a LACIE  segment,  209 
(about  1 percent)  are  selected  as  a set  of  candidates  to 
be  labeled.  These  pixels,  or  dots,  coincide  with  a grid 
of  every  tenth  column  and  every  tenth  row  of  the 
segment  image.  Two  randomly  and  independently 
chosen subsets of dots (called type 1 and type 2 dots)
are  selected  for  labeling.  Mechanically,  the  labeling  is 
done  by  overlaying  a dot  template  on  the  scene  and 
labeling  each  scene  pixel  directly  underneath  an  indi- 
cated dot  on  the  template.  An  example  image  and  the 
templates  for  type  1 and  type  2 dot  labeling  are 
shown in figure 3. The dot locations appearing on the
template  have  been  randomly  selected.  The  same 
template  for  type  1 dot  labeling  is  used  for  each  im- 
age. Similarly,  one  template  is  used  for  all  type  2 dot 
labeling.  Use  of  the  same  templates  does  not  violate 
the  intent  that  the  dot  selection  be  random  because  it 
is assumed that each segment is a randomly selected observation from the set of all possible LACIE segments.
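
The dot grid described above can be reproduced in a few lines. The sketch below assumes a segment of 117 lines by 196 pixels (which gives the 22 932 pixels quoted in the text, although the dimensions themselves are not stated in this section) and uses hypothetical subset sizes of 40 type 1 dots and 60 type 2 dots; the function names are illustrative only.

```python
import random

SEGMENT_LINES = 117    # assumed number of rows (117 x 196 = 22 932 pixels)
SEGMENT_PIXELS = 196   # assumed number of columns
GRID_STEP = 10         # every tenth row and every tenth column


def grid_dots(lines=SEGMENT_LINES, pixels=SEGMENT_PIXELS, step=GRID_STEP):
    """Return the 1-based (row, column) positions of the candidate dots."""
    return [(r, c)
            for r in range(step, lines + 1, step)
            for c in range(step, pixels + 1, step)]


def draw_templates(dots, n_type1, n_type2, seed=0):
    """Draw two independent random subsets (they may share dots)."""
    rng = random.Random(seed)
    type1 = rng.sample(dots, n_type1)
    type2 = rng.sample(dots, n_type2)
    return type1, type2


dots = grid_dots()
type1, type2 = draw_templates(dots, n_type1=40, n_type2=60)
print(len(dots), len(type1), len(type2))
```

Because the two subsets are drawn independently, a dot can appear on both templates; fixing the seed mimics the use of one template for every segment.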

Not  all  dots  in  a LACIE  segment  necessarily  fall 
within  an  agricultural  field.  Some  can  fall  on  field 
boundaries, and more will probably fall near field edges so that when a temporal sequence of images is
viewed,  registration  error  will  cause  the  dot  to  appear 
in  different  fields  on  different  acquisitions.  Since  the 
labeling  operation  is  intended  to  be  a process  by 
which  a given  pixel  is  assigned  a generic  name  (e.g., 
small grains, non-small-grains), it would logically
follow  that  only  those  pixels  that  do  not  fall  on  or 
near  field  boundaries  should  be  labeled.  This  logic  is 
followed  in  labeling  type  1 dots  but  not  in  labeling 
type  2 dots.  Experiments  using  accurate  labeling  in- 
dicated that  skipping  boundary  and  edge  dots  in  type 
1 dot  labeling  produces  a classification  result  not  sig- 
nificantly different  from  the  result  obtained  by  in- 
cluding these  labels.  (For  test  comparison  purposes, 
boundary  and  edge  dots  were  assigned  labels  based 
on the majority of material represented by the dot.)
Moreover,  since  manual  labeling  of  boundary  and 
edge  dots  is  a highly  error-prone  process,  the  decision 
to skip these dots is appropriate. On the other hand,
the  type  2 dot  labels  enter  directly  into  the  area  esti- 
mate; therefore,  skipping  the  boundary  and  edge  dots 
could  bias  the  estimate.  Thus,  in  type  2 labeling,  the 
analyst  must  estimate  the  amount  of  material  in  a 
boundary  dot  and  label  that  dot  according  to  the  ma- 
jority of  material  present.  In  the  case  of  an  edge  dot, 
one  acquisition  is  used  as  a reference  and  the  label  of 
the  pixel  in  that  reference  image  is  the  one  assigned. 
If  the  pixel  is  from  an  agricultural  field,  the  idea  is  to 
select  a reference  image  that  clearly  contains  the  dot 
and  assign  the  field  label  to  the  pixel.  Note  that  the 
field  can  easily  be  traced  in  a multitemporal  sequence 
of  images. 

The  templates  in  figure  3 are  designed  to  allow  for 
skipping.  For  type  1 dot  labeling,  the  analyst  is  re- 
quired to  first  label  pixels  appearing  under  the  circle 
symbol,  skipping  boundary  or  edge  dots.  If  the  re- 
quired number  of  dots  is  not  labeled,  pixels  under  the 
square  symbol  are  labeled,  again  skipping  over 
boundary  and  edge  dots.  Finally,  if  the  required 
number of dots is still not labeled, the triangle symbol is used. Type 2 dot labeling is similar. Skipping in
this  case,  however,  is  done  only  if  a pixel  is  in  a DO 
or  a DU  area  (as  explained  in  the  following 
paragraph). 
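
The tiered use of the template symbols for type 1 dots can be sketched as follows. The data layout (a list of dot identifiers with their symbols) and the predicate `is_boundary_or_edge` are hypothetical stand-ins for the analyst's judgment; only the circle, square, triangle ordering and the skipping rule come from the text.

```python
def select_type1_dots(template, is_boundary_or_edge, required):
    """Return the dot ids an analyst would label for type 1 purposes.

    template: list of (dot_id, symbol) pairs, symbol in
              {"circle", "square", "triangle"} (hypothetical layout).
    is_boundary_or_edge: callable(dot_id) -> bool, the analyst's call.
    required: number of labeled dots wanted.
    """
    chosen = []
    for symbol in ("circle", "square", "triangle"):
        for dot_id, dot_symbol in template:
            if dot_symbol != symbol:
                continue
            if is_boundary_or_edge(dot_id):
                continue  # boundary and edge dots are skipped for type 1
            chosen.append(dot_id)
        if len(chosen) >= required:
            break  # enough dots labeled; later symbols are not needed
    return chosen


template = [(1, "circle"), (2, "circle"), (3, "square"),
            (4, "square"), (5, "triangle")]
boundary = {2}
labeled = select_type1_dots(template, lambda d: d in boundary, required=2)
print(labeled)
```

In this toy case the circle tier yields only one usable dot, so the square tier is labeled as well and the triangle tier is never reached.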

Often in a LACIE segment, there are large areas which are clearly not agricultural. As indicated in the discussion of the Phase I and II approach, these areas are called DO areas. Before labeling the dots, the analyst identifies such areas. These DO areas are then skipped in classification and automatically assigned to the non-small-grains portion of the area estimate. Recall that the purpose of assigning an area to the DO category is to eliminate areas which could be spectrally confused by the classifier with a small-grains area. Grasslands are a common example of this category.

The labeling operation uses color-infrared (CIR) imagery, numerical Landsat displays, and ancillary data. Peculiar to Procedure 1 are the trajectory plot and scatter plot displays shown in figure 4. These two displays are intended to present two different types of “information” to the Procedure 1 analyst. The trajectory plot is intended to summarize the “spectral pattern” over time of the crop canopy represented by a dot. A separate trajectory plot is presented for each of the 209 dots. Knowing the nominal pattern of members of the small-grains class, for example, an analyst can estimate the likelihood that a given trajectory indicates a small-grains classification. Whereas the trajectory plot can aid an analyst in making decisions about the labeling of a specific dot, the scatter plot is intended to aid the analyst in establishing the consistency of the labeling. The basic idea in the use of a scatter plot is that two dots which are very close are likely to belong to the same class. Both the trajectory plot and the scatter plot are intended to be aids and not infallible indicators of crop type. Indeed, certain classes of grass display trajectories very similar to those in the small-
grains  class.  Also,  since  two  spectral  classes  can  be 
very  close  together  and  in  fact  have  distributions 
with  intersecting  supports,  the  concept  of  proximity 
in  a scatter  plot  does  not  always  lead  to  correct 
classifications. 

Both the trajectory plot and the scatter plot are plotted against two coordinates known as brightness (abscissa) and green number (ordinate). (For a detailed discussion of these coordinates, see the paper by Kauth and Richardson entitled “Signature Extension Methods in Crop Area Estimation.”) The values of these coordinates are obtained from an affine transformation4 of the four-dimensional (single-pass) Landsat data; i.e., letting x represent a vector of values associated with a Landsat pixel, the coordinate values y would be y = Ax - b, where A is a 2 by 4 matrix and b is a 2 by 1 vector.

4For Landsat-2 data, the A matrix is

$$A = \begin{bmatrix} 0.33231 & 0.60316 & 0.67581 & 0.26278 \\ -0.28317 & -0.66006 & 0.57735 & 0.38833 \end{bmatrix}$$

Calibration differences in the satellite data alter this transformation. The vector b is estimated for each segment and acquisition and is intended to normalize greenness values of bare soil to zero.
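
As an illustration of the transformation, the sketch below applies y = Ax - b to a single four-channel pixel. The matrix entries are the widely published Kauth-Thomas brightness and greenness coefficients for Landsat-2 MSS data, taken here as an assumption; the sign convention for b and the sample values are also assumptions, since b is estimated per segment and acquisition.

```python
# Assumed Landsat-2 coefficients: row 0 = brightness, row 1 = green number.
A = [
    [0.33231, 0.60316, 0.67581, 0.26278],
    [-0.28317, -0.66006, 0.57735, 0.38833],
]


def brightness_greenness(x, b=(0.0, 0.0)):
    """Apply y = A x - b to one four-channel Landsat pixel vector x."""
    if len(x) != 4:
        raise ValueError("expected four Landsat channel values")
    return tuple(sum(a * v for a, v in zip(row, x)) - offset
                 for row, offset in zip(A, b))


# Hypothetical pixel with equal counts in all four channels.
y = brightness_greenness([1.0, 1.0, 1.0, 1.0])
print(y)
```

With b chosen as the greenness of bare soil in a given acquisition, bare-soil pixels would map to a green number near zero, which is the normalization the footnote describes.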

Experiments have shown that much of the information in Landsat 1 and 2 data is concentrated in a two-dimensional subspace of the Landsat four-dimensional space (ref. 1). This means that information displays can be constructed to graphically represent important data characteristics on a two-dimensional surface (i.e., a piece of paper). The affine
transformation  represents  an  attempt  to  rotate  the 
coordinate  system  in  that  two-dimensional  subspace 
to  bring  out  properties  of  the  data  related  to  soil 
brightness and canopy growth (or green development). Although there is no claim that this transformation provides a means of measuring these two properties precisely (even if they were precisely defined), experiments have shown that the representation is reasonable (see the paper by Kauth and Richardson). In particular, one finds that the brightness coordinate value for dark soils is less than the value for lighter soils. As a wheat crop grows, its
green-number  value  increases  until  the  wheat 
reaches  the  senescent  period.  After  that  period,  the 
green  number  decreases,  presumably  because  the 
crop  is  yellowing. 


Classification 

In  Procedure  1,  the  distributions  of  the  classes  of 
interest  (e.g.,  small  grains  and  non-small-grains)  are 
approximated  by  a weighted  sum  of  normal  distribu- 
tions. The  number  of  distributions  in  these  sums  and 
the  estimates  of  the  parameters  related  to  these  dis- 
tributions are  obtained  through  a clustering  process. 
The  algorithm  used  for  clustering  is  called  ISOCLS 
(ref.  2). 

ISOCLS sorts the pixel spectral data in the LACIE segment into a set of clusters, and the elements of a cluster are treated as a sample from one normal distribution. As will be explained in more detail later,
these  samples  are  then  used  to  estimate  the 
parameters  (i.e.,  the  mean  vector  and  covariance 
matrix)  of  that  distribution. 

The initial iteration of the algorithm operates on a k-means principle (ref. 3) according to which clusters are formed by grouping all points in the segment according to their distance from a given set of points, called "seed" points. Subsequent iterations augment this k-means principle with a split/combine logic in which clusters judged to be too large are split to form smaller ones and clusters judged to be too small are combined with existing larger clusters. The main operations in the algorithm are shown in figure 5. In that figure, $x_i$, i = 1, ..., n, denotes the ith element of a multispectral pixel vector in a LACIE segment. (In a typical multitemporal clustering, n is 16 and there are 22 932 such vectors.)

There  are  several  parameters  that  control  the 
algorithm.  (These  are  indicated  by  quotation  marks 
in  figure  5.)  The  values  of  these  parameters  must  be 
specified  before  the  operation  of  the  algorithm.  The 
first  parameter  is  called  NMIN.  Its  purpose  is  to 
specify  the  lower  limit  of  the  number  of  points  that 
can  be  in  a cluster.  Small  clusters  are  eliminated 
because they are likely to represent isolated groupings in spectral regions of extremely small probability densities and therefore are likely to be "outlier" points from some larger cluster. Another parameter is STDMAX. It specifies the maximum sample standard deviation of the elements of the pixel vectors in a cluster. In essence, this parameter bounds the volume of a cluster and, depending on the dispersion of data, is a partial control on the number of clusters that can be formed. The parameter DLMIN specifies a minimum distance between two clusters. This parameter is intended to eliminate "false modes" from the data (i.e., two clusters that are more likely to contain samples from a common unimodal distribution than from two separate distributions). The number of clusters is determined by cluster sizes and their volumes (as explained previously) up to a maximum number of clusters specified by "CO" and also by a specified logic sequence called the "split/combine" sequence. An example of a split/combine sequence might be SSSSCCS, which would require that four cluster-splitting operations be done, followed by two combining operations, in turn followed by a final split operation. The purpose of including the split/combine sequence is to provide the algorithm with additional information which may be known about the cluster structure of the data other than that information which is automatically obtained from splitting and combining operations based on cluster volume (as measured by the standard deviation) and cluster size (as measured by the number of samples in a cluster). In the final step of the algorithm, small clusters are combined, where small clusters are defined by the parameter P(N) and the number of channels in the data.

The  algorithm  can  be  run  in  a so-called  "nearest- 
neighbor”  mode.  In  this  mode,  the  samples  are 
grouped  around  the  seed  vectors,  the  sample  means 
and  covariance  matrices  are  computed  for  each 
cluster, and the algorithm is terminated. Experiments have shown that the nearest-neighbor mode
of  operation  is  somewhat  more  predictable  than  the 
full-iterative  mode  (see  the  paper  by  Heydorn  et  al. 
entitled  “An  Evaluation  of  Procedure  1”).  This  is 
presumably  because  the  cluster  labeling  (discussed  in 
the  following  paragraph)  is  more  predictable  since 
the  cluster  means  are  likely  to  be  closer  to  the  seed 
vectors  than  when  iterative  clustering  is  performed. 
The  nearest-neighbor  operation  can  be  specified  by 
appropriate  setting  of  the  controlling  parameters. 
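
The nearest-neighbor mode amounts to a single assignment pass followed by the computation of cluster statistics. The following is a minimal sketch of that idea, not the ISOCLS code itself; the function names and the toy two-channel data are illustrative assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_seed_clusters(points, seeds):
    """Group every point with its closest seed; drop seeds with no points."""
    groups = [[] for _ in seeds]
    for p in points:
        k = min(range(len(seeds)), key=lambda s: sq_dist(p, seeds[s]))
        groups[k].append(p)
    return [g for g in groups if g]


def sample_mean(members):
    """Component-wise sample mean of a cluster."""
    n = len(members)
    return [sum(col) / n for col in zip(*members)]


def sample_covariance(members):
    """Usual (n - 1)-normalized sample covariance matrix of a cluster."""
    n = len(members)
    if n < 2:
        raise ValueError("need at least two points for a covariance")
    mean = sample_mean(members)
    centered = [[v - m for v, m in zip(p, mean)] for p in members]
    dim = len(mean)
    return [[sum(c[i] * c[j] for c in centered) / (n - 1)
             for j in range(dim)] for i in range(dim)]


points = [(0, 0), (1, 0), (10, 10), (11, 10)]
seeds = [(0, 0), (10, 10)]
clusters = nearest_seed_clusters(points, seeds)
print([sample_mean(c) for c in clusters])
```

Because no iteration takes place, each cluster mean stays close to the seed that produced it, which is the property the text associates with more predictable cluster labeling.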

Once the clusters have been formed, the cluster means are compared with each labeled type 1 dot vector. The label of the dot vector that is closest (as measured by the Euclidean metric) to a given cluster mean is assigned to that cluster. In this way, every cluster is automatically labeled from a given set of type 1 dots.
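
The automatic cluster-labeling step can be sketched as a nearest-dot lookup. The label strings and the two-channel vectors below are hypothetical; only the rule (Euclidean nearest type 1 dot to each cluster mean) is taken from the text.

```python
def sq_dist(a, b):
    """Squared Euclidean distance (same ordering as Euclidean distance)."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def label_clusters(cluster_means, type1_dots):
    """Give each cluster the label of the type 1 dot nearest its mean.

    type1_dots: list of (spectral_vector, label) pairs.
    """
    labels = []
    for mean in cluster_means:
        _, label = min(type1_dots, key=lambda d: sq_dist(d[0], mean))
        labels.append(label)
    return labels


type1_dots = [((20.0, 5.0), "small grains"),
              ((40.0, 1.0), "non-small-grains")]
cluster_means = [(22.0, 6.0), (39.0, 0.0), (35.0, 2.0)]
print(label_clusters(cluster_means, type1_dots))
```

A cluster inherits its label from a single dot, so one mislabeled dot near several cluster means can mislabel all of them; this is one reason the labeling accuracy of type 1 dots matters.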


A Bayesian approach is taken to classify every (non-DO and non-DU) pixel in the LACIE segment.5 The approach is implemented by first estimating density functions and prior probabilities for the small-grains and non-small-grains classes. Class density functions are modeled as mixtures of normal densities. Thus, the density function for the kth class (where k = 1 denotes the small-grains class and k = 2 denotes the non-small-grains class) can be expressed as

$$f_k(x) = \sum_{i=1}^{n_k} p_i^k f_i^k(x)$$

where $f_i^k(\cdot)$ is the ith component normal density function for the kth class. The prior probability $p_i^k$, the mean $\mu_i^k$, and the covariance matrix $\Sigma_i^k$ are the parameters that therefore specify the exact class density. The parameter set $(p_i^k, \mu_i^k, \Sigma_i^k)$ is estimated from all the points which fall in the ith cluster that has been assigned to the kth class in the above-mentioned labeling process. Generally, the prior probability $p_i^k$ is estimated as

$$\hat{p}_i^k = N_i^k \Big/ \sum_{j=1}^{n_k} N_j^k$$

where $N_i^k$ is the number of points in the ith cluster assigned to the kth class. However, one can introduce other convex sets of numbers if they are considered to be better estimates. The mean $\mu_i^k$ and covariance matrix $\Sigma_i^k$ are estimated using the usual sample estimators.

Let $\hat{f}_k(\cdot)$, k = 1, 2, denote the estimate of the density function, which is obtained by substituting the estimates of the parameters in equation (1). Then, the classification rule for assigning any non-DO or non-DU pixel x in the scene to class 1 or class 2 is as follows: “Decide x is a pixel from class 1 if $\hat{\pi}_1 \hat{f}_1(x) > \hat{\pi}_2 \hat{f}_2(x)$ and a pixel from class 2 if $\hat{\pi}_1 \hat{f}_1(x) \le \hat{\pi}_2 \hat{f}_2(x)$.”

5This approach is similar to the one discussed earlier in relation to equation (1). The main differences are in the way parameters are estimated.

As before, any two convex numbers can be used as estimates of $\pi_1$ and $\pi_2$ if ancillary information indicates that these alternative numbers would be better estimates.
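
The mixture-density decision rule can be illustrated with a deliberately simplified sketch. It assumes univariate normal components (the actual pixel vectors have up to 16 elements and full covariance matrices), weights each component by its share of the class's cluster points, and takes the class priors as given inputs; all names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density, used here in place of the multivariate one."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, components):
    """Class density as a weighted sum of normal components.

    components: list of (n_points, mean, var), one entry per cluster
    assigned to the class; weights are n_points / total points.
    """
    total = sum(n for n, _, _ in components)
    return sum((n / total) * normal_pdf(x, mean, var)
               for n, mean, var in components)


def classify(x, class1, class2, pi1, pi2):
    """Return 1 if pi1 * f1(x) > pi2 * f2(x), otherwise 2."""
    score1 = pi1 * mixture_density(x, class1)
    score2 = pi2 * mixture_density(x, class2)
    return 1 if score1 > score2 else 2


small_grains = [(30, 0.0, 1.0), (10, 1.0, 1.0)]   # two clusters
other = [(60, 5.0, 1.0)]                          # one cluster
print(classify(0.2, small_grains, other, pi1=0.4, pi2=0.6))
print(classify(4.8, small_grains, other, pi1=0.4, pi2=0.6))
```

Replacing `normal_pdf` with a multivariate normal density would give the general form; the decision logic itself would not change.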

The pixels that are assigned to DO areas in the screening process are automatically tabulated with those classified as non-small-grains. The pixels assigned to DU areas are left unclassified since they represent areas under clouds and therefore are of unknown classification.

Pixels that are a large distance away from any mean of a small-grains cluster are thresholded; i.e., they are not classified as small grains but rather are assigned to the non-small-grains class. To decide whether or not a point x should be thresholded, it is compared with each small-grains cluster sample mean $\bar{X}_i$ using the squared metric

$$d^2(x, \bar{X}_i) = (x - \bar{X}_i)^T S_i^{-1} (x - \bar{X}_i) \tag{2}$$

where $S_i$ is the sample covariance matrix for the ith cluster. If $d^2(x, \bar{X}_i)$ exceeds a given threshold value $\theta_i$ for all indexes i related to the small-grains clusters, then that pixel is thresholded. The threshold is selected so that $\Pr[d^2(x, \bar{X}_i) > \theta_i] = 0.01$. Because $d^2(x, \bar{X}_i)$ has an F distribution with p and $N_i - p$ degrees of freedom, where $N_i$ is the sample size (used to compute $S_i$ and $\bar{X}_i$) and p is the dimensionality of the x vector, the threshold values are tabulated in standard references.


Area Estimation

The small-grains area estimate is obtained by combining an analyst's estimate with the machine estimate. From a statistical sampling point of view, the estimate is a stratified area estimate where the stratification is performed after the type 2 dot allocation (poststratification) or before the allocation (prestratification). In the latter case, the type 2 dots are allocated in proportion to the stratum sizes.

In LACIE, poststratification was used since the type 2 dots were labeled before a machine classification was obtained. From the analyst's point of view, this type of allocation is simpler to use since the same type 2 dots are labeled at each acquisition. In the other allocation, the location of the type 2 dots would be a function of the classification result and hence would require that, in multitemporal applications, the analyst would label more dots. The prestratification approach has the advantage that the resulting area estimator has a smaller variance than the poststratified estimator. However, as the number of type 2 dots increases, the variances of the two estimators converge. The estimator for both the poststratified and the prestratified estimates is given in the following paragraphs.

For the case where the machine classification produces nonempty strata for both the small-grains and the non-small-grains stratum and type 2 dots fall in both strata,6 the estimator is

$$\hat{P} = \hat{P}_{11}(n)\,\frac{N_1}{N} + \hat{P}_{10}(n)\left(1 - \frac{N_1}{N}\right) \tag{3}$$

where

$\hat{P}_{11}(n)$ = the number of type 2 dots called class 1 (small grains) by the analyst and classified as class 1 by the machine divided by the number of type 2 dots machine classified as class 1

$\hat{P}_{10}(n)$ = the number of type 2 dots called class 1 by the analyst and classified as class 0 (non-small-grains) by the machine divided by the number of type 2 dots machine classified as class 0

$N_1$ = number of pixels machine classified as class 1

$N$ = 22 932 minus the number of DO pixels minus the number of DU pixels minus the number of thresholded pixels

n = index denoting the total number of type 2 dots used in the estimate

6In the poststratified case, it is possible not to have any type 2 dots fall in a given stratum. In the prestratified case, the allocation is controlled and this will not happen provided enough dots are allocated.

For the case where the machine classification produces an empty stratum or if no type 2 dots fall in a given stratum, a simple random sample estimate based only on the type 2 dots is used.
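
The stratified estimate and its fallback can be sketched as follows. The function treats the machine class map as two strata, weights the analyst's small-grains proportion in each stratum by the stratum's share of the classified pixels, and reverts to the simple random sample estimate when a stratum is empty or holds no type 2 dots. The data layout and the pixel counts are hypothetical.

```python
def stratified_proportion(dots, n1_pixels, n_pixels):
    """Small-grains proportion estimate from labeled type 2 dots.

    dots: list of (analyst_label, machine_class) pairs, with
          1 = small grains and 0 = non-small-grains.
    n1_pixels: pixels the machine classified as class 1.
    n_pixels: total pixels entering the estimate.
    """
    in_stratum1 = [a for a, m in dots if m == 1]
    in_stratum0 = [a for a, m in dots if m == 0]
    weight1 = n1_pixels / n_pixels
    if not in_stratum1 or not in_stratum0 or weight1 in (0.0, 1.0):
        # Empty stratum or no dots in a stratum: simple random sample.
        return sum(a for a, _ in dots) / len(dots)
    p11 = sum(in_stratum1) / len(in_stratum1)
    p10 = sum(in_stratum0) / len(in_stratum0)
    return p11 * weight1 + p10 * (1.0 - weight1)


# 10 dots in the machine's small-grains stratum (8 confirmed by the analyst)
# and 30 dots in the other stratum (3 called small grains by the analyst).
dots = [(1, 1)] * 8 + [(0, 1)] * 2 + [(1, 0)] * 3 + [(0, 0)] * 27
estimate = stratified_proportion(dots, n1_pixels=6000, n_pixels=20000)
print(round(estimate, 4))
```

In this example the dots alone give 11/40 = 0.275, whereas the stratified estimate reweights the two strata by the machine's pixel counts; the two agree only when the dots happen to fall in the strata in proportion to the stratum sizes.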


Evaluation 

Each segment estimate in Procedure 1 is checked in an effort to spot estimates that deviate by large amounts from the true value. If the estimates $\hat{P}_{11}$ and $\hat{P}_{10}$ in equation (3) are a result of an unbiased process (which could be the case if labeling were done by ground sampling), then $\hat{P}$ in equation (3) is an unbiased estimate of the true proportion. Under these conditions, $(N_1/N) - \hat{P}$ is an unbiased estimator of the classifier bias (i.e., an estimator that permits a test of the hypothesis that the proportion estimate $N_1/N$ directly from the classifier is "too far" from the true value P). This would imply that classification errors of omission or commission or both are large and hence that the stratification done by the classifier is resulting in an inefficient process. (See the paper by Heydorn et al. for a discussion of the relation between the omission and commission errors and sampling efficiency.) Unfortunately, because of labeling error, the manual labeling process is not unbiased at the segment level and, consequently, under these conditions, the above statistical test merely accepts or rejects the hypothesis that the classifier estimate is "too far" from the analyst-expected value. When manual labeling is used, the underlying philosophy in the evaluation process is to check the consistency between the machine classification and the manual labeling.

A variety of statistics and visual displays is provided for evaluating a given estimate. As a result of the machine classification process, a classification map, a cluster map, a conditional cluster map, and a classification summary are produced. Examples are shown in figures 6 and 7. The classification map is a transparency the size of a LACIE CIR segment image in which small-grains areas are clear and non-small-grains areas are opaque. If the small-grains class is split into winter small grains and spring small grains, then the winter-small-grains areas are clear and the spring-small-grains areas are gray. A visual evaluation of machine classification performance can be made by overlaying the classification map on a LACIE CIR image. Evaluation using cluster maps and conditional cluster maps is a similar process. The conditional cluster map is similar to the classification map in that small-grains and non-small-grains areas are clear and opaque, respectively; however, in addition, areas in the image corresponding to spectral clusters whose mean is farther from the closest type 1 dot than a specified threshold value are printed in a color. These colored areas are then "flagged" for the analyst as regions where the spectral values are not representative of the spectral values of the type 1 dots. The analyst can then judge whether the areas corresponding to these conditional clusters have been correctly or incorrectly labeled in the automatic cluster-labeling process. The cluster map is a map in which an area corresponding to a given spectral cluster is assigned a distinct color.


Multitemporal Estimation

By observing the biomass development of a crop canopy at specific phenological stages, it is possible to discriminate that crop from generically different crops with reasonable accuracy. This fact has motivated the use of crop calendar predictions in the labeling process and multitemporal classification in the machine processing. Consequently, Procedure 1 was designed to process Landsat data in a multitemporal fashion. The basic multitemporal strategy is to acquire Landsat data at least once within each of up to four time intervals of the small-grains crop calendar scale (if the Landsat coverage is not obscured by cloud cover), concatenate the Landsat pixel observations into an n-dimensional vector (where n could be 4, 8, 12, or 16, depending on the number of Landsat acquisitions used), and classify those vectors as discussed previously.

If Procedure 1 is to be used to make sequential estimates (as was the case in LACIE where estimates were made in the early, middle, and harvest portions of the wheat-growing season), then, since a maximum of four acquisitions can be used, it is desirable to select the best set of acquisitions to process. The strategy followed in Procedure 1 is based on the assumption that, as the season progresses, later acquisitions will be better for crop discrimination than the worst of the acquisitions that have already been processed. The strategy is implemented by estimating a quantity related to the classification error that would be incurred by classifying the best M - 1 of M acquisitions previously processed. The acquisition that is dropped from the M given acquisitions is considered to be the worst, and it is replaced in the multitemporal processing by the newly acquired acquisition. As an example of this strategy, consider the case where four acquisitions have already been processed and another has been acquired. In this case, classification performance for each combination of three acquisitions from the four already in the data base (this does not include the newly acquired acquisition) is estimated. The best combination is retained and combined with the new acquisition to form a new combination of four acquisitions.

Classification performance is estimated using the average Bhattacharyya coefficient between component distributions of the two competing classes. Thus, if $N(x; \mu_i^1, \Sigma_i^1)$, i = 1, 2, ..., $m_1$, are the component distributions of the first class (e.g., the small-grains class) and $N(x; \mu_j^2, \Sigma_j^2)$, j = 1, 2, ..., $m_2$, are the component distributions of the second class (e.g., the non-small-grains class), then the Bhattacharyya coefficient for the ith and jth distributions of opposite classes is

$$\rho_{ij} = \int \left[ N(x; \mu_i^1, \Sigma_i^1)\, N(x; \mu_j^2, \Sigma_j^2) \right]^{1/2} dx$$

The average Bhattacharyya coefficient is then defined as

FIGURE 7. — Example classification summary (only the first three dots are shown). Column headings: dot number, dot location, dot type, dot label, category classified, dot brightness, dot green number.
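
The acquisition-replacement strategy and an overlap score of the Bhattacharyya type can be sketched together. The closed form below is the standard Bhattacharyya coefficient for two univariate normal densities, used as a stand-in for the multivariate case; the unweighted average over cross-class component pairs is an assumption, as are the function names and the toy scores.

```python
import math
from itertools import combinations


def bhattacharyya_1d(m1, v1, m2, v2):
    """Bhattacharyya coefficient of univariate normals N(m1, v1), N(m2, v2).

    v1 and v2 are variances; the result is 1 for identical densities
    and tends to 0 as the densities separate.
    """
    s = v1 + v2
    return (math.sqrt(2.0 * math.sqrt(v1 * v2) / s)
            * math.exp(-0.25 * (m1 - m2) ** 2 / s))


def average_coefficient(class1, class2):
    """Unweighted mean over all cross-class pairs (weighting is assumed)."""
    pairs = [(a, b) for a in class1 for b in class2]
    return sum(bhattacharyya_1d(*a, *b) for a, b in pairs) / len(pairs)


def replace_worst(current, new, overlap):
    """Keep the M - 1 current acquisitions with least overlap, then add `new`.

    overlap: callable taking a tuple of acquisitions and returning an
    estimated class overlap (lower means better discrimination).
    """
    best = min(combinations(current, len(current) - 1), key=overlap)
    return list(best) + [new]


# Hypothetical per-acquisition overlap scores, summed for a combination.
scores = {"a": 0.2, "b": 0.9, "c": 0.3, "d": 0.4}
chosen = replace_worst(["a", "b", "c", "d"], "e",
                       overlap=lambda combo: sum(scores[k] for k in combo))
print(chosen)
```

Here acquisition "b" contributes the most overlap, so it is the one dropped when "e" arrives. In practice the overlap of a combination would be computed from the concatenated multitemporal vectors rather than summed per acquisition.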
CONCLUSIONS

The  Procedure  1 design  is  an  attempt  to  efficiently 
merge  machine  classification  and  analyst  interpreta- 
tion processes.  Efficiency  is  interpreted  in  terms  of 
both  the  amount  of  time  spent  by  the  analyst  in  ob- 
taining an  estimate  and  the  variance  of  that  estimate 
as  a function  of  the  number  of  dots  interpreted. 

Given  that  an  analyst  can  label  randomly  selected 
dots  from  an  image  with  reasonable  accuracy,  a seg- 
ment estimate  can  be  obtained  directly  from  this 
labeling  process  without  the  intervening  machine 
classification.  The  benefit  of  using  Procedure  1 is 
that  the  classification  process  will  stratify  the  scene 
into  potential  small-grains  and  non-small-grains 
areas  and  thereby  can  reduce  the  variance  of  the 
analyst's  random  dot  estimate  through  the  stratified 
area  estimation  process.  The  more  accurate  the 
classification  process,  the  lower  the  variance  of  the 
estimates. 


Classification errors are, of course, dependent on the
observations that are to be classified, and one can therefore
expect these errors to vary from segment to segment and for
different acquisition histories of the Landsat data.
Experiments have shown that the variance reduction in using
Procedure 1 over a simple random sampling approach differs
considerably from segment to segment (see the paper by Heydorn
et al.). On the average, however, this reduction appears to be
near 0.7 when compared to the variance that can be achieved
using only the type 2 dots for the simple random estimate.
This suggests that improvement in classification methodology
is required, assuming that the Landsat data are more than just
marginally effective in discriminating crop types.

When compared to the Phase I and II design, Procedure 1
provides a framework wherein the interaction between analyst
and machine is more controlled and therefore more easily
studied. Hence, the Procedure 1 approach offers considerable
research potential in studying error propagation and a
potential means of developing improvements for future designs.
As an example, the mere use of a fixed labeling grid has
provided a means for more accurately determining the effect of
labeling error on classification performance since field
selection is controlled and therefore not confounded in the
error. As another example, three proportion estimates can be
computed for a segment estimate; namely, a direct machine
estimate (as discussed previously), a simple random estimate
(as obtained from the labeling of type 2 dots), and the
stratified area estimate. Thus, for each segment analysis, it
is possible to compare these three estimates and thereby study
the advantages of machine and manual processing.

As a final point, it should be noted that Procedure 1 was a
design improvement over the Phase I and II design both in
terms of increasing the number of segments that could be
processed (see the paper by White entitled "LACIE Applications
Evaluation System Efficiency Report") and in terms of the
accuracy of the estimates (see the paper by Potter et al.
entitled "Accuracy and Performance of LACIE Area Estimates").
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INTRODUCTION 

The basic requirements which have been imposed on the NASA
Goddard Space Flight Center (GSFC) LACIE processing system are

1. To extract specified test sites (sample segments) from
Landsat multispectral scanner (MSS) data

2. To apply geometric corrections and perform correlations to
ensure registration between successive data acquisitions to
within 1 pixel (root mean square (rms))

The processing flow necessary to meet these requirements is
shown in figure 1. The NASA Johnson Space Center (JSC) defines
test sites by specifying the geodetic latitude and longitude
of the site center plus the biological window or time window
during which MSS data acquisitions are to be extracted. As
indicated in figure 1, the output of the GSFC LACIE processing
is a digital tape containing sample segments of all four MSS
bands. A sample segment consists of 117 lines and 196 pixels
representing a rectangular ground area approximately 9.3 by
11.1 km (5 by 6 n. mi.).

The first step in processing is to determine which sample
segments are contained in an MSS image frame. The time windows
specified for each test site are compared with the MSS data
acquisition date to identify the sites desired. The latitude
and longitude of each of these desired sites are then compared
to the MSS image frame center latitude and longitude. Any site
located within 3.1° of the image center is initially assumed
to be contained within the MSS image frame. For each of these
sites, the line and pixel location within the MSS image frame
is determined. Valid line and pixel numbers provide the final
test of whether a sample segment is contained on the MSS image
frame.

FIGURE 1.- LACIE processing flow.

If available attitude and ephemeris data were precise, the
location of the sample segments determined as described in the
preceding paragraph would be sufficient (although a more
sophisticated calculation than that actually used would be
required). However, errors in attitude and ephemeris data were
originally estimated to be as great as ±4.5 km. These errors
required the definition of search areas containing 234 lines
and 354 pixels (approximately 18 by 20 km). Provided
locational errors do not exceed the ±4.5-km limits, the
desired sample segment data will be contained within the
larger search area. After determination of the estimated line
and pixel number location of a test site, geometric correction
coefficients are calculated using Landsat attitude and
ephemeris data. The line and pixel number location is used to
extract search areas from radiometrically corrected (normal
Landsat MSS corrections) MSS data. Using these geometric
correction coefficients, the search area data for each of the
four MSS bands are geometrically corrected using nearest
neighbor methods (i.e., corrections rounded to nearest integer
pixel).
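
The segment and search area sizes quoted above can be checked against the nominal pixel and line scales listed in table I (0.0565 km/pixel and 0.0799 km/line). The following is a minimal sketch of that arithmetic only; the function and constant names are illustrative and are not taken from the GSFC software.

```python
# Nominal scales from table I of this paper.
K_KM_PER_PIXEL = 0.0565   # nominal pixel scale
M_KM_PER_LINE = 0.0799    # nominal line scale


def ground_size_km(lines, pixels):
    """Return (along-track km, along-scan km) for an array of lines x pixels."""
    return lines * M_KM_PER_LINE, pixels * K_KM_PER_PIXEL


def search_margin_km(search, segment):
    """Half the size difference between search area and segment, per axis."""
    return tuple((s - g) / 2.0 for s, g in zip(search, segment))


segment_km = ground_size_km(117, 196)   # sample segment
search_km = ground_size_km(234, 354)    # search area
margin_km = search_margin_km(search_km, segment_km)

print("segment  (km):", segment_km)
print("search   (km):", search_km)
print("margin   (km):", margin_km)
```

Under these nominal scales the margin on each side of a centered segment comes out close to the ±4.5-km location error that the search area was sized to absorb.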

For the initial acquisition of a test site, the sample segment
is assumed to be the center of the search area. Based on
previous error estimates, the initial sample segment may be
mislocated by ±4.5 km. However, the initial sample segment
becomes the reference, and subsequent acquisitions of a test
site are extracted to coincide with the initial extraction.
The location uncertainty of subsequent acquisitions is removed
by an edge-dependent correlation scheme. During initial sample
segment extraction, an edge detection operation is performed
on the sample segment data, and the edge data are stored (edge
file) as the correlation reference for later acquisitions.
Subsequent search areas are then processed using the same edge
detection process, and coincidence of edges relative to the
reference edge data determines the location of the correlated
sample segment to be extracted. Having reviewed the general
processing flow within the GSFC LACIE processing system, the
following steps will now be described in detail.

1.  Determination  of  line  and  pixel  location  of  a 
search  area  within  an  MSS  frame 

2. Determination of geometric correction coefficients and
application of geometric corrections

3.  Edge  detection 

4.  Correlation  by  coincidence  of  edges 

TEST  SITE  LOCATION  CALCULATION 

As discussed previously, test sites are specified by the
geodetic coordinates (latitude and longitude) of the site
center. The calculation of the corresponding search area
locations (line and pixel number) within a Landsat MSS frame
(computer-compatible tape (CCT)) requires the parameters
listed in table I. The diagram in figure 2 illustrates the
simple geometry assumed in the location calculations. The X
and Y axes are alined with the scan line direction and the
orbit plane, respectively, with the minus-Y axis in the
direction of spacecraft heading (i.e., spacecraft velocity).
The origin is chosen to be the format center of the Landsat
MSS frame (CCT). The $Y_1$ axis is in the direction of north
(i.e., direction of longitudinal meridian through the format
center), and the $X_1$ axis is in the direction of east (i.e.,
parallel of latitude through format center). The orientation
of north ($Y_1$) and east ($X_1$) is relative to the format
center orbit plane (Y) and scan line (X), respectively. For
simplification, it is assumed that parallels of latitude and
meridians of longitude are straight lines. It is also assumed
that parallels of latitude are orthogonal to the meridian of
longitude passing through the format center. Ignoring
geometric distortions inherent in raw MSS data, the
displacement ($Y_1$) of a test site relative to the origin is
given by the arc length $R\,\Delta La$ on a spherical Earth,
where $\Delta La$ is the latitude of the test site minus that
of the format center. Similarly, the displacement ($X_1$) of a
test site is given by the arc length $R\,\Delta Lo\,\cos La$
on the circle of a parallel of latitude, where $\Delta Lo$ is
the corresponding longitude difference. Any estimated errors
in format center location can be accounted for by empirical
offsets $A_i$ and $B_i$. The $X_1$, $Y_1$ coordinates of a
test site are then given by the expressions
TABLE I.- Input Parameters for Search Area Location

Parameter                               Symbol     Value      Unit

Test site
  Latitude                              La         Variable   rad
  Longitude                             Lo         Variable   rad

Landsat MSS frame
  Format center latitude                La         Variable   rad
  Format center longitude               Lo         Variable   rad
  Spacecraft heading angle              h          Variable   rad
  Earth rotation skew angle             α_s        Variable   rad
  Normalized spacecraft altitude change ΔH/H       Variable   percent
  Normalized spacecraft velocity change Δv/v       Variable   percent
  Lines per MSS frame (CCT)             L_F        2340       lines/frame
  Pixels per MSS line (line length)     P_L        3291       pixels/line

Constants
  Earth radius                          R          6368       km
  Nominal pixel scale                   K          0.0565     km/pixel
  Nominal line scale                    M          0.0799     km/line
FIGURE 2.- Search area location geometry. (Axis labels: X =
scan direction; -Y = spacecraft heading; X_1 = parallel of
latitude through format center; Y_1 = longitudinal meridian
through format center.)

During the early period of operation using Landsat-1 data,
errors in attitude values caused format center errors to
occasionally exceed the ±4.5-km predicted limit. Those errors
were found to be seasonally dependent, and measurements of
known ground control points in the imagery were used to
determine $A_i$ and $B_i$ for each month (i = month).
Subsequently, these empirical data were used to adjust the
Attitude Measurement System (AMS) model which calculates the
Landsat spacecraft attitude. On January 5, 1975, the adjusted
AMS model was incorporated into the GSFC operational system
for Landsat-1 and Landsat-2 data processing. This change
reduced format center errors to ±2.5 km, and the $A_i$ and
$B_i$ parameters have been set equal to zero in the LACIE
processing system.

As shown in figure 2, the transformation from $X_1$, $Y_1$
coordinates to X, Y coordinates is simply a rotation of angle
$c = h - \alpha_s$ as given by

$X = X_1 \cos c - Y_1 \sin c$   (3)

$Y = X_1 \sin c + Y_1 \cos c$   (4)

The coordinates used are in units of kilometers. To obtain
pixel and line numbers, the pixel and line spacings in
kilometers are required. The nominal scales K and M defined in
table I are adjusted for frame-dependent scale variations. The
significant contributors to scale variations (±2 percent) are
spacecraft velocity and altitude, and the adjusted scales K'
and M' are accordingly defined as

(5)

(6)

Using equations (3) to (6) and recalling that the X, Y origin
is at pixel $P_L/2$ and line $L_F/2$, the line ($L_c$) and
pixel ($P_c$) location of the test site is given by

(7)

(8)

An adjustment is required in equation (8) to account for Earth
rotation. This adjustment requires a shift of
$Y \tan \alpha_s$ in the X coordinate to yield the final
expressions

(7)

(9)
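
The rotation of equations (3) and (4) can be sketched as follows. This is an illustration only: the sign of the sine term in equation (3) is taken as negative so that the pair forms a proper rotation, which is an assumption about the printed equation, and the function name is hypothetical. The conversion to line and pixel numbers in equations (5) to (9) is not sketched here.

```python
import math


def rotate_to_scan_axes(x1_km, y1_km, c_rad):
    """Rotate east/north displacements (X_1, Y_1) into scan/orbit axes (X, Y).

    Implements X = X_1 cos c - Y_1 sin c and Y = X_1 sin c + Y_1 cos c,
    where c is the rotation angle described in the text (assumed sign
    convention; see the lead-in).
    """
    x_km = x1_km * math.cos(c_rad) - y1_km * math.sin(c_rad)
    y_km = x1_km * math.sin(c_rad) + y1_km * math.cos(c_rad)
    return x_km, y_km


# Example: a site 30 km east and 40 km north of the format center,
# rotated by an arbitrary illustrative angle of 0.2 rad.
x_km, y_km = rotate_to_scan_axes(30.0, 40.0, 0.2)
print(x_km, y_km)
```

A pure rotation leaves the distance from the format center unchanged, which gives a quick sanity check on any implementation.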


Within the GSFC LACIE processing system, the upper left corner
of a search area is used to extract the search area data.
These data are obtained from $L_c$ and $P_c$ as defined
previously by simply subtracting 117 lines and 176 pixels,
respectively. A final adjustment in the pixel location,
$P = P_c - 176$, is required by the relationship between
actual video line length and the parameters $P_L$ and K
defined as input parameters. The scale factor, K = 0.0565 km,
as defined in table I is the pixel spacing at the nadir point.
Distances in LACIE algorithms to be discussed later are
calculated as true distances on the surface of the Earth. For
a nominal MSS total scan angle of 0.201586 rad, the total scan
arc on the surface of the Earth is 185.9 km. Thus, at the
defined scale, the total of 3291 pixels corresponds to a
nominal MSS scan line. However, the pixel location desired for
extraction of a search area must be in terms of actual MSS
pixels (equally spaced in time, not ground spacing). Since the
magnitude of the correction is estimated to be less than 3
percent, a simple scale adjustment by a factor of $P_L/LL$
(LL = actual MSS line length) is considered adequate. The
upper left corner of a search area is defined by

$L = L_c - 117$   (10)
Although a detailed analysis has not been pursued to define
the errors introduced by all the simplifying assumptions, the
successful correlation of sample segments within the search
areas defined by these algorithms indicates sufficient
accuracy for the current GSFC LACIE processing system. The
major problems experienced early in the program were errors in
format center location, as described previously.

It is appropriate at this point to discuss the use of the
all-digital geometrically corrected data (fully processed
HDT-PM tapes) which will be available from GSFC's Information
Processing Facility system late in 1978. These data will be
resampled to a specified map projection (presently Hotine
Oblique Mercator (HOM)) and will include improved location
when ground control is available. The orientation of MSS
frames relative to the orthogonal coordinates of the map
projection will be specified and fixed. The previously
described location calculations will then be replaceable by

1. Transformation from latitude and longitude to orthogonal
map coordinates

2. Rotation of coordinates based on specified orientation to
horizontal and vertical MSS coordinates

3. Division by 0.057 km of both dimensions to obtain pixel and
line location

The accuracy of such extractions will be commensurate with the
expected accuracies of the geometrically corrected data: ±1
pixel when adequate ground control is used and ±44 pixels
without ground control. Without ground control, the accuracy
of location will continue to be affected by the ±2.5-km
uncertainty in Landsat attitude values.


GEOMETRIC  CORRECTIONS 

Four types of geometric corrections, as defined by the
diagrams in figure 3, are performed within the GSFC LACIE
processing system. As indicated by the vectors in figure 3,
these are linear corrections with the magnitude of the
correction increasing with displacement from the center toward
the edges. As indicated in figure 3, each correction is
defined by a single correction coefficient a, b, c, or d. The
correction algorithms are defined as


$P' = P + a(P - P_c) + b(L - L_c)$   (12)

$L' = L + c(P - P_c) + d(L - L_c)$   (13)

where a, b, c, d = correction coefficients defined later
      P, L       = pixel and line number of an element in
                   corrected data
      P', L'     = pixel and line number in uncorrected data
      P_c, L_c   = pixel and line number of center (as
                   determined by eqs. (7) and (9), with the
                   P_c value from eq. (9) adjusted by the
                   factor P_L/LL to account for actual line
                   length)

As outlined earlier, the geometric corrections are implemented
as nearest neighbor resampling. This simply means that the
calculations defined by equations (12) and (13) are rounded to
the nearest integer. The resampling or geometric correction
process is really a reformatting process: for each location
P, L in the desired corrected search area array, equations
(12) and (13) are used to calculate the P', L' location in the
original array for the pixel value to be used.
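
The reformatting process described above can be sketched in a few lines. This is a minimal illustration, assuming equations (12) and (13) have the linear form $P' = P + a(P - P_c) + b(L - L_c)$ and $L' = L + c(P - P_c) + d(L - L_c)$; the function name, the zero-based indexing, and the fill value for locations that fall outside the input array are choices made here, not details taken from the GSFC system.

```python
import math


def resample_nearest(src, a, b, c, d, p_c, l_c, fill=0):
    """Nearest neighbor geometric correction of a 2-D array (list of rows).

    For every output location (P, L) the source location (P', L') is
    computed from the linear correction and rounded to the nearest integer.
    """
    n_lines = len(src)
    n_pixels = len(src[0])
    out = [[fill] * n_pixels for _ in range(n_lines)]
    for line in range(n_lines):
        for pixel in range(n_pixels):
            p_src = pixel + a * (pixel - p_c) + b * (line - l_c)
            l_src = line + c * (pixel - p_c) + d * (line - l_c)
            # Round half up so the round-off thresholds are symmetric.
            p_idx = int(math.floor(p_src + 0.5))
            l_idx = int(math.floor(l_src + 0.5))
            if 0 <= l_idx < n_lines and 0 <= p_idx < n_pixels:
                out[line][pixel] = src[l_idx][p_idx]
    return out


row = [[10, 11, 12, 13, 14]]
identity = resample_nearest(row, 0.0, 0.0, 0.0, 0.0, p_c=2, l_c=0)
stretched = resample_nearest(row, 0.5, 0.0, 0.0, 0.0, p_c=2, l_c=0)
print(identity, stretched)
```

With all four coefficients zero the output equals the input; a nonzero along-scan coefficient causes input pixels to be repeated or skipped at fixed displacements from the center, which is the round-off threshold behavior discussed in the next paragraphs.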

Before proceeding with a discussion of the geometric
correction coefficients, the order of operations within the
LACIE processing flow (fig. 1) should be recalled. The
geometric corrections are applied to search areas before edge
extraction and correlation. This order was selected to avoid
correlation errors or sample segment location errors due to
geometric distortions. However, this order does contribute to
registration errors. Consider the special case of two data
acquisitions with identical geometry (i.e., a, b, c, d equal
between acquisitions) but different location errors in
extraction of a search area. The calculations of equations
(12) and (13) are referenced to the search area center
element, and the location of round-off threshold points (an
input pixel either repeated or deleted) will be at the same
displacement from search area center for both cases. When the
sample segments are extracted, the sample segment centers will
not necessarily be the same distance from search area centers
for each acquisition. Thus, the round-off threshold points for
the two sample segments will not coincide and, in the worst
case, 50 percent of the pixels will be misregistered by 1
pixel, although the geometry of the data is identical. These
errors can be eliminated by applying the geometric corrections
to raw sample segments after correlation is accomplished.
Since the present order was selected to ensure correlation
success, a test study would be required to determine the
effectiveness of correlation before geometric corrections. If
corrections are necessary before the correlation step, the
sample segment location determined by correlation could be
used to extract data from the uncorrected search area at the
improved location, and a separate geometric correction could
then be applied to the sample segment.

The four geometric corrections defined previously were
selected by reviewing the parameters affecting the geometry of
Landsat MSS data. In table II, the parameters considered to be
significant and estimates of their contributions to
registration errors are summarized. The error estimates are
worst case: opposite extremes of parameter values and at the
edges of a sample segment, where errors are greatest (refer to
fig. 3). The maximum absolute values of the correction
coefficients using the worst-case parameter values are

|a| < 0.04      |c| < 0.039

|b| < 0.072     |d| < 0.024

The impact of each parameter will become evident with the
presentation of the correction coefficient algorithms.

Input parameters required for the calculation of correction
coefficients are defined in table III. The along-scan scale
coefficient a is defined as

$a = \frac{K}{K'} - 1$   (14)

where K = 0.0565 km = pixel spacing defined for a corrected
sample segment and K' is the pixel spacing in an uncorrected
search area. The uncorrected pixel spacing K' is defined by


(15)

where

$l(P'_F, P'_L) = R\,[\gamma(P'_L) - \gamma(P'_F)]$

$\gamma(P') = \sin^{-1}\!\left[\frac{R + H}{R}\,\sin(\beta + \alpha_R)\right] - (\beta + \alpha_R)$

and

$\beta = A \sin\!\left(\frac{\omega\, t_s\, P'}{P_L} + B\right) + C - \beta_0$

The $\beta$ calculation accounts for the nonlinear MSS mirror
velocity defined by the parameter values listed in table III.
Landsat-2 mirror parameters have always been used under the
assumption that Landsat-1 and Landsat-2 mirror mechanisms have
approximately the same characteristics, and any small
differences will be negligible over the size of a LACIE area.
The $\gamma(P')$ calculations transform from scan and pointing
angle (roll angle, $\alpha_R$) reference at the spacecraft to
subtended arc angle from nadir on the surface of a spherical
Earth. The $l(P'_F, P'_L)$ represents the ground arc length on
the surface of a spherical Earth. The factor $P_L/LL$ in
equation (15) adjusts for the use of $P_L$ in the $\beta$
calculation (similar to previous line length adjustment
discussion). It should be noted here that the use of K in
search area location calculations was another approximation.
The uncorrected value K' really applies; however, K' is not
calculated until after search area location verifies that the
sample segment is contained on the Landsat image frame.

TABLE II.- Maximum Registration Errors Within a Sample Segment
[Uncorrected data]

Parameter                                    Max. error, percent
                                             Pixel      Line

Along-scan scale
  ΔAltitude < 1.6 percent                    1.6        -
  Mirror velocity < 1 percent                1.0        -
  Perspective, Earth curvature < 0.5 percent  .5        -
  Roll < 1° (0.1 percent)                     .1        -
  Root sum square                            1.9        -

Translation of scan lines
  ΔRoll (±0.003°/sec) < 0.00017 rad          1.3        -
  Adjacent orbit (1.1° rotation)             1.54       -
  Root sum square                            2.02       -

Along-track scale
  ΔVelocity < 0.2 percent                    -          0.12
  ΔPitch (±0.003°/sec) < 0.00017 rad         -          1.0
  Root sum square                            -          1.01

Rotation of scan lines
  Yaw < 0.6°                                 -           .15
  Adjacent orbit (1.1° rotation)             -          1.31
  Root sum square                            -          1.31

Total root sum square
  Without adjacent orbits                    <2.7       -
  Using adjacent orbits                      <3.3       -

The along-track scale coefficient d is defined as

$d = \frac{M}{M'} - 1$   (16)

where M = 0.0799 km = line spacing defined for a corrected
sample segment and M' is the line spacing in an uncorrected
search area. The uncorrected line spacing M' is defined by


M‘  * M 


(l  A & 

\ ^ 


li  AerjA 
M m:  ) 


(17) 


The M' calculation accounts for changes in ground distance
covered by the fixed number of lines in an uncorrected search
area caused by spacecraft velocity variations from nominal and
by spacecraft tilting forward or back from first to last line
of a search area.
TABLE III.- Definition of Terms

Symbol       Definition                                   Value      Unit

General
P, L         Pixel and line number of an element in       Variable   None
             corrected data
P', L'       Pixel and line number in uncorrected data    Variable   None
P_c, L_c     Pixel and line number of center element of   Variable   None
             search area
P'_F, P'_L   Pixel numbers for first and last elements    Variable   None
             in an uncorrected search area line
ΔP, ΔL       Total number of pixels and lines in          Variable   Quantity
             uncorrected search area
LL           Number of pixels in a full line of Landsat   Variable   pixels
             data (actual line length)
P_L          Length of MSS scan line (nominal line        3291       pixels
             length)
α_y          Spacecraft yaw angle                         Variable   rad (deg)
W            Rotation angle between adjacent MSS orbit    Variable   rad (deg)
             frames
Δα_R         Difference in roll angle from first to last  Variable   rad (deg)
             line of search area
Δα_p         Difference in pitch angle from first to      Variable   rad (deg)
             last lines of search area
Δv/v         Normalized velocity change                   Variable   percent
H            Spacecraft altitude                          Variable   km
R            Earth radius                                 6368       km
β            MSS scan angle (0 at nadir)                  Variable   rad (deg)
γ(P')        Arc angle on Earth corresponding to pixel P' Variable   rad (deg)

Landsat-2 MSS mirror velocity parameters
β_0          -                                             .100793   rad
A            -                                             .36954    rad
B            -                                            -.26725    rad
C            -                                             .097588   rad
ω            -                                            17.0903    rad/sec
t_s          -                                             .032330   sec
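
The mirror velocity parameters listed in table III can be checked numerically. The sketch below assumes the scan angle has the form $\beta = A \sin(\omega t_s P'/P_L + B) + C - \beta_0$, with the 0.100793-rad entry taken as the offset $\beta_0$; that reading of the garbled equation and table row is an assumption, and the names used are illustrative. Under this assumption the first and last pixels of a nominal line fall at about ∓0.100793 rad, which agrees with the nominal total scan angle of 0.201586 rad quoted earlier in the paper.

```python
import math

# Landsat-2 MSS mirror velocity parameters as read from table III.
BETA_0 = 0.100793   # rad (assumed to be the half-scan offset)
A = 0.36954         # rad
B = -0.26725        # rad
C = 0.097588        # rad
OMEGA = 17.0903     # rad/sec
T_S = 0.032330      # sec
P_L = 3291          # nominal pixels per MSS line


def scan_angle(pixel):
    """MSS scan angle (rad, 0 at nadir) for a pixel position in 0..P_L."""
    return A * math.sin(OMEGA * T_S * pixel / P_L + B) + C - BETA_0


beta_first = scan_angle(0)
beta_last = scan_angle(P_L)
total_scan = beta_last - beta_first
print(beta_first, beta_last, total_scan)
```

The sinusoidal term is what makes equal steps in time (actual MSS pixels) correspond to slightly unequal steps in scan angle across the line.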

Comments made previously regarding the use of K in search area
location calculations also apply to the choice between M and
M'.

The scan line translation (horizontal skew) coefficient b is
defined as


. H SaRJ_  At  . . , ,1B, 

h.  - + sin  w (ig) 


The first term accounts for translations between MSS lines
caused by the spacecraft tilting to the side from the first to
the last line of a search area. The second term represents one
component of the image rotation correction required to
register sample segments from adjacent Landsat orbits. Data
acquired from noncoincident orbit paths have an inherent image
rotation which is corrected in the GSFC LACIE processing
system by rotating the vertical image axis through this term
and by rotating the scan lines using coefficient c. Normal
orbit drifts will introduce this image rotation; however, this
correction was implemented to take advantage of the
considerable overlap of MSS image data between adjacent orbits
at higher latitudes (above 50° N). Use of data from adjacent
orbits allows consecutive-day coverage of LACIE sites. The
definition of the rotation angle W will be presented later.

As defined in equation (18), the coefficient b may be
oversimplified. Earth rotation skew would normally be
accounted for by a correction of this type. Early in the
development of the GSFC LACIE processing system, it was
decided that an Earth rotation correction was not essential
for registration of sample segments. The rationale for this
decision was based on the fact that the Earth rotation
correction is a constant correction for a specified location
as in the case of any LACIE site. In retrospect, it can be
seen that this is not entirely correct since the MSS sensor
scans six lines in each mirror sweep. Earth rotation
introduces translations between mirror sweeps (sets of six
lines), not between lines. Since the MSS scan is asynchronous,
the location of the Earth rotation between mirror sweeps does
not remain fixed within the sample segments extracted for a
test site and will contribute to registration errors. This
correction could not be made when this effect was recognized
since the reference sample segments would have had to be
reprocessed and the registration of significant portions of
the LACIE data base would have been impacted. A similar
(although smaller) six-line mirror sweep effect is introduced
by the roll correction term in equation (18) since it applies
a separate correction to each line rather than to each
six-line sweep.

Within the six lines of an MSS mirror sweep, there is an
0.08-pixel offset between lines due to the finite sampling
interval. Although an 0.08-pixel offset is small, the
accumulated effect over the five line-to-line steps is a
0.4-pixel offset between the first and the sixth lines of a
mirror sweep or equivalently a 0.4-pixel offset between the
sixth line of a mirror sweep and the first line of the next
mirror sweep. This correction is also not accounted for by
equation (18). These points should be evaluated and equation
(18) modified in any future LACIE processing system.

The scan line rotation coefficient c is defined as

(19)

This coefficient accounts for the combined rotation of scan
lines due to yaw and adjacent orbit image rotation W.

The image rotation angle W is defined as the rotation from an
actual orbit path to a fictitious orbit path passing through
the test site center. This fictitious orbit path was devised
to minimize the rotation required and to avoid the
complication of referencing rotations to the orbit during
which an initial sample segment was extracted. This image
rotation angle, due to the orbit path not passing through
sample segment (or search area) centers, is defined by

$W = \tan^{-1}(\sin\phi_s \tan\Delta\theta)$   (20)

where $\phi_s$ = geodetic latitude of test site center
      $\Delta\theta$ = difference in longitude between orbit
                       path and test site center at geodetic
                       latitude $\phi_s$

Sign conventions used are north-positive, south-negative,
east-positive, and west-negative. The longitude difference
$\Delta\theta$ is defined by

(21)

where $\theta_s$ = longitude of test site, $\theta_n$ = nadir
longitude at Landsat frame center, and the remaining terms are
defined by algorithms

(22)

(23)

(24)

where i = nominal orbit inclination angle = 80.886°
      T = nominal orbit period = 103.267 minutes
      $\phi_n'$ = geocentric latitude of nadir
                = $\tan^{-1}[(1 - f)^2 \tan\phi_n]$
      $\phi_s'$ = geocentric latitude of test site
                = $\tan^{-1}[(1 - f)^2 \tan\phi_s]$
      f = 1/298.3
      $\phi_n$ = geodetic nadir latitude
      $\phi_s$ = geodetic test site latitude
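
Two of the relations above are simple enough to sketch directly: the geodetic-to-geocentric latitude conversion with f = 1/298.3 and the image rotation angle of equation (20). The sketch assumes equation (20) reads $W = \tan^{-1}(\sin\phi_s \tan\Delta\theta)$; the function names are illustrative, and the longitude difference is simply supplied as an input because equations (21) to (24) are not reproduced here.

```python
import math

FLATTENING = 1.0 / 298.3   # f, as given in the text


def geocentric_latitude(geodetic_lat_rad):
    """Geocentric latitude from geodetic latitude: atan((1 - f)^2 tan(phi))."""
    return math.atan((1.0 - FLATTENING) ** 2 * math.tan(geodetic_lat_rad))


def image_rotation_angle(site_lat_rad, delta_lon_rad):
    """Image rotation W = atan(sin(phi_s) * tan(delta_theta)), in radians."""
    return math.atan(math.sin(site_lat_rad) * math.tan(delta_lon_rad))


# Example: a site at 50 deg N whose orbit path passes 1 deg of longitude away.
lat = math.radians(50.0)
w_deg = math.degrees(image_rotation_angle(lat, math.radians(1.0)))
geocentric_deg = math.degrees(geocentric_latitude(lat))
print(w_deg, geocentric_deg)
```

At the equator the rotation vanishes for any longitude offset, and it grows with latitude, which is consistent with the paper's remark that adjacent-orbit overlap matters most at higher latitudes.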

Table IV is a summary of the estimated maximum registration
errors in geometrically corrected sample segment data. These
estimates are conservative, especially with regard to the
uncertainty in the image rotation angle W. The 0.5-percent
round-off error inherent in the nearest neighbor process and
another possible ±0.5-percent error due to nearest pixel
correlation are also present. An accurate combination of these
errors with the geometric uncertainties of table IV has not
been devised. It is estimated that an overall registration
accuracy of ±1 pixel (rms) is attained through the GSFC LACIE
processing system.

TABLE IV.- Maximum Registration Errors
[Resampled data]

Parameter                              Max. error, percent
                                       Pixel      Line

Along-scan scale, a
  ΔAltitude < 0.02 percent             0.02       -
  Mirror velocity < 0.2 percent         .2        -
  Roll < 0.14°                          .02       -
  Root sum square                       .2        -

Translation of scan lines, b
  ΔRoll < 0.00002 rad                   .15       -
  Image rotation < 0.4°                 .58       -
  Root sum square                       .60       -

Along-track scale, d
  ΔVelocity < 0.02 percent             -           .02
  ΔPitch < 0.00002 rad                 -           .12
  Root sum square                      -           .12

Rotation of scan lines, c
  Yaw < 0.14°                          -           .18
  Image rotation < 0.4°                -           .52
  Root sum square                      -           .55

Total root sum square                  <.8


EDGE  DETECTION 

The correlation of temporally separated MSS data,
such as LACIE processing requires, is complicated
by changes in gray level (even contrast reversals)
during the seasons of the year. A fully satisfactory
method for treating these changes using the normal
correlation process (i.e., gray-level correlation) has
not been devised. This problem led to the GSFC
selection of an edge detection process. It was
assumed that even though the MSS signal levels for
fields may change during the year, it would be possible
to detect edges (e.g., field boundaries) at any
time. A study (ref. 1) was conducted to determine an
edge detection process. The edge value E for pixel e is
defined as

$$E = |a - i| + |c - g| + |b - h| + |d - f| \qquad (25)$$

where the letters a through i represent the MSS pixel
values at the locations diagramed in figure 4.
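A short sketch of the edge value calculation may help. It assumes the 3 by 3 layout a b c / d e f / g h i with e as the center pixel, and that E sums the absolute differences of the four opposing pairs. Leaving border pixels at zero is an assumption of this sketch; the source does not say how borders were treated.

```python
def edge_value(img, row, col):
    """Edge value E for the pixel at (row, col); img is a list of rows."""
    a, b, c = img[row - 1][col - 1], img[row - 1][col], img[row - 1][col + 1]
    d, f = img[row][col - 1], img[row][col + 1]
    g, h, i = img[row + 1][col - 1], img[row + 1][col], img[row + 1][col + 1]
    # Two diagonal pairs, one vertical pair, one horizontal pair.
    return abs(a - i) + abs(c - g) + abs(b - h) + abs(d - f)


def edge_image(img):
    """Edge values for every interior pixel; border pixels are left at 0."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = edge_value(img, r, c)
    return out


if __name__ == "__main__":
    # A vertical boundary between gray levels 10 and 50.
    sample = [[10, 10, 50], [10, 10, 50], [10, 10, 50]]
    print(edge_value(sample, 1, 1))
```

Because only differences between neighbors enter E, a uniform shift in gray level between seasons leaves the edge value unchanged, which is the property the text relies on.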

Using equation (25), the edge value E is calculated
for each pixel in the search areas of MSS bands 5 and
7. These two bands were selected for edge detection
after studies (ref. 1) demonstrated that some features
appear best in band 5, others in band 7, and some
even appear as edges in band 5 in one season and
then in band 7 during another season. This procedure
led to the derivation of edge images which are a composite
of MSS bands 5 and 7 as described later. After
calculation of the edge value for each pixel in a
search area, the edge values for a band are
histogramed. A threshold edge value is then defined
such that 15 percent of the pixels in a search area
have edge values greater than the threshold value.
All pixels having edge values greater than the
threshold are called edges and those with edge values
below the threshold are not edges. Binary edge images
for MSS bands 5 and 7 are then constructed by
setting edges equal to one and nonedges equal to
zero. The final composite edge images are obtained
by a logical OR process; i.e., a pixel which was
classified as an edge in either band 5 or band 7 is an
edge in the composite edge image. The 15-percent
edge density criterion was selected by testing correlation
success rate for several edge density values.
Using the 15-percent edge density for the individual
MSS band 5 and band 7 edge images results in a composite
edge image density on the order of 20 percent.

    a   b   c

    d   e   f

    g   h   i

FIGURE 4.—Three by three array used for edge calculations.

During operational use of the GSFC LACIE processing
system, it was found that this edge detection
technique was sensitive to small scattered clouds.
Since the threshold edge process selects the strongest
edges and clouds are usually brighter than any other
target, the presence of small scattered clouds results
in cloud edges predominantly rather than in the temporally
invariant ground edges. A study (ref. 2) was
conducted to determine solutions to this problem. A
simple threshold test on MSS band 5 pixel values was
devised. Before calculating the edge value for a pixel,
each of the pixels in the 3 by 3 array (fig. 4) is compared
to a defined value. If any one of the 9 pixels is
greater than the threshold value, the edge value is set
equal to minus 1, which eliminates it from contention
as an edge. Although the cloud test is only on
band 5, a resultant cloud edge detection of minus 1
for a band 5 pixel is used to eliminate the corresponding
band 7 pixel. Sufficient testing was not performed
to optimize the cloud threshold value;
however, the impact of cloud edges has been reduced
using an operational value of 60. The investigator
also attempted to establish a lower threshold level to
identify cloud shadows, which would be given an
edge value of minus 2. The minus-1 and minus-2
edge values assigned to clouds and cloud shadows
would be useful in identifying pixels which are not
valid for LACIE analysis. The cloud-shadow
threshold definition is complicated by the fact that
actual ground data can have values down to pixel
level zero, and, thus, some true ground data would be
identified as cloud shadows. The cloud and cloud-shadow
identification capability has not been implemented.
Additional studies utilizing Sun elevation
considerations were not completed.
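The thresholding, compositing, and cloud screening steps of this section can be sketched as below. This is an illustrative reading, not the GSFC code. The function names are hypothetical, and excluding cloud-flagged pixels from the histogram is an assumption the source does not confirm.

```python
def threshold_for_density(edge_values, percent=15):
    """Return a threshold that at most `percent` percent of values exceed."""
    ordered = sorted(edge_values, reverse=True)
    k = min(len(ordered) * percent // 100, len(ordered) - 1)
    return ordered[k]


def binary_edges(edge_img, percent=15):
    """1 where the edge value exceeds the density threshold, else 0."""
    # Assumption: pixels flagged -1 by the cloud test stay out of the histogram.
    valid = [v for row in edge_img for v in row if v >= 0]
    if not valid:
        return [[0] * len(row) for row in edge_img]
    t = threshold_for_density(valid, percent)
    return [[1 if v > t else 0 for v in row] for row in edge_img]


def composite(edges5, edges7):
    """Logical OR of the band 5 and band 7 binary edge images."""
    return [[p | q for p, q in zip(r5, r7)] for r5, r7 in zip(edges5, edges7)]


def cloud_mask(band5, cloud_threshold=60):
    """True where any pixel of the 3 by 3 band 5 window exceeds the threshold."""
    rows, cols = len(band5), len(band5[0])
    mask = [[False] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [band5[rr][cc]
                      for rr in (r - 1, r, r + 1) for cc in (c - 1, c, c + 1)]
            mask[r][c] = max(window) > cloud_threshold
    return mask


def apply_cloud_mask(edge_img, mask):
    """Set masked edge values to -1 so they cannot be selected as edges."""
    return [[-1 if m else v for v, m in zip(erow, mrow)]
            for erow, mrow in zip(edge_img, mask)]
```

The same mask would be applied to both the band 5 and band 7 edge values before thresholding, which matches the statement that a band 5 cloud detection also eliminates the corresponding band 7 pixel.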


EDGE CORRELATION

The edge detection process is applied to every
geometrically  corrected  search  area.  In  the  case  of  an 
initial  extraction  (first  acquisition  of  a test  site),  the 
sample  segment  is  defined  as  the  center  portion  of 
the  corrected  search  area  and  is  extracted  accord- 
ingly. The  corresponding  portion  of  the  composite 
search  area  edge  image  is  also  extracted  and  stored  in 
the  edge  image  file  for  future  correlations. 

When  a sample  segment  extraction  is  to  be  ac- 
complished for  a previously  acquired  test  site,  the  in- 
itial edge  image  of  a sample  segment  is  retrieved 
from  the  edge  image  file  for  correlation  with  the  new 
corrected  search  area.  The  correlation  process  con- 
sists of  simply  counting  coincidence  of  edges  (a  logi- 
cal AND  operation)  between  search  area  and 
reference  sample  segment  edge  images  for  every 
possible  overlay  position.  The  overlay  positions  are 
restricted  to  those  for  which  no  sample  segment  edge 
pixels  fall  outside  the  search  area.  The  location  with 
the  maximum  number  of  coincident  edges  defines 
the  center  location  of  the  sample  segment  to  be  ex- 
tracted. 
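The coincidence-counting correlation described above can be sketched as an exhaustive search. This is a minimal illustration with hypothetical names. It returns the top-left offset of the best overlay rather than the segment center, which is a convenience of the sketch.

```python
def coincidence_count(search, ref, top, left):
    """Number of positions where both edge images hold a 1 (logical AND)."""
    return sum(
        search[top + r][left + c] & ref[r][c]
        for r in range(len(ref))
        for c in range(len(ref[0]))
    )


def best_overlay(search, ref):
    """Return (top, left, count) for the overlay with the most coincident edges.

    Only overlays that keep the reference entirely inside the search area
    are tried, as the text requires.
    """
    best = (0, 0, -1)
    for top in range(len(search) - len(ref) + 1):
        for left in range(len(search[0]) - len(ref[0]) + 1):
            n = coincidence_count(search, ref, top, left)
            if n > best[2]:
                best = (top, left, n)
    return best


if __name__ == "__main__":
    search = [[0] * 5 for _ in range(5)]
    search[2][1] = search[2][2] = search[3][2] = 1
    ref = [[1, 1], [0, 1]]
    print(best_overlay(search, ref))
```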

To reduce correlation processing time, a rapid correlation
rejection technique based on partial correlation
estimates was devised (ref. 3). The complete
correlation is performed for only one thirty-second
of the possible overlay alinements; i.e., every sixteenth
column and every other row. For each of
these alinements, the normalized correlation coefficient
$\rho$ is calculated.

$$\rho = \frac{N_c}{E_c} \qquad (26)$$

where  $N_c$ = number of coincident edge pixels

       $E_c$ = number of edge pixels contained within
               overlaid portion of search area

The mean correlation coefficient $\bar{\rho}$ is calculated as
well as the standard deviation $\sigma$ about this mean. A
rapid rejection threshold $T_{RR}$ is then defined as

$$T_{RR} = \bar{\rho} + 3\sigma \qquad (27)$$


For all remaining alinements, a correlation estimate
is determined, using one-ninth of the edge pixels.
The edge pixels in the reference edge image are
formatted in 16-bit words (16 edge pixels per word),
and the correlation estimate uses every third word on
every third line. The correlation estimate for these
edge pixels is determined using equation (26). If the
value of $\rho$ obtained is less than $T_{RR}$, the alinement is
rejected and the next alinement correlation is initiated.
The complete correlation is then performed
only for alinements that produce correlation estimates
greater than $T_{RR}$. It is estimated that after the
threshold $T_{RR}$ is defined, only about 5 percent of
remaining alinements require complete determination
of the correlation coefficient. Naturally, the alinement
producing the maximum correlation coefficient
$\rho$ determines the location for extracting the
sample segment from the search area.
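The rapid rejection scheme can be sketched as follows, under stated assumptions: the coefficient is coincident edges divided by search-area edges in the overlay, it is taken as zero when the overlay contains no edges, and the population standard deviation is used. All names are hypothetical and the source does not specify these details.

```python
import statistics


def rho(search, ref, top, left, cells):
    """Correlation coefficient over the listed (row, col) reference cells."""
    n_c = sum(search[top + r][left + c] & ref[r][c] for r, c in cells)
    e_c = sum(search[top + r][left + c] for r, c in cells)
    return n_c / e_c if e_c else 0.0


def correlate_with_rejection(search, ref, col_step=16, row_step=2, word=16, sub=3):
    h, w = len(ref), len(ref[0])
    every_cell = [(r, c) for r in range(h) for c in range(w)]
    # Every third 16-pixel word on every third line (about one-ninth of pixels).
    sampled = [(r, c) for r in range(0, h, sub)
               for c in range(w) if (c // word) % sub == 0]
    tops = range(len(search) - h + 1)
    lefts = range(len(search[0]) - w + 1)
    # Full correlation on the coarse grid: every other row, every 16th column.
    coarse = {(t, l): rho(search, ref, t, l, every_cell)
              for t in tops[::row_step] for l in lefts[::col_step]}
    values = list(coarse.values())
    t_rr = statistics.mean(values) + 3 * statistics.pstdev(values)
    best_pos, best_rho = max(coarse.items(), key=lambda item: item[1])
    for t in tops:
        for l in lefts:
            if (t, l) in coarse:
                continue
            if rho(search, ref, t, l, sampled) < t_rr:
                continue  # rapid rejection on the partial estimate
            full = rho(search, ref, t, l, every_cell)
            if full > best_rho:
                best_pos, best_rho = (t, l), full
    return best_pos, best_rho, t_rr
```

The design choice is that the coarse grid costs 1/32 of the full search and each remaining estimate costs about 1/9 of a full correlation, so the expensive computation is reserved for the small fraction of alinements that clear the threshold.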

Additional  studies  (ref.  2)  were  pursued  to  deter- 
mine the  feasibility  of  correlating  subareas  of  sample 
segment  edge  images  in  order  to  improve  geometric 
corrections  and  registration.  Preliminary  results  indi- 
cated such  methods  are  feasible.  Further  testing  and 
analysis  would  be  required  before  actual  implemen- 
tation could  be  initiated. 
Development of LACIE CCEA-I Weather/Wheat Yield Models

N. D. Strommen,a C. M. Sakamoto,b S. K. LeDuc,b and D. E. Umbergerc


INTRODUCTION 

The  LACIE  required  an  estimate  of  wheat  yield 
throughout the wheat growing season. The initial test
area  of  the  experiment  included  the  winter  wheat 
and  spring  wheat  regions  of  the  U.S.  Great  Plains.  In 
subsequent  phases,  Canada,  the  U.S.S.R.,  Argentina, 
Brazil,  Australia,  and  parts  of  India  were  added  to  the 
coverage. 

A basic  premise  behind  the  LACIE  project  was 
that  there  existed  considerable  technology  for  yield 
estimation  which  could  be  further  developed  and 
tested  for  its  ability  to  make  accurate,  real-time  yield 
estimates in a quasi-operational setting. The task of
developing  the  initial  operational  yield  models  in 
LACIE  was  assigned  to  NOAA’s  Center  for  Climatic 
and  Environmental  Assessment  (CCEA),1  which 
was  officially  started  in  November  1974.  The  U.S. 
Great  Plains  yield  models  were  scheduled  to  be  in 
operation  by  April  1975. 


MODELING  APPROACHES 

Crop/climate  modeling  can  be  approached  in 
many  ways.  The  selection  of  an  approach  depends 
primarily  on  the  objective  for  which  that  model  is 
developed  and  the  data  and  resources  that  are  avail- 
able for  development.  For  example,  if  the  primary 
objective  is  to  better  reflect  the  physiological  pro- 
cesses of  the  crop  in  response  to  its  environment,  the 


aNOAA Environmental Data and Information Service,
Washington, D.C.

bNOAA Environmental Data and Information Service,
Columbia, Missouri.

cUSDA Economics, Statistics, and Cooperatives Service,
Columbia, Missouri.

1In a NOAA reorganization during 1978, CCEA was made a
part of a new Center for Environmental Assessment Services.


best  modeling  approach  may  differ  from  that  taken 
where  the  objective  is  to  estimate  grain  yield  over 
large  areas. 

The  various  approaches  to  modeling  for  grain 
yield may be classified as follows: (1) causal
(phenological, dynamic, physiological), (2) statistical
regression,  and  (3)  analog.  Each  of  these  approaches 
has  its  advantages  and  limitations. 


Causal Approach

Early  discussions  at  CCEA  recognized  that  con- 
ceptually a crop/climate  model  should  have  the 
capability  to  simulate  closely  plant  development  as  a 
response  to  changing  environmental  elements.  The 
estimation  of  model  relationships  using  this  ap- 
proach has  been  done  on  research  data  where  the 
crop  being  modeled  was  grown  either  under  con- 
trolled conditions  or  within  a homogeneous  environ- 
ment (refs.  1 to  3).  This  approach  tries  to  model  the 
biological  effects  of  environment  (climate,  cultural 
practices,  etc.)  on  crop  growth  and  grain  yield.  The 
models  attempt  to  specify  the  complex  processes  in- 
herent in  plant  development  and  reproduction. 
Many  researchers  are  attempting  to  use  experimen- 
tally derived  information  to  model  these  relation- 
ships. Statistical  procedures  are  required  to  estimate 
the  coefficients  of  some  of  these  relationships  once 
sufficient  data  become  available.  However,  the  pres- 
ent lack  of  data  and  the  uncertain  accuracy  of  much 
of  the  available  data  for  wide  ranges  of  the  environ- 
ment make  the  potential  gain  from  such  models 
questionable. There are two major problems associated
with this approach in an operational system.
First,  development  and  adaptation  of  the  causal 
crop/climate  model  to  large  areas  would  require  ac- 
quisition of  data  much  of  which  is  not  routinely 
available  for  many  areas  of  the  world.  For  example, 
soil  moisture  capacities  are  not  available  for  all  areas. 
Second,  knowledge  of  causal  relationships  needed  to 
quantify the effects of weather events on
biological/physical processes and ultimately on yield was
incomplete at the beginning of LACIE and in many
areas remains incomplete today.


Statistical  Regression  Approach 

The historical regression approach, as it has been
applied, attempts to “shortcut” the complex biological
processes and to estimate statistically some
relationship between observed environmental information
and grain yield. This approach usually has been
limited to single equation models. Historical data
series  have  been  the  most  commonly  used  sources  of 
information  to  estimate  the  coefficients  of  the  model 
for  large  areas.  Explanatory  variables  may  be 
seasonal,  monthly,  daily,  or  for  some  other  time 
frame.  Considerations  include  environmental  factors 
that  are  hypothesized  to  significantly  affect  grain 
yield.  These  models  are  usually  limited  to  a few  ex- 
planatory variables. 


Analog  Approach 

In  the  absence  of  historical  data  for  a region  of  in- 
terest, it  is  sometimes  feasible  to  seek  a like  area  for 
which  data  are  available,  to  develop  a model,  and 
then  to  use  that  model  as  a surrogate  for  the  area  of 
interest.  Unfortunately,  exact  agricultural  analog 
areas  are  rare,  if  they  exist  at  all.  Prime  candidates 
for  this  type  of  approach  are  China  and  portions  of 
Alaska,  where  reliable  records  of  weather  and  crop 
yield  are  not  available.  The  concept  of  this  approach 
is similar to developing a “universal” wheat yield
model,  with  selected  agronomic  and  climatic  data  for 
application  to  areas  where  such  data  do  not  exist,  the 
distinction  being  that  only  one  specified  area  is  con- 
sidered in  the  strict  analog  model. 


CCEA-I  MODEL  DEVELOPMENT 


Model  Philosophy 

Given  LACIE's  primary  goal,  that  of  estimating 
wheat  production  for  the  large  areas  of  eight  major 
wheat-growing  regions,  the  statistical  regression  ap- 
proach of  correlating  historical  yield  and  climate  data 


offered  the  greatest  potential  return  within  the  con- 
straints of  time  and  data  resources.  Thus,  CCEA's 
decision  was  to  develop  models  which  utilized  the 
best  currently  available  data.  Monthly  weather  data 
were  readily  available  for  the  United  States  and  were 
available  with  reasonable  effort  for  other  geographic 
areas  in  a form  compatible  with  the  yield  data.  The 
advantages  included  the  availability  of  sufficient  data 
to  build  these  models  in  the  LACIE  test  areas  and  an 
established  flow  of  current  meteorological  data  capa- 
ble of  supporting  a real-time  operational  program. 
These  data  are  available  from  the  World 
Meteorological  Organization's  Global  Telecom- 
munications System  (GTS).  This  is  the  system 
through  which  global  weather  information  is 
transmitted  and  made  accessible  to  all  participating 
World  Meteorological  Organization  members.  A 
more  detailed  account  of  the  GTS  is  given  in  the 
plenary  paper  entitled  “The  Impact  of  LACIE  on  a 
National  Meteorological  Capability”  by  N.  Strom- 
men,  M.  Reid,  and  J.  Hill. 


Model  Form 

Time  and  data  constraint  considerations  and  the 
promise  shown  by  the  models  developed  by 
Thompson  (refs.  4 to  6)  led  CCEA  to  develop  its 
first-generation  wheat  yield  model,  the  CCEA-I, 
using  the  historical  regression  approach. 

The  basic  equation  for  the  CCEA-I  model  is  of  the 
form 

$$\hat{Y} = \text{constant} + f(\text{technology trend}) + g(WX)$$

where g(WX) includes variables measuring the impact
on yield of moisture stress and heat stress based
on monthly temperature and precipitation data. The
term f(technology trend) can include variables
indicating the impact on yield from changes in hybrid
varieties,  fertilization  rates,  herbicides  and 
pesticides,  and  other  management  or  cultural  prac- 
tices. In  the  absence  of  long-term  quantitative  data 
on  these  technology  variables,  time  (year)  is  used  as 
a surrogate  for  technology  trend.  The  basic  concept  is 
to  have  the  trend  term  describe  the  sustained  yield 
achieved  in  the  region  of  interest  and  let  the  weather 
term  reflect  the  variation  of  yearly  yield  around  the 
trend  term.  The  variation  of  yield  on  a global  scale 
due  to  weather  is  believed  to  be  close  to  10  percent. 
The  variation  of  yield  is  often  larger  than  10  percent 
in many areas, particularly drier areas. Yield variation
may be less than 10 percent in areas where
moisture is less restrictive.
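The model form, a constant plus a trend term plus a weather term, can be illustrated with a small least-squares fit. This is a sketch on invented numbers, assuming a linear trend in year and a single precipitation-departure variable; the actual CCEA-I variables, data, and fitting software are not given in this section.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, n):
            k = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= k * m[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][c] * x[c] for c in range(i + 1, n))) / m[i][i]
    return x


def fit(rows, y):
    """Ordinary least squares via the normal equations.

    Each row is [1, trend, weather...]; the leading 1 carries the constant.
    """
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef, row):
    return sum(c * v for c, v in zip(coef, row))


if __name__ == "__main__":
    # Hypothetical data: years since the start of the record and
    # precipitation departure from average (arbitrary units).
    trend = [0, 1, 2, 3, 4, 5]
    precip = [1, -1, 0, 2, -2, 1]
    yields = [10 + 0.5 * t + 2 * p for t, p in zip(trend, precip)]
    rows = [[1, t, p] for t, p in zip(trend, precip)]
    print([round(c, 6) for c in fit(rows, yields)])
```

Using years counted from the start of the record rather than calendar years keeps the normal equations well conditioned in this simple solver.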

High  multicollinearity  among  the  predictor  varia- 
bles causes  statistical  problems  in  determining  the 
most  appropriate  variables  and  also  in  estimating 
their  impact.  Two  different  weather  variables— tem- 
perature and  precipitation— are  highly  correlated, 
and  variables  for  successive  months  are  also  corre- 
lated because  of  persistence.  Since  complete  indepen- 
dence is  not  practically  attainable,  one  must  deter- 
mine a priori  what  variables  have  important  causal 
relationships. 

The  use  of  monthly  meteorological  data  also 
assumes implicitly that particular phenological stages
occur  at  the  same  time  each  year.  If  occurrence  of  a 
particular  stage  varies  between  two  calendar  months, 
the  model  may  not  be  able  to  estimate  the  impact  of 
meteorological  conditions  during  one  or  both  of 
those  months.  The  necessary  assumption  of  an 
average  crop  calendar  is  one  of  the  limitations  of  the 
use  of  monthly  data. 


Truncation 

One  of  LACIE’s  requirements  was  within-season 
estimates;  i.e.,  an  estimate  before  harvest.  This  re- 
quirement led  to  the  concept  of  truncated  models  for 
making  within-season  estimates.  Separate  regression 
equations  or  truncated  models  were  developed  to 
capture  information  on  yields  contained  in  weather 
data  available  before  harvest.  If  regression  analysis 
indicated  that  weather  during  periods  prior  to  and 
through  a month  contributed  to  the  final  yield  esti- 
mate, a model  was  truncated  for  that  month.  This 
estimate  was  made  without  any  explicit  assumptions 
about  weather  during  the  rest  of  the  crop  season.  The 
implicit  assumption  is  either  that  weather  in  later 
months  will  be  similar  to  that  which  occurred  in  the 
period  used  to  develop  the  model  or  that  much  of  the 
final  yield  is  determined  by  conditions  prior  to  the 
time  of  the  truncation.  Several  alternatives  to  this  ap- 
proach are  possible,  but  the  intent  in  CCEA-I  was  to 
determine how much information on final yield
might be available from early-season weather with no
predictive knowledge of future weather.


Trend  Variable 

Bond  and  Umberger  (ref.  7)  discuss  seven  factors 
that  affect  changes  in  the  technological  trend  of 


wheat  yield.  These  are  irrigation,  varietal  changes, 
fertilizer  application  rates,  pesticide  usage,  cultural 
practices,  soil  productivity  base,  and  government 
programs  (e.g.,  the  land  bank  program).  A complete 
series  of  historical  data  for  all  these  nonweather  fac- 
tors does  not  exist  for  areas  compatible  with  areas  for 
which  meteorological  data  exist.  Consequently,  the 
effects  of  these  nonweather  factors  cannot  be 
statistically  estimated  with  sufficient  confidence. 
With  little  or  no  data  on  these  factors,  it  is  necessary 
to  assume  a certain  yield  based  on  known  practices  in 
a given  area  and  then  subjectively  specify  the  trend. 

How  a trend  term  and  possible  changes  in  direc- 
tion can  be  specified  is  illustrated  by  analyzing  the 
historical  yield  data  series  for  North  Dakota  (fig.  1). 
The  initial  CCEA  models  were  derived  from  obser- 
vations for  the  period  1931-74  and  were  updated  each 
year  for  Phases  II  and  III.  For  illustration  purposes, 
however,  the  period  1879-1976  is  plotted  in  figure  1. 
The  yield  series  shows  that  yield  trended  downward 
from  1879  until  the  drought  period  of  the  1930's. 
This  downward  trend  is  partially  attributed  to  two 
factors:  (1)  soil  fertility  deterioration  with  time  and 
(2)  expansion  of  wheat  acreage  to  the  less  humid 
western  areas  of  the  state. 

After  World  War  II,  an  upward  trend  in  yield  oc- 
curred. The  period  after  World  War  II  was  a period  of 
gradual  increase  in  fertilizer  application,  while  during 
the 1950's the major impact was the introduction of
new  varieties.  Other  factors  that  have  led  to  changes 
in  the  North  Dakota  trend  include  increased  use  of 
summer  fallowing,  changes  in  the  location  of  wheat 
growing  activities,  improved  weed  control,  and 
governmental  actions,  such  as  beginning  or  ending 
the  land  bank  program.  For  North  Dakota,  at  least, 
the  yield  trend  appears  to  have  leveled  in  1972.  The 
reasons  for  this  1972  break  can  be  traced  to  increases 


FIGURE  1. — Plot  of  average  state  wheat  yields  versus  year  for 
North  Dakota  spring  wheat,  1879-1976. 
in acreage planted, which increased the use of land
with  lower  wheat  production  potential,  reduced  the 
percent  of  wheat  grown  on  fallow,  and  reduced  the 
fertilizer  application  rates. 

The  CCEA-I  models  were  not  capable  of  jointly 
estimating  the  weather  effects,  the  technology 
effects,  and  the  possible  weather-technology  interac- 
tions. An  example  of  this  interaction  is  shown  for 
Oklahoma  (fig.  2),  where  the  September  through 
December  precipitation,  a significant  variable  in  that 
area,  clearly  shows  a dry  period  in  the  1950's  and  a 
more  favorable  moisture  regime  in  the  1960’s.  The 
abrupt  shift  in  the  yield  series  in  the  mid-1950’s, 
therefore,  may  not  be  associated  solely  with  chang- 
ing technology. 

Changes  in  technology  can  contribute  to  rapid 
changes  in  yield.  In  Uttar  Pradesh,  India,  for  exam- 
ple, yields  have  increased  substantially  since  1965, 
although  monsoonal  precipitation  has  tended  to 
decrease  from  the  early  1960's  through  the 
mid-1970's  (fig.  3).  The  combination  of  high-yielding 
varieties  and  the  corresponding  increase  in  irrigation 
and  fertilization  on  acreage  planted  to  high-yielding 
varieties  is  responsible  for  much  of  the  increase  in 
yield.  But  would  the  yield  response  have  been  higher 
still  had  the  precipitation  trend  been  on  the  upswing 
rather  than  on  the  downswing?  There  is  no  question 
that  quantitative  relationships  between  trend 
changes  and  known  agronomic  and  climatic 
variability  are  urgently  needed. 


Selection  of  Weather  Variables 

In  choosing  the  type  of  weather  variables  to  in- 
clude in  the  model  for  a given  area,  important  factors 
FIGURE 2.—Plots of average Oklahoma wheat yield and
early-season precipitation, 1931-76.

FIGURE 3.—Plots of average wheat yield and crop season
rainfall for the State of Uttar Pradesh, India, 1959-77.

that  must  be  considered  are  (1)  data  availability,  (2) 
the  phenological  stage  of  wheat,  and  (3)  the  long- 
term climate  for  a given  modeled  area.  For  example, 
climatically,  frequent  May  storms  in  the  form  of 
heavy  thundershowers  and  gusty  winds  can  be  detri- 
mental to  yield  in  Oklahoma.  Basically,  the  weather 
variables  of  a model  estimate  the  effects  of  moisture 
stress  and  temperature  stress  on  yield  during  various 
assumed  average  growth  stages.  As  a first  approx- 
imation, precipitation  and  temperature  values  were 
utilized  as  surrogates  for  moisture  and  temperature 
stress. 

A soil  moisture  variable  that  included  a water  bal- 
ance might  have  been  a better  indicator  for  soil 
moisture  stress.  However,  no  reliable  soil  moisture 
estimating  procedures  using  monthly  data  are  known 
to exist.

Variables  were  initially  selected  from  a priori  in- 
dications that  they  should  be  important.  The  final 
choice  of  which  variables  to  include  in  the  model  de- 
pended on  two  criteria:  (1)  most  importantly,  the 
estimated  effect  of  the  variable  on  yield  at  the 
assumed  phenological  stage  had  to  agree  with  the 
agronomic expectations; i.e., the sign of the coefficient
had to be correct; and (2) each variable’s coefficient
had to be significantly different from zero, but
the level of significance was not a set value and at
times was higher than the usual 5-percent significance
level. The omission of a meaningful predictor
was  undesirable;  thus,  variables  were  accepted  that 
might  otherwise  have  been  rejected.  In  addition  to 
the  application  of  these  two  criteria,  the  potential 
candidate  variables  were  plotted  with  the  detrended 
yield to permit a visual evaluation of their importance.




In addition to the departure from average of
monthly precipitation amounts, two other candidate
variables were tested as possible moisture stress
indicators in most models. These were (1) the
difference between precipitation and potential
evapotranspiration (PET) and (2) the ratio of
evapotranspiration (ET) to PET. Potential evapotranspiration
was estimated by Thornthwaite's procedure (ref. 8).
Conceptually, these variables are moisture
supply-and-demand indicators, so that when demand (PET)
exceeds supply (precipitation plus soil moisture
reserves), the crop is under a measure of stress. The
ratio ET/PET has been used as an indicator of
stress, although the form 1 - ET/PET has been
preferred (refs. 9 and 10).
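The two moisture stress indicators can be sketched as below. The PET function uses the commonly published form of the Thornthwaite equation without the day-length correction, with temperatures in degrees Celsius and PET in millimeters per month. These units and the function names are assumptions of the sketch, since the section does not state how the procedure was applied.

```python
def thornthwaite_pet(monthly_temp_c):
    """Unadjusted Thornthwaite PET (mm/month) from 12 monthly mean temperatures.

    The day-length and month-length correction factors are omitted here.
    """
    heat_index = sum((t / 5.0) ** 1.514 for t in monthly_temp_c if t > 0)
    if heat_index == 0:
        return [0.0 for _ in monthly_temp_c]
    a = (6.75e-7 * heat_index ** 3 - 7.71e-5 * heat_index ** 2
         + 1.792e-2 * heat_index + 0.49239)
    return [16.0 * (10.0 * t / heat_index) ** a if t > 0 else 0.0
            for t in monthly_temp_c]


def moisture_balance(precip, pet):
    """Indicator (1): precipitation minus PET; negative values suggest stress."""
    return precip - pet


def stress_index(et, pet):
    """Indicator (2) in the preferred form 1 - ET/PET; 0 means no stress."""
    return 1.0 - et / pet if pet > 0 else 0.0


if __name__ == "__main__":
    pet = thornthwaite_pet([5.0] * 12)
    print(round(pet[0], 2), moisture_balance(20.0, 43.0), stress_index(30.0, 40.0))
```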

In  the  models  for  Australia  and  Argentina,  a 
monthly  water  budget  procedure  (ref.  11)  was  in- 
cluded to  consider,  in  part,  runoff.  A soil  moisture  in- 
dex called  the  Z-index  is  in  effect  another  moisture 
stress  variable  that  uses  monthly  climatic  data  and  a 
knowledge  of  the  local  soil  water-holding  capacity. 
Sakamoto  (ref.  12)  used  the  Z-index  in  the  semiarid 
zones  of  Australia  and  found  it  to  be  a reliable  indica- 
tor of  moisture  stress  and  hence  a predictor  of  wheat 
yield  for  South  Australia.  The  Z-index  was  not  in- 
cluded in  the  U.S.  Great  Plains  models,  which  were 
developed  before  the  Australian  models.  The  model 
for  Australia  with  the  Z-index  responded  well  to  the 
wide  fluctuations  in  the  data.  A completely  indepen- 
dent 10-year  (1963-72)  test  was  also  run  for  the 
model  developed  from  the  data  set  of  1940-62.  The 
results  indicate  that  the  variables  selected  for  the 
Australian  wheat  yield  models  were  stable  for  the 
two  data  sets. 

In  some  cases,  it  was  also  prudent  to  include  pre- 
cipitation outside  the  growing  period.  The  total  pre- 
cipitation prior  to  planting  for  spring  wheat  or  the 
fall-plus-winter  precipitation  for  winter  wheat  was 
often  used  as  an  indicator  of  the  soil  moisture 
reserve.  For  example,  in  the  Canadian  models,  the 
monthly  precipitation  for  the  period  20  months 
before  planting  was  included  in  order  to  consider  the 
beneficial  effects  of  summer  fallowing.  Unfor- 
tunately, an  excessive  amount  of  precipitation  within 
a short  period  of  time  often  leads  to  runoff,  depend- 
ing on  soil  type,  slope,  and  preexisting  soil  moisture 
levels.  Runoff  was  not  considered  in  the  initial 
models. 

To  cope  with  this  problem  and  also  the  problem  of 
evaluating  the  predictive  capability  of  the  model  for 
climatic  events  outside  the  range  within  which  it  was 
developed,  a censoring  procedure  was  instituted. 


Values  of  precipitation  and/or  temperature  outside 
the  limits  of  the  data  base  were  adjusted  to  bring 
them  within  the  data  from  which  the  models  were 
developed.  In  the  case  of  CCEA-I,  monthly  pre- 
cipitation values  used  in  the  model  for  the  prediction 
year  could  be  no  greater  than  the  90th  percentile  of 
the historic data, and temperature values were censored
to between the 5th and the 95th percentile.
This  means  that  if  the  precipitation  for  the  predic- 
tion year  exceeded  the  90th  percentile,  the  input 
value  reverted  to  the  90th  percentile. 
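The censoring rule can be sketched as a simple clip against percentiles of the historical record. The linear-interpolation percentile used here is one common convention and is an assumption; the source does not say how the percentiles were computed.

```python
def percentile(values, pct):
    """Percentile by linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def censor_precip(value, history):
    """Cap monthly precipitation at the 90th percentile of the history."""
    return min(value, percentile(history, 90))


def censor_temp(value, history):
    """Clip monthly temperature to the 5th-95th percentile range."""
    return max(percentile(history, 5), min(value, percentile(history, 95)))


if __name__ == "__main__":
    history = list(range(0, 101))  # hypothetical record, values 0..100
    print(censor_precip(120, history), censor_temp(-3, history))
```

Note that precipitation is censored only on the high side, which addresses the runoff concern, while temperature is censored on both sides.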

The  temperature  stress  indicator  variable  was  the 
departure  from  normal  of  the  mean  monthly  tem- 
perature. Although  it  would  have  been  desirable  to 
include  mean  maximum  or  mean  minimum  tem- 
peratures as  candidate  variables  in  all  the  models, 
lack  of  data  and  resources  did  not  permit  this  to  be 
done  in  the  allocated  time.  However,  for  some  cases 
in  Canada,  mean  minimum  and  mean  maximum 
temperatures  were  included  as  candidate  tem- 
perature variables,  allowing  consideration  of  the  po- 
tential effect  of  temperature  on  the  length  of  the 
growing  season  and  the  reduction  in  yield  that  may 
be  produced  by  untimely  freeze. 

Excessively high temperature during the flowering
to heading stages is detrimental to wheat yield. In
the U.S. Great Plains models, a dichotomous variate
“degree days above 90°” was developed to consider
this effect. The number of degrees that the maximum
temperature is above 90° is the daily degree
days. This value was accumulated for a month and
plotted with detrended yield to estimate the relationship
between the two. In all cases, the correlation was
negative. From the plots, it was observed that if the
number of degree days exceeded a critical value
(which differed for different regions), yield was
reduced; but, if the threshold value was not exceeded,
no decrease occurred.
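The degree-day accumulation and the resulting two-valued variable can be sketched as follows. Coding the variate as 1 or 0 is an assumption based on the word "dichotomous"; the critical values themselves are region specific and are not given in this section, so the one used below is invented.

```python
def monthly_degree_days(daily_max_temps, base=90.0):
    """Sum over the month of the degrees by which each daily maximum exceeds base."""
    return sum(max(0.0, t - base) for t in daily_max_temps)


def heat_stress_indicator(degree_days, critical_value):
    """1 if the monthly degree days exceed the regional critical value, else 0."""
    return 1 if degree_days > critical_value else 0


if __name__ == "__main__":
    # Hypothetical daily maxima for four days, and an invented critical value.
    dd = monthly_degree_days([88, 92, 95, 90])
    print(dd, heat_stress_indicator(dd, critical_value=5))
```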

In  the  U.S.S.R.  winter  wheat  area,  winterkill  is  a 
major  yield-reducing  factor  in  some  years.  This 
seems  to  be  more  of  a problem  in  the  U.S.S.R.  than 
in  the  U.S.  Great  Plains.  It  was  found  that  in  many 
areas  of  the  U.S.S.R.,  the  December-January  or 
January-February  mean  temperature  was  often  a sig- 
nificant indicator  of  winterkill  effect. 


Episodic  Events 

An episodic event is defined as an occasion in
which yield is affected by a relatively rare occurrence,
natural or social. Examples of natural events
include frost, hail, rust outbreak, flood, cattle
trampling the crop, etc. (ref. 13). Social events might
include revolutions or significant changes in national
agricultural policy which cause widespread changes
in farming practices. Generally, episodic events are
not modeled by the selected set of independent
variables. When a particular episodic event was known to
greatly depress yield, the datum for that year could
be eliminated from the analysis, and in some cases
this was done. No known quantitative estimation
technique is currently available to handle episodic
events. However, an objective adjustment of yield
based on historically averaged damage or a very gross
adjustment as was done for Argentina (ref. 14) is
sometimes possible.


DATA 


Strata Selection

For  the  United  States,  historic  yield  estimates 
from  the  U.S.  Department  of  Agriculture  (USDA) 
are  available  at  county,  crop  reporting  district 
(CRD),  state,  and  national  levels.  These  estimates 
are  made  using  statistical  sampling  techniques.  The 
coefficient  of  variation  of  the  wheat  estimates  is 
about  2 percent  at  the  national  level  and  from  3 to  8 
percent  at  the  state  level  depending  on  the  statistical 
sampling in use (ref. 15). In most states, the historic
yield  estimates  at  the  state  and  CRD  levels  are 
geographically  compatible  with  meteorological  data; 
i.e.,  meteorological  data  are  summarized  by  state  and 
crop  district.  The  compatibility  of  the  historic  yield 
and  climatic  data  at  these  levels  minimizes  the  data 
handling  problems  for  the  United  States.  Common 
boundaries  exist  for  historic  yield  and  meteorological 
data  in  other  regions  of  the  globe,  but  these  data  for 
most  other  global  regions  are  limited  to  larger  areas. 
Therefore,  state  models  were  developed  in  the 
United  States,  although  in  some  areas  smaller  strata 
were  selected  (fig.  4)  because  of  climatic  differences. 
In other countries, the model included that area for
which  the  wheat  yields  were  reported. 


Operational  Data  Flow 

Real-time meteorological data flow to support the
operational program of LACIE requires summarization
of daily maximum and minimum temperatures
and precipitation. This information is available from
the synoptic scale network supplemented by the
monthly climatic data. These are limited data, and
individual stations must often cover large areas.
Extension of the limited ground data could be
accomplished subjectively using meteorological
satellite data. The meteorological satellite data are,
however, considered most effective in identifying
areas of no precipitation, noted by absence of clouds.
For example, the meteorological satellite confirmed
the extent of the drought in the U.S.S.R. wheat
regions in 1975.

FIGURE 4. Boundaries of regions for which U.S. wheat yield
models were developed.


Weighting 

In  the  estimation  of  yield  for  a selected  area  such 
as  a state,  the  weather  data  for  each  CRD  are 
weighted  to  obtain  a state-level  value.  The  weight  for 
each  CRD  is  given  by  that  CRD's  percentage  of  the 
total  harvested  area  in  the  state,  using  historical 
acreage.  No  weighting  of  area  climatic  data  was  re- 
quired for  foreign  areas. 
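
The weighting step can be illustrated with a short sketch. The CRD names, precipitation values, and acreages below are made up for the example; only the rule itself (weights proportional to each CRD's share of historical harvested area) comes from the text.

```python
def state_weather_value(crd_values, crd_harvested_area):
    """Area-weighted mean of a CRD-level weather variable.

    Each CRD's weight is its share of the state's total harvested area.
    """
    total_area = sum(crd_harvested_area[c] for c in crd_values)
    weighted_sum = sum(crd_values[c] * crd_harvested_area[c] for c in crd_values)
    return weighted_sum / total_area


# Hypothetical monthly precipitation and historical harvested area by CRD.
precip = {"CRD1": 2.0, "CRD2": 4.0, "CRD3": 1.0}
area = {"CRD1": 100.0, "CRD2": 300.0, "CRD3": 100.0}
state_precip = state_weather_value(precip, area)   # (200 + 1200 + 100) / 500
```
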


MODEL  RESULTS 

The results of a 13-year (1965-77) simulated
operational run, based on the "bootstrap" test, are
shown in figure 5. Comparisons are made for five
sample states. The bootstrap test is a simulation
where the years prior to the prediction year are used
in developing the coefficients for the equations. Each
subsequent year adds a year for estimating the
coefficients. For example, in 1977, the years 1931 through
1976 were used in recalculating the coefficients of the
model, whose variables remained unchanged. Production
estimates using the USDA acreages are compared
in figure 6. In most test years, CCEA model
estimates compare favorably with USDA estimates.
In 1971 and 1974, unusual departures of the observed
crop calendar from the historical average led to the
large yield discrepancies in the estimates. The
experience in 1977 suggests a need to reevaluate the
trend component of the CCEA-I model.

FIGURE 5. USDA wheat yield estimates versus CCEA model
estimates for selected states, 1965-77.
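
The bootstrap test can be sketched as an expanding-window refit. The example below is a simplified stand-in, not the CCEA model: it assumes a single hypothetical predictor and an ordinary least-squares line, uses made-up data, and expects the years to be in ascending order.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_test(years, xs, ys, first_test_year):
    """Predict each test year using coefficients fitted on earlier years only."""
    predictions = {}
    for i, year in enumerate(years):
        if year < first_test_year:
            continue
        intercept, slope = fit_line(xs[:i], ys[:i])   # data before `year`
        predictions[year] = intercept + slope * xs[i]
    return predictions


# Made-up predictor and exactly linear yields, for illustration only.
years = list(range(1960, 1970))
precip = [float(y - 1960) for y in years]
yields = [20.0 + 0.5 * p for p in precip]
predictions = bootstrap_test(years, precip, yields, first_test_year=1965)
```

Each test year gains one more year of fitting data than the one before it, which mirrors the statement that every subsequent year adds a year for estimating the coefficients.
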


FIGURE 6. USDA wheat production estimates versus CCEA
estimates (using reported USDA acreage), 1965-75.


In the development of the model, historical CRD
precipitation and temperature data are used. These
data come from both the synoptic network (principal
weather stations) and the cooperative climatological
network. In real-time operations, often data from
only the principal (first-order) stations are available.
To test whether these data (precipitation in particular)
can be improved using a denser network of weather
stations, data from which are available in a delayed time
frame for the United States, these models were rerun
and the results compared with the results using only
the synoptic network (ref. 16). The differences
between the two estimates were very small. Thus,
timeliness and cost considerations may indicate that
the operational data base should be limited to the
synoptic scale.

The CCEA-I models also revealed a potential for
making early-season estimations. Figures 7(a), 7(b),
and 7(c) show an example of this tracking for the
U.S. Great Plains and the U.S.S.R. Although much
can happen to alter yields for localized areas, the
results of the aggregated yields over large areas
indicate that reasonably accurate information is possible
in certain areas with early-season truncated models.
The performance of these models over the past three
(1975-1976-1977) crop seasons is evidence that they
can effectively provide useful real-time forecasts of
yield for the principal wheat-growing regions of the
world.

FUTURE  CONSIDERATIONS 

New  efforts  are  currently  being  undertaken  by 
CCEA  to  improve  or  complement  this  initial 
performance. Using the logic shown in figure 8, the
CCEA-I  models  are  being  reviewed  with  the  intent 
to  include  variables  that  may  better  explain  the  year- 
to-year  variability.  This  review  is  the  first  major 
attempt  to  reanalyze  these  initial  CCEA-I  models. 

The reader will recall that the CCEA-I truncated
yield models were developed and operated on the
implicit assumption that the weather after the time of
the truncation had only a limited effect on the yield.
LeDuc, in an unpublished report (ref. 17), investigated
two alternative approaches. She used the
CCEA-I truncated yield model for North Dakota
spring wheat and made estimates in two different
ways: (1) by assuming normal weather after a truncation
and (2) by using the current weather reported to
the month of truncation, then inputting historical
data (1932-74) for the remainder of the season to
obtain a distribution of possible yields. The result of (2)
is the set of histograms shown in figures 9(a), 9(b),
and 9(c). Also indicated is the single estimate from
the CCEA-I truncated model as used in LACIE and
the estimate resulting from (1). From these figures,
it is evident that the assumption of normal weather
(NW) for the remaining crop season or the mean of
the histogram E(Y/j) may provide an estimate closer
to the final observed yield (USDA). This approach
needs additional investigation for operational use.

FIGURE 7. Comparison of LACIE and USDA monthly wheat
yield estimates in U.S. and U.S.S.R. for 1976. (a) U.S. Great
Plains winter wheat. (b) U.S.S.R. winter wheat. (c) U.S.S.R.
spring wheat.

FIGURE 8. Logic flow for review and revision of CCEA-I
wheat yield models.

FIGURE 9. Comparison of three methods for early-season
estimation of wheat yields. (a) April. (b) May. (c) June.
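
Approach (2) can be sketched as follows: the weather observed up to the truncation month is combined with each historical year's weather for the rest of the season, and the model is run once per historical year. The model, variable names, and data here are hypothetical placeholders and do not reproduce LeDuc's procedure or the CCEA-I equations.

```python
def yield_distribution(model, current_weather, historical_weather, truncation_index):
    """Run `model` once per historical year.

    Months before `truncation_index` come from the current season; the
    remaining months are filled in from each historical year.
    """
    outcomes = {}
    for year, season in historical_weather.items():
        completed = list(current_weather[:truncation_index]) + list(season[truncation_index:])
        outcomes[year] = model(completed)
    return outcomes


def toy_model(monthly_precip):
    """Hypothetical linear yield model used only to make the sketch runnable."""
    return 10.0 + 2.0 * sum(monthly_precip)


current = [1.0, 2.0]                      # observed through the truncation month
history = {
    1932: [1.0, 1.0, 3.0, 2.0],           # made-up monthly values
    1933: [2.0, 2.0, 1.0, 1.0],
}
dist = yield_distribution(toy_model, current, history, truncation_index=2)
mean_yield = sum(dist.values()) / len(dist)   # mean of the resulting histogram
```
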

LeDuc (ref. 18) has also reported on a statistical
phenological spring wheat model for North Dakota,
which uses the crop reporting district as a basic unit
and considers the crop calendar as well as a soil
moisture budget and heat stress term. Steyaert et al.
(ref. 19) have also developed a procedure to use
atmospheric pressure directly in large-area modeling.
Eventually, other approaches may lead to improved
models for areas where historical data are not
available for model development and to improved
accuracy over the time-series regression approach.
However, until improved long-term weather
forecasts are available, it is unlikely that these
models can significantly improve estimates of
harvested wheat yields in many of the large
wheat-growing regions.


SUMMARY  AND  CONCLUSIONS 

The  statistical  regression  approach  to  crop  model- 
ing, under  many  conditions,  represents  an  effective 
way  to  achieve  reasonably  accurate  estimates  of 
wheat  yields  for  large  areas  in  several  important 
wheat-producing  regions  of  the  world.  The  opera- 
tional performance  during  the  last  three  growing 
seasons  has  demonstrated  that,  with  the  current  state 
of  the  art,  the  historical  regression  approach  is  a 
feasible  method  to  convert  the  flow  of  meteorologi- 
cal data  available  into  useful  wheat  yield  informa- 
tion. 

The  yield  estimates  have  been  provided  in  a 
timely  manner  at  a low  cost.  Such  estimates  supplied 
regularly  in  an  operational  system  would  help  pro- 
vide needed  information  to  government  planners, 
agribusiness  decisionmakers,  and  farmers.  The  ex- 
perience of  LACIE  has  provided  better  insight  into 
the  problem  areas  that  need  further  work. 
Growth Stage Estimation

V. S. Whitehead, D. E. Phinney, and W. E. Crea


CROP CALENDAR MODELING APPROACH

Identification of wheat by the analyst-interpreters
requires that they integrate all the knowledge
available to them concerning the appearance of wheat and
the farming practices and natural events that can
change that appearance. One of the tools employed is
a crop calendar that describes the progression of the
crop through detectable and/or agronomically
significant events in its life cycle (i.e., planting date, date of
emergence, date of heading, etc.). This, of course, can
change from place to place and from year to year;
this calendar is also a function of variety. Localized
mean crop calendars for wheat and some confusion
crops were derived from the data available at the
beginning of LACIE for U.S. areas. The local
year-to-year changes in this crop calendar due to differences
in weather and the normal crop calendars for foreign
areas were not so well understood, however.

Early  in  the  preparation  for  LACIE,  it  became  ap- 
parent that  these  year-to-year  variations  in  the 
seasons  made  the  use  of  localized  normal  crop  calen- 
dars to  aid  in  distinguishing  wheat  from  other  crops  a 
questionable  procedure.  Further,  it  was  recognized 
that  because  wheat  yields  could  be  drastically 
affected  by  unusual  events  at  critical  times  in  its 
development (e.g., hot temperatures at heading),
yield  models  to  be  developed  would  probably  require 
a good  estimation  of  the  development  stage  of  the 
crop  throughout  the  crop  year  for  the  year  of  interest. 

A literature search was performed for candidate
approaches to adjustment of the crop calendar to
account for year-to-year weather differences. Three
candidates were identified: the heat unit, a function
of temperature alone; the photothermal unit, a
function of temperature and day length; and the
Robertson triquadratic unit, a nonlinear function of
maximum and minimum temperature and day
length (ref. 1). After comparative testing on an
independent set of data, the Robertson model was chosen
as best describing the rate of phenological
development of wheat because (1) there existed empirical
and theoretical evidence of nonlinear responses to
temperature and day length; (2) the number of
phases and related interval lengths in the
corresponding scale appeared meaningful and convenient;
and (3) application of the Robertson model to both
winter and spring wheat had met with preliminary
success (accuracies 28 to 14 percent better than the
heat unit and photothermal unit models between
emergence and heading).

Data  required  to  operate  this  model  are  initiation 
(planting) date, duration of daylight (date and
latitude dependent), and daily maximum and
minimum  air  temperatures.  Planting  date  can  be 
taken  as  normal  or  modeled;  date  and  latitude  are 
known;  and  maximum  and  minimum  temperatures 
can  be  taken  from  reported  maximum  and  minimum 
values or high and low hourly values, or estimated by
use  of  3-  or  6-hourly  synoptic  reports.  Since  no 
equivalent  model  for  winter  wheat  was  available,  a 
contract  was  let  to  Kansas  State  University  to  modify 
the  spring  wheat  model  so  that  it  would  track  the 
development  of  winter  wheat  and  account  for  the 
dormancy  characteristics  of  that  crop. 

The  following  error  sources  and  constraints  in  the 
initial  models  were  recognized. 

1.  Coefficients  were  derived  for  spring  wheat 
varieties  used  in  Canada.  The  applicability  of  these 
coefficients  to  U.S.  and  U.S.S.R.  spring  wheat 
varieties  and  particularly  to  winter  wheat  and  dwarf 
wheat  was  questionable. 

2.  Use  of  normal  planting  dates  could  lead  to  sig- 
nificant errors,  particularly  in  early  season  (before 
and  immediately  after  dormancy  for  winter  wheat). 
Further,  even  these  dates  were  questionable  in  some 
foreign  areas. 

3. The period of vegetative growth before
vernalization in winter wheat and the handling of
dormancy posed definite problems. The initial model
did not account for vernalization, and it also showed
sporadic development continuing during warm
periods in midwinter.

4.  Reports  of  daily  maximum  and  minimum  tem- 
peratures were  not  generally  available  from  foreign 
areas,  nor  were  hourly  reports  generally  available  for 
these  areas.  Techniques  to  estimate  the  daily  tem- 
perature extremes  from  3-  and  6-hourly  synoptic  re- 
ports existed,  but  they  were  not  available  initially. 

It  was  believed,  however,  that  even  these  crude 
early  models  would  be  preferable  to  the  use  of  the 
normal  values  and  that  the  use  of  these  crop  calendar 
models  should  lead  to  their  refinement.  This  did 
prove  to  be  the  case. 


MODEL  FORMULATION 


Initial  Model  Form 

The adjustable crop calendar (ACC) developed by
Robertson  (ref.  2)  describes  the  progress  of  spring 
wheat  toward  maturity  as  a function  of  daily  max- 
imum and  minimum  temperatures  and  day  length. 
The  adjustable  crop  calendar,  as  implemented  for 
LACIE  (ref.  3),  is  used  to  calculate  the  daily  incre- 
ment of  development  (DID)  through  six  physiologi- 
cal stages  of  growth.  These  stages  are  tabulated  as 
follows. 


Development stage for        ACC stage
50 percent of crop

Planting                     1.0
Emergence                    2.0
Jointing                     3.0
Heading                      4.0
Soft dough                   5.0
Ripe                         6.0

A triquadratic  equation  is  used  to  calculate  the  DID 
within  each  stage.  The  DID's  are  accumulated  from 
stage  to  stage. 

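The rate equation for each stage may be written as

$$ DID = \left[a_1(L - a_0) + a_2(L - a_0)^2\right]\left[b_1(T_1 - b_0) + b_2(T_1 - b_0)^2 + d_1(T_2 - b_0) + d_2(T_2 - b_0)^2\right] \qquad (1) $$

where $L$ is the day length and $T_1$ and $T_2$ are the daily
maximum and minimum temperatures. This may be written
more simply as

$$ DID = V_1 (V_2 + V_3) \qquad (2) $$

Each of the increment development terms V1, V2,
and V3 is examined to see if it is negative; if negative,
the value of the term is set to zero.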

Since  wheat  responds  differently  to  the  environ- 
ment during  each  physiological  stage  of  growth,  five 
separate  equations  are  required.  The  individual 
regression  coefficients  are  given  in  table  I.  For  the 
techniques  used  to  arrive  at  the  values  of  these 
coefficients,  the  reader  is  referred  to  Robertson's 
original  work. 
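
A minimal sketch of the daily increment calculation follows. It assumes the commonly published triquadratic form attributed to Robertson, DID = V1(V2 + V3), with V1 a quadratic in day length and V2 and V3 quadratics in maximum and minimum temperature; the class and function names are hypothetical, the emergence-to-jointing coefficients are as read from table I, and the units (hours of daylight, degrees Fahrenheit) and sample inputs are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageCoefficients:
    a0: float
    a1: float
    a2: float
    b0: float
    b1: float
    b2: float
    d1: float
    d2: float


def _quadratic(c1, c2, x):
    return c1 * x + c2 * x * x


def daily_increment(c, day_length, tmax, tmin, use_day_length=True):
    """Daily increment of development, with negative terms set to zero.

    `use_day_length=False` models a stage for which V1 is fixed at 1
    (planting to emergence in table I).
    """
    v1 = max(0.0, _quadratic(c.a1, c.a2, day_length - c.a0)) if use_day_length else 1.0
    v2 = max(0.0, _quadratic(c.b1, c.b2, tmax - c.b0))
    v3 = max(0.0, _quadratic(c.d1, c.d2, tmin - c.b0))
    return v1 * (v2 + v3)


# Emergence-to-jointing coefficients as read from table I.
E_TO_J = StageCoefficients(a0=8.413, a1=1.005, a2=0.0,
                           b0=23.64, b1=-0.003512, b2=0.00005026,
                           d1=0.0003666, d2=-0.000004282)

# Hypothetical day: 14 hours of daylight, maximum 70, minimum 45.
did = daily_increment(E_TO_J, day_length=14.0, tmax=70.0, tmin=45.0)
```

Summing `did` over successive days, and switching coefficient sets each time the accumulated value crosses the next whole-number stage, gives the stage progression tabulated above.
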


Dormancy  Modeling 

The  Robertson  crop  calendar  was  developed  for 
Marquis  spring  wheat  grown  in  Canada.  Systematic 
bias  due  to  varietal  differences  in  maturation  rate 
and  to  large  variation  in  the  length  of  dormancy  was 
observed  when  the  initial  model  was  applied  to 
winter  wheat. 

To apply the ACC to winter wheat, Feyerherm
(ref. 4) developed modifications to reflect the effect
of dormancy on winter wheat. Each DID from the
emergence to the heading stage is multiplied by a
factor calculated from the following equation.


M = 0.5684 + (0.025081) ADTJ - (0.0061[illegible]) AAPR     (3)


The use of normal average daily temperature for
January (ADTJ) and normal annual precipitation (AAPR)
was based on an observed systematic bias in the original crop
calendar from cold/wet to hot/dry conditions. The
ADTJ term was found to be related to the length and
severity of the winter dormancy period. The AAPR
term was used to compensate for increased
development rate under conditions of increasing moisture
stress.

TABLE I. Characteristic Coefficients Developed by Robertson for the Spring Wheat Crop Calendar

Coefficient    Development stage of crop (a)
               P-E          E-J            J-H          H-S          S-R

a0             V1 = 1       8.413          10.93        10.94        24.38
a1             V1 = 1       1.005          .9256        1.389        -1.140
a2             V1 = 1       0              -.06025      -.08191      0
b0             44.37        23.64          42.65        42.18        37.67
b1             .01086       -.003512       .0002958     .0002458     .00006733
b2             -.0002230    .00005026      0            0            0
d1             .009732      .0003666       .0005943     .00003109    .0003442
d2             -.0002267    -.000004282    0            0            0

(a) P-E = planting to emergence; E-J = emergence to jointing; J-H = jointing to heading; H-S = heading to soft dough;
and S-R = soft dough to ripe.


This multiplier was derived for winter wheat
varieties typically planted in the U.S. Great Plains
during the early 1970's. For foreign areas where
sufficient information is available on varietal maturities
to relate them to U.S. varietal maturity classes,
additional adjustment factors were derived (ref. 4).


M (early) = 0.7037 + (0.023445) ADTJ - (0.006735) AAPR                  (4)

M (mid-early) = 0.7613 + (0.018766) ADTJ - (0.00725[illegible]) AAPR    (5)

M (mid-late) = 0.7905 + (0.012568) ADTJ - (0.005733) AAPR               (6)

M (late) = 0.7243 + (0.00[illegible]23) ADTJ - (0.003536) AAPR          (7)


The  equations  may  be  used  for  varieties  similar  to 
those  shown  in  table  II. 

Occasionally,  seasons  arose  in  which  the  model 
showed  jointing  occurring  before  dormancy.  Since 
this  is  physiologically  impossible  for  winter  wheat, 
some  adjustment  was  required.  Feyerherm  suggested 
that  if  the  accumulated  DID's  exceeded  2.85  (3.0  is 
jointing)  on  any  day  before  January  1,  the  accumu- 
lated value  be  reset  to  2.80. 
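
The multiplier and the pre-dormancy reset can be sketched together. Only the early and mid-late class coefficients are included here; the function names, the input values, and the units of ADTJ and AAPR are assumptions made for illustration.

```python
# (constant, ADTJ coefficient, AAPR coefficient) from equations (4) and (6).
MULTIPLIER_COEFFS = {
    "early": (0.7037, 0.023445, 0.006735),
    "mid-late": (0.7905, 0.012568, 0.005733),
}


def dormancy_multiplier(maturity, adtj, aapr):
    """M = c0 + c_temp * ADTJ - c_precip * AAPR for one maturity class."""
    c0, c_temp, c_precip = MULTIPLIER_COEFFS[maturity]
    return c0 + c_temp * adtj - c_precip * aapr


def accumulate_stage(start_stage, daily_increments, multiplier, before_january_1):
    """Accumulate DID's, applying the multiplier between emergence (2.0)
    and heading (4.0) and the suggested reset before January 1."""
    stage = start_stage
    for did, early in zip(daily_increments, before_january_1):
        if 2.0 <= stage < 4.0:
            did *= multiplier
        stage += did
        if early and stage > 2.85:
            stage = 2.80          # hold the crop short of jointing (3.0)
    return stage


m = dormancy_multiplier("early", adtj=30.0, aapr=20.0)   # hypothetical inputs
fall_stage = accumulate_stage(2.7, [0.1, 0.1], 1.0, [True, True])
spring_stage = accumulate_stage(2.7, [0.1, 0.1], 1.0, [False, False])
```
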


TABLE II. Winter Wheat Varieties Used to Define Maturity Classes

Maturity     ADTJ < [illegible]          ADTJ > [illegible]
             Hard wheats                 Hard wheats          Soft wheats

Early        Lancer, Warrior, Hume       Triumph class        Monon, Benhur, Knox
Mid-early    Nebred, Winoka, Winalta     Scout class          Arthur
Mid-late     Minter                      Comanche, Pawnee     Dual, Fairfield
Late         Kharkof, Yogo, Cheyenne     Turkey, Kharkof      Trumbull, Redcoat
Spring Restart Model

As  an  alternative  to  dormancy  modeling,  the 
possibility  of  simply  restarting  the  crop  calendar 
after  dormancy  was  examined  (ref.  5).  The  normal 
end  of  dormancy  for  the  U.S.  winter  wheat  regions 
was  determined  from  climatological  data  by  plotting 
the  mean  monthly  minimum  temperature  against 
midmonth  day  length  at  each  station.  This  effort 
results  in  climagraphs  like  the  one  shown  in  figure  1. 
There  are  two  lines  in  this  figure,  one  that  intersects 
the  fall  portion  of  the  climagraph  and  one  that  inter- 
sects the  spring  portion.  These  lines  represent  the 
beginning  and  end  of  dormancy  as  defined  by  the 
following  criteria. 

1. When the sum of the development rates for the
last 15 days becomes less than 0.02 of a unit, that day
is said to be the beginning of dormancy.

2. When the sum of the development rates for the
last 15 days becomes greater than 0.10 of a unit, that
day is said to be the end of dormancy.

The  Robertson  model  was  run  with  the  emergence- 
to-jointing  coefficients  from  actual  historical 
emergence  dates  to  obtain  the  geographic  distribu- 
tion of  beginning-  and  end-of-dormancy  dates. 
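
The two dormancy criteria amount to thresholds on a 15-day running sum of development rates. A minimal sketch, with a hypothetical function name and a made-up rate series, might look like this:

```python
def dormancy_dates(daily_rates, window=15, begin_below=0.02, end_above=0.10):
    """Return (begin_day, end_day) as indices into `daily_rates`.

    Dormancy begins on the first day the trailing `window`-day sum drops
    below `begin_below`; it ends on the first later day the trailing sum
    rises above `end_above`. Either value is None if never reached.
    """
    begin = end = None
    for day in range(window - 1, len(daily_rates)):
        total = sum(daily_rates[day - window + 1 : day + 1])
        if begin is None:
            if total < begin_below:
                begin = day
        elif total > end_above:
            end = day
            break
    return begin, end


# Made-up series: active fall growth, a midwinter stall, then spring regrowth.
rates = [0.03] * 20 + [0.0] * 30 + [0.03] * 20
begin_day, end_day = dormancy_dates(rates)
```
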

Climagraphs were prepared for all the synoptic
weather stations in the winter wheat areas in the
United States, the U.S.S.R., and China. These were
used in determining the climatic analogs and to
transfer the dormancy criteria to foreign areas in the
early Phase II crop calendar adjustments. Later in
Phase II, this approach was replaced by the use of
mean planting dates with the Feyerherm multipliers,
as that method was simpler to use and provided
similar accuracies.


FIGURE 1. Day length versus minimum temperature
climagraph for Dodge City, Kansas (37°45' N 99°58' W, 2594-foot
elevation).


Spring Wheat Starter Model

In  order  to  use  crop  calendar  models,  knowledge 
of  the  planting  dates  is  required.  Feyerherm  (refs.  4 
and  6)  considered  the  effects  of  temperature  and  pre- 
cipitation on  accumulated  warming/planting  (WP) 
days.  The  general  form  of  the  model  was  as  follows. 


WP = 0                       TA < 32
   = a (TA - 32)(PRE)        32 ≤ TA ≤ 32 + 1/a
   = 1                       TA > 32 + 1/a               (8)


His study found that for spring wheat, a = 0.1.
Tests of this spring wheat planting model indicated
no statistically significant precipitation effect, and
PRE = 1 was ultimately used for operations.

The  date  for  50-percent  planting  of  spring  wheat  is 
estimated  from  a degree-day-type  summation  begin- 
ning on  January  19.  When  the  number  of  accumula- 
ted warming/planting  days  reaches  35.5,  it  is 
assumed  that  50  percent  of  the  crop  has  been 
planted. 
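
The warming/planting day summation can be sketched as below. The function names and the temperature series are hypothetical, TA is assumed to be a daily temperature in degrees Fahrenheit, and the returned day is counted from the start of the summation.

```python
def warming_planting_day(ta, a=0.1, pre=1.0):
    """Piecewise WP value of equation (8) for one day."""
    if ta < 32.0:
        return 0.0
    if ta > 32.0 + 1.0 / a:
        return 1.0
    return a * (ta - 32.0) * pre


def fifty_percent_planting_day(daily_ta, target=35.5):
    """Index of the first day on which accumulated WP days reach `target`."""
    total = 0.0
    for day, ta in enumerate(daily_ta):
        total += warming_planting_day(ta)
        if total >= target:
            return day
    return None


# Hypothetical season: every day warm enough to count as a full WP day.
planting_day = fifty_percent_planting_day([50.0] * 40)
```
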

Stuff  and  Phinney  (ref.  7)  developed  an  equation 
for  the  daily  rate  of  spring  wheat  planting  based  on 
temperature,  precipitation,  and  the  normal  planting 
date. 


R = 0.77 + 0.045(T) - 0.032(P) + 0.053(N)     (9)


Tests  on  independent  data  indicated  that  this  model 
did  as  well  as  but  no  better  than  the  Feyerherm 
model  already  in  use,  and  it  was  dropped  without  im- 
plementation. 


Winter  Wheat  Starter  Model 

Feyerherm (ref. 6) confirmed earlier studies
conducted at the NASA Johnson Space Center which
found no agrometeorological variables that showed
improvement over the use of the normal fall planting
date for winter wheat. Consequently, local normal
planting dates for winter wheat have been used as the
starting date for ACC model operation. It appears
that, with regard to weather in the U.S. Great Plains,
the farmer has a great deal of leeway in choosing the
optimum date for fall seeding and the planting date is
driven primarily by other factors.


Inclusion  of  Moisture  Variable 

The  effect  of  moisture  variation  on  crop  develop- 
ment was  studied  by  Seeley  et  al.  (See  paper  entitled 
"Prediction  of  Wheat  Phenological  Development — 
A State-of-the-Art  Review.")  The  moisture  was 
treated  indirectly  through  the  use  of  rain-days.  The 
model  is  a triquadratic  form  similar  to  the  Robertson 
model  for  spring  wheat  except  that  the  day-length 
variable  has  been  replaced  by  a moisture  variable. 
This  new  variable  is  based  on  the  mean  frequency  of 
rain-days,  which  is  computed  on  a daily  basis  by 
means  of  a low-pass-filter  function. 


$$ \overline{RD}_i = \overline{RD}_{i-1} + K\,(RD_i - \overline{RD}_{i-1}) \qquad (10) $$

where
$\overline{RD}_i$ = the weighted running mean value of RD at day $i$
$\overline{RD}_{i-1}$ = the weighted running mean value of RD at day $i - 1$
$K$ = an arbitrary constant
$RD_i$ = 1 for a day with measurable precipitation
$RD_i$ = 0 for a day with no measurable precipitation


A value of 0.1 was selected for K and used
throughout development of this model. This allows
$\overline{RD}_i$ to account for approximately 95 percent of the
variation of RD in the past 30 days.
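
The low-pass filter can be illustrated with a short sketch. It assumes the recursive form in which each day's running mean moves a fraction K of the way toward that day's rain-day indicator; the function names and the starting value of zero are hypothetical. Under this form the weights on the most recent 30 days sum to 1 - (1 - K)^30, which is about 0.96 for K = 0.1 and is consistent with the 95 percent figure quoted above.

```python
def filtered_rain_days(rain_flags, k=0.1, initial=0.0):
    """Weighted running mean of rain-day indicators (1 = rain, 0 = none)."""
    mean = initial
    series = []
    for rd in rain_flags:
        mean = mean + k * (rd - mean)
        series.append(mean)
    return series


def weight_of_last_n_days(n, k=0.1):
    """Share of the filter's total weight carried by the most recent n days."""
    return 1.0 - (1.0 - k) ** n


series = filtered_rain_days([1, 0, 0, 1])
recent_weight = weight_of_last_n_days(30)
```
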

The  form  of  the  new  crop  calendar  model  is  as 
follows. 


where  RD  = the  daily  weighted  running  mean 
value  of  the  filter  function  of  the 
rain-day  occurrence 

Results  of  this  approach  may  be  found  in  the 
paper  by  Seeley  et  al.  This  improvement  occurred  too 
late for inclusion in the Phase III adjustable crop
calendar  operational  program. 


Display  of  Crop  Stage  Estimation  Results 

It  was  not  until  the  users  (that  is,  the  analyst- 
interpreters  performing  labeling)  had  the  oppor- 
tunity to  work  with  several  potential  display  formats 
that this part of the system could be designed. During
Phase I, the adjustments were made on the
crop-reporting-district level. Several experimental formats
were  considered  during  this  period.  The  display 
finally  developed  by  Wilcox  (ref.  8)  employed  a grid 
system.  The  crop  calendar  models  would  be  operated 
at  the  grid  point  nearest  the  data  input  (meteorologi- 
cal station)  location,  and  an  objective  analysis  in- 
terpretive scheme  was  employed  to  extend  the  esti- 
mate from  the  meteorological  stations  to  the  sample 
segments. 


CONCLUSIONS 

Assessment  of  the  ACC  accuracy  over  the  period 
of  LACIE  operation  indicates  that  the  adjustable 
crop  calendars  used  did  provide  more  accurate  infor- 
mation than  would  have  been  available  using  histori- 
cal normals.  The  models  performed  best  under  the 
conditions  from  which  they  were  derived  (Canadian 
spring  wheat)  and  most  poorly  for  the  dwarf 
varieties  and  Southern  Hemisphere  applications. 
Refinements introduced into the model during
LACIE resulted in some improvement in accuracy,
and the supporting research and development
activities have ready for use other modifications that
appear to provide increased accuracy. Any major
improvement in accuracy, however, will be dependent
on (1) a reliable starter model and (2) a developmental
data set collected over a wide range of conditions
with the specific goal of supporting development of
crop calendar models. Recognition of this improvement,
once obtained, would require acquisition of a
more reliable test data set.
Accuracy Assessment: The Statistical Approach to
Performance Evaluation in LACIE

A. G. Houston,(a) A. H. Feiveson,(a) R. S. Chhikara,(b) and E. M. Hsu(b)


INTRODUCTION 

An  important  function  in  the  LACIE  is  the 
evaluation  of  results  obtained  at  various  stages  of  the 
experiment.  The  objective  of  LACIE  is  not  only  to 
demonstrate  the  technological  feasibility  of  estimat- 
ing large-area  wheat  production  using  the  LACIE  ap- 
proach but  also  to  produce  estimates  which  satisfy 
certain  accuracy  and  reliability  goals.  The  accuracy 
assessment  effort  is  designed  to  check  the  accuracy 
of  the  products  of  the  experimental  operations 
throughout  the  crop  growing  season  and  to  deter- 
mine whether  the  procedures  used  are  adequate  to 
accomplish  the  desired  accuracy  and  reliability  goals. 
These  goals  are  set  out  in  greater  detail  in  the  LACIE 
requirements documents (refs. 1 to 3).


Objectives 

The  objectives  addressed  in  the  development  of 
statistical  methodology  for  assessing  LACIE  per- 
formance are  as  follows. 

1.  To  determine  whether  the  accuracy  goal  of  the 
LACIE  estimate  of  wheat  production  for  a region  or 
a country  is  being  met — The  LACIE  accuracy  goal  is 
a “90/90”  criterion  for  at-harvest  wheat  production, 
meaning  that  the  at-harvest  wheat  production  esti- 
mate for  the  region  or  country  should  be  within  10 
percent  of  the  true  production  with  a probability  of  at 
least  0.9. 

2.  To  determine  the  accuracy  and  reliability  of 
early-season  estimates  and  estimates  made  at  regular 
intervals throughout the crop season before harvest.
This objective includes a determination of the
degree  to  which  the  90/90  criterion  is  supported  at 
these  intervals  during  the  crop  season. 


(a) NASA Johnson Space Center, Houston, Texas.
(b) Lockheed Electronics Company, Houston, Texas.


3.  To  investigate  the  various  sources  of  error  in 
the  LACIE  estimates  of  wheat  production,  area,  and 
yield;  to  quantify  and  relate  these  error  sources  to 
causal  elements  in  the  LACIE  estimation  process; 
and to recommend procedures for reducing error.

Such  an  effort  satisfies  the  need  to  provide  timely 
identification  of  major  problem  areas  that  require 
improvement  so  that  LACIE  goals  can  be  met.  Once 
a major  problem  area  is  identified,  it  is  the  respon- 
sibility of  the  accuracy  assessment  function  to  relate 
the  problem  to  causal  elements  in  the  LACIE  estima- 
tion process  and  then  to  make  recommendations  to 
improve  the  technology. 

Most of the accuracy assessment investigations are performed in the
U.S. Great Plains, which is called the “yardstick” region. This region
was selected because reliable independent estimates of wheat
production, area, and yield for the state and higher levels are
available from the U.S. Department of Agriculture (USDA) Statistical
Reporting Service (SRS, recently renamed the Economics, Statistics,
and Cooperatives Service) and because ground observations may be
obtained at the segment level through the assistance of personnel in
the USDA Agricultural Stabilization and Conservation Service (ASCS).
However, the studies in the yardstick region are used to promote the
development of LACIE procedures for obtaining reliable estimates for
other countries.


Background 

The  LACIE  was  conducted  in  three  phases. 

Phase I. During LACIE Phase I, wheat acreage in the U.S. Great Plains
was estimated. Yield and production feasibility studies also were
performed, but the accuracy assessment effort consisted of evaluating
only the acreage estimation and aggregation procedures. The specific
objectives for LACIE Phase I were (1) to develop consistent estimators
of the variance of the LACIE acreage estimates, (2) to assess the
sampling and classification components in terms of their respective
contributions to the variability of a large-area acreage estimate, and
(3) to isolate the factors significantly affecting the classification
performance and segment wheat proportion estimation by the
Classification and Mensuration Subsystem (CAMS).

The bias in the LACIE acreage estimate for a particular region was
estimated by the difference between the LACIE estimate and the
at-harvest estimate for that region released by the USDA/SRS for the
1974-75 crop year. Separate classification and sampling error
components were estimated, the former by comparisons of LACIE
proportion estimates with ground observations obtained from 27
intensive test sites (ITS's) in 8 states and from 30 ground-observed
segments (blind sites) in 2 states (Montana and North Dakota) and the
latter by comparisons with county-level data from the 1969 U.S.
Agricultural Census. Several investigations were performed to develop
a better estimator of the variance of the stratum acreage estimate and
to study factors important to the processing of segment data for
subsequent wheat proportion estimation (refs. 4 and 5).

Phase II. In Phase II, the accuracy assessment group continued to test
and evaluate LACIE acreage estimates but expanded its efforts to
include evaluation of yield and production estimates as well.
Methodology for assessing LACIE performance in terms of the 90/90
criterion and for estimating sampling and classification error
components was developed. Detailed error source investigations were
made employing LACIE proportion estimates and ground-observed
proportions for 150 blind sites and 27 ITS's.

Estimates of the coefficient of variation (CV) and the bias were used
to evaluate the LACIE production estimate in terms of the 90/90
criterion at the U.S. Great Plains level, and a sensitivity analysis
was performed to determine the effect of various errors on the LACIE
production estimate. In the foreign area, 10 ITS's in Canada were
studied and evaluated.

Phase III. During Phase III, the accuracy assessment group continued
to enlarge the scope of detailed evaluation of LACIE estimates and
procedures over the nine-state yardstick spring and winter wheat
region. Evaluations were also performed for the U.S.S.R. and Canada.
The investigations made in Phase III were similar to those performed
in Phase II. The relative contributions of classification and sampling
error components were assessed using 212 blind sites and 24 ITS's in
the United States, 30 test sites and 10 ITS's in Canada, and the 1974
U.S. Agricultural Census data. In addition, in support of the
development of CAMS Procedure 1, efforts were made to determine
analyst labeling and classification omission and commission errors.


STATISTICAL  ANALYSIS  OF  PRODUCTION 
ESTIMATION 


Evaluation  Technique 

A major part of the accuracy assessment effort is devoted to
determining whether the operational procedures produce an estimator
that meets the 90/90 accuracy goal of LACIE. This accuracy criterion
was specified by the experimenter to derive technology improvements.
To describe the criterion in exact terms, it is formulated
statistically as follows.

Let $\hat{P}$ be the LACIE estimate of wheat production for the region
or country and let $P$ be the true wheat production of the same region
or country. The accuracy goal of LACIE is a 90/90 criterion for
at-harvest wheat production, which is defined by the following
probability statement.

$$\Pr\left[\,|\hat{P} - P| \le 0.1P\,\right] \ge 0.9 \tag{1}$$

Equation (1) is a statement that the accuracy goal is for the LACIE
estimate of wheat production to be within 10 percent of the true wheat
production with a probability of at least 90 percent.

In LACIE, estimation of acreage, yield, and production is made for
large areas, using data from many sample segments. Thus, it is assumed
that the LACIE estimate $\hat{P}$ is normally distributed, with mean
$(P + B)$ and variance $\sigma_P^2$, where $B$ is the bias given by

$$B = E(\hat{P}) - P \tag{2}$$

Accordingly, the probability statement, equation (1), can be expressed

$$\Pr\left[\frac{-0.1 - 0.9\,\mathrm{RB}(\hat{P})}{\mathrm{CV}(\hat{P})} \le Z \le \frac{0.1 - 1.1\,\mathrm{RB}(\hat{P})}{\mathrm{CV}(\hat{P})}\right] \ge 0.9 \tag{3}$$

where $Z = [\hat{P} - (P + B)]/\sigma_P$ follows the standard normal
distribution $N(0,1)$. The parameter $\mathrm{CV}(\hat{P})$ is the
coefficient of variation of $\hat{P}$ defined by

$$\mathrm{CV}(\hat{P}) = \frac{\sigma_P}{E(\hat{P})} = \frac{\sigma_P}{P + B} \tag{4}$$

The term $\mathrm{RB}(\hat{P})$ is called the “relative bias” of
$\hat{P}$ and is defined by

$$\mathrm{RB}(\hat{P}) = \frac{E(\hat{P}) - P}{E(\hat{P})} = \frac{B}{P + B} \tag{5}$$

It follows that the accuracy goal of LACIE is attained if

$$\Phi\left[\frac{0.1 - 1.1\,\mathrm{RB}(\hat{P})}{\mathrm{CV}(\hat{P})}\right] - \Phi\left[\frac{-0.1 - 0.9\,\mathrm{RB}(\hat{P})}{\mathrm{CV}(\hat{P})}\right] \ge 0.90 \tag{6}$$

where $\Phi$ represents the cumulative standard normal distribution.
The area under the curve in figure 1 contains combinations of
$\mathrm{CV}(\hat{P})$ and $\mathrm{RB}(\hat{P})$ for which equation
(6) is satisfied.
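
As an illustration of equation (6), the following minimal Python sketch evaluates the left side of the inequality for a given coefficient of variation and relative bias, and bisects for the range of relative bias over which the 90/90 criterion holds. The function names, the bisection tolerance, and the search brackets (-1.0 and 0.5) are assumptions made for this example and are not part of the LACIE software.

```python
import math


def phi(z):
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within_10_percent(cv, rb):
    """Left side of equation (6) for a given CV and relative bias RB."""
    upper = (0.1 - 1.1 * rb) / cv
    lower = (-0.1 - 0.9 * rb) / cv
    return phi(upper) - phi(lower)


def meets_90_90(cv, rb):
    """True if the (CV, RB) pair satisfies the 90/90 criterion."""
    return prob_within_10_percent(cv, rb) >= 0.90


def tolerable_rb_limits(cv, tol=1e-10):
    """Bisect outward from RB = 0 for the interval on which (6) holds.

    Returns None when an unbiased estimator (RB = 0) already fails,
    because this sketch starts its search from RB = 0.
    """
    if not meets_90_90(cv, 0.0):
        return None

    def boundary(inside, outside):
        # 'inside' satisfies the criterion, 'outside' does not.
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if meets_90_90(cv, mid):
                inside = mid
            else:
                outside = mid
        return inside

    return boundary(0.0, -1.0), boundary(0.0, 0.5)
```

With RB = 0 the criterion reduces to 2Φ(0.1/CV) - 1 ≥ 0.9, that is, CV no larger than about 0.0608.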


Variance  and  Bias  Estimation  for  the  Wheat 
Production  Estimate 

To apply the evaluation technique described in the previous section,
knowledge of the variance $\sigma_P^2$ and the bias $B$ of the LACIE
wheat production estimate for a country or a region is required. Since
values for these parameters are unknown in LACIE, estimates have to be
obtained. The estimation of the production variance at different
aggregation levels is described in detail in the paper entitled
“Large-Area Aggregation and Mean-Squared Prediction Error Estimation
for LACIE Yield and Production Forecasts” by Chhikara and Feiveson.

FIGURE 1. Relative bias compared to coefficient of variation of LACIE
production estimator $\hat{P}$. The shaded area includes combinations
of $\mathrm{RB}(\hat{P})$ and $\mathrm{CV}(\hat{P})$ that will satisfy
the 90/90 accuracy criterion.

An estimate of bias can be obtained from the difference between a
LACIE production estimate and the corresponding USDA estimate of
production, but this estimate is realistic only in the United States
and is based only on a single sample. For foreign countries, the USDA
Foreign Agricultural Service (FAS) makes periodic forecasts, which are
generally for total grain production, using ad hoc methods. Although
FAS estimates may be used to indicate a major problem, they cannot be
used for a quantitative assessment of bias in a LACIE estimate.


The 90/90 Evaluation


Given the LACIE estimator $\hat{\sigma}_P$ of the standard deviation
$\sigma_P$ of the LACIE production estimator, an estimate of
$\mathrm{CV}(\hat{P})$ is

$$\widehat{\mathrm{CV}}(\hat{P}) = \frac{\hat{\sigma}_P}{\hat{P}} \tag{7}$$

The computation of $\hat{\sigma}_P$ is described in detail in the paper
by Chhikara and Feiveson and is the result of the Crop Assessment
Subsystem (CAS) aggregation software. An estimate of
$\mathrm{RB}(\hat{P})$ is

$$\widehat{\mathrm{RB}}(\hat{P}) = \frac{\hat{B}}{\hat{P}} \tag{8}$$

where $\hat{B}$ is the difference between the LACIE production
estimate and the corresponding SRS estimate.

The distribution of the estimated value of the left side of equation
(6), with $\mathrm{CV}(\hat{P})$ and $\mathrm{RB}(\hat{P})$ replaced by
their estimates $\widehat{\mathrm{CV}}(\hat{P})$ and
$\widehat{\mathrm{RB}}(\hat{P})$, respectively, has been found to be
intractable because of problems in obtaining a joint distribution of
$\widehat{\mathrm{CV}}(\hat{P})$ and $\widehat{\mathrm{RB}}(\hat{P})$.
However, if $\widehat{\mathrm{CV}}(\hat{P}) > 0.061$, there is a fair
indication that the LACIE estimate may not satisfy the 90/90 criterion
even if $\hat{P}$ is assumed unbiased. Since
$\widehat{\mathrm{CV}}(\hat{P})$ has been found to be very stable at
the country level (U.S. Great Plains level in the case of the United
States) and less than 0.061, one can treat
$\widehat{\mathrm{CV}}(\hat{P})$ as the parameter
$\mathrm{CV}(\hat{P})$ and solve equation (6) to determine the
tolerable values of $\mathrm{RB}(\hat{P})$ that would meet the 90/90
accuracy goal. That is, given $\widehat{\mathrm{CV}}(\hat{P})$, there
exist real numbers $R_0$ ($R_0 < 0$) and $R_1$ ($R_1 > 0$) such that
equation (6) is satisfied for

$$R_0 \le \mathrm{RB}(\hat{P}) \le R_1 \tag{9}$$

Equivalently, there exist corresponding tolerable bias limits $B_0$
and $B_1$ such that

$$B_0 \le B \le B_1 \tag{10}$$

where $B_0 = [R_0/(1 - R_0)]P$ and $B_1 = [R_1/(1 - R_1)]P$, where $P$
is the actual production.

Suppose next a null hypothesis $H_0$ that the LACIE production
estimate is from a 90/90 estimator; i.e., suppose
$\mathrm{CV}(\hat{P}) = \widehat{\mathrm{CV}}(\hat{P}) < 0.061$ and
$\mathrm{RB}(\hat{P}) \in [R_0, R_1]$ and hence $B \in [B_0, B_1]$. To
test the hypothesis that $H_0$ is true, first fix a value of $B$, say
$B^* \in [B_0, B_1]$, then test the subhypothesis $B = B^*$ against
the alternative $B \ne B^*$, using the statistic
$\hat{B} = \hat{P} - P_{SRS}$ and assuming
$\hat{B} \sim N(B, \sigma_P^2)$. A “p-value” for this test is given by

$$\Pi(B^*) = \Pr\left(|\hat{B} - B^*| \ge |b - B^*|\right) \tag{11}$$

given $\hat{B} \sim N(B^*, \hat{\sigma}_P^2)$, where $b$ is the
observed difference, $\hat{P} - P_{SRS}$. The overall hypothesis,
$H_0$: $\hat{P}$ is 90/90, is rejected if

$$\max_{B_0 \le B^* \le B_1} \Pi(B^*) < \alpha \tag{12}$$

where $\alpha$ is a predetermined significance level. If the test
fails to reject $H_0$, it is not immediately inferred that the LACIE
production estimator is a 90/90 estimator. (The test has low power
since only one observation is available to estimate
$\mathrm{RB}(\hat{P})$.) In this situation, the statement is made that
“support of the 90/90 accuracy goal” is indicated; however, results
obtained from blind-site analyses and other accuracy assessment tasks
are then considered for further assessment of whether or not the 90/90
criterion is achievable.


Comparison of LACIE Estimates With
Reference Standards

The reference standards to which the LACIE estimates are compared are
the USDA/SRS estimates for the United States and the FAS estimates
and/or official country estimates for foreign countries. The statistic
used for making these comparisons is the relative difference (RD) in
percent defined as follows.

$$\mathrm{RD} = \left(\frac{\mathrm{LACIE} - \mathrm{STANDARD}}{\mathrm{LACIE}}\right) \times 100 \tag{13}$$


where LACIE stands for the LACIE estimate of wheat production, area,
or yield and STANDARD represents the corresponding reference standard
estimate. This definition expresses the difference between the two
estimates as a percentage of the LACIE estimate.

Significance tests of no difference are made only at the region or
country level for the LACIE production, area, and yield estimates of
spring wheat, winter wheat, and total wheat. For a significance test,
the LACIE estimate (of wheat production, area, or yield) is assumed to
be approximately normally distributed with unknown mean $\mu$ and
variance $\sigma_{\mathrm{LACIE}}^2$. A test of the hypothesis
$H_0$: $\mu = \mathrm{STANDARD}$ against the alternative hypothesis
$H_A$: $\mu \ne \mathrm{STANDARD}$ is then made using this assumption.
The test statistic is given by

$$Z = \frac{\mathrm{LACIE} - \mathrm{STANDARD}}{\sigma_{\mathrm{LACIE}}} \tag{14}$$

which, under the null hypothesis, is approximately normally
distributed with mean 0 and variance 1. The null hypothesis is
rejected in favor of the alternative at the $\alpha$ level of
significance if

$$|Z| > z_{\alpha/2} \tag{15}$$

where $z_{\alpha/2}$ is the $(1 - \alpha/2)$ critical point of the
standard normal distribution. For $\alpha = 0.10$,
$z_{\alpha/2} = 1.645$, and if $|Z| > 1.645$, it is concluded that the
mean of the LACIE estimator is significantly different from the
reference standard estimate.
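
The relative difference and the significance test described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the standard deviation of the LACIE estimate is assumed to be supplied by the caller, and the default critical value 1.645 corresponds to the significance level of 0.10 quoted in the text.

```python
def relative_difference(lacie, standard):
    """Relative difference (RD) in percent, as a percentage of the LACIE estimate."""
    return (lacie - standard) / lacie * 100.0


def significance_test(lacie, standard, sigma_lacie, z_crit=1.645):
    """Return the Z statistic and whether the null hypothesis is rejected.

    sigma_lacie is the standard deviation of the LACIE estimate, which the
    caller must provide; z_crit = 1.645 is the two-sided critical point
    for a significance level of 0.10.
    """
    z = (lacie - standard) / sigma_lacie
    return z, abs(z) > z_crit
```

For example, with hypothetical values LACIE = 100, STANDARD = 110, and a standard deviation of 5, the relative difference is -10 percent and Z = -2.0, so the difference would be declared significant at the 0.10 level.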


ERROR SOURCES IN LACIE

Any uncertainty in a LACIE prediction of wheat production at the
country level is directly related to errors in wheat acreage estimates
and yield predictions at the zone level. These errors are incurred
independently and, hence, estimated accordingly.

The yield prediction error is evaluated on the basis of the residual
mean square error obtained by regressing yield on weather data for
past years. The error in country-level yield prediction is assessed by
taking into account the variability with which LACIE acreage estimates
are obtained. (See the paper by Chhikara and Feiveson.) When extreme
weather conditions prevail, the yield prediction is likely to be
biased. In addition to making a comparison between LACIE yield
estimates and the reference standard estimates, another evaluation is
made using historical data. (For details of this methodology, see the
paper entitled “Accuracy and Performance of LACIE Yield Estimates in
Major Wheat Producing Regions of the World” by Phinney et al.)

The acreage estimate is subject to both bias and variability. Sampling
and classification are the two major error components of an acreage
estimation error. Sampling error contributes primarily to the variance
of the acreage estimate, whereas classification error is the main
contributor to the bias in the acreage estimate. In general, estimated
within-stratum variances are input to the variance estimate of a zone
acreage estimate and consist of sampling as well as classification
variance components. The bias is incurred at two levels: segment and
stratum. The segment-level bias is due to the classification procedure
that first determines the small-grains proportion in a segment and
then converts it to a wheat proportion by applying the stratum-level
historical ratio of wheat to small grains. The stratum-level bias in
the United States is due to the segment-level bias and to the ratio
estimation of wheat acreage for Group III counties. (For a definition
of Group III counties, see reference 1 or the paper entitled “LACIE
Large-Area Acreage Estimation” by Chhikara and Feiveson.)

The error sources that contribute to the prediction error (mean
squared error) of a LACIE production estimate are outlined in figure
2. Though it is desirable to assess the individual contributions of
these error sources, the complexity of their interaction and lack of
knowledge of true area and yield make intractable the estimation of
all error components and the relation of the components to the overall
error.


First-Order Error Source Investigations

First-order errors are those errors contributing to the LACIE
production estimate $\hat{P}$ which can be approximately quantified
using LACIE, SRS, historical, and blind-site data. (A “blind site” is
a sample segment selected from the set of LACIE-allocated segments for
the purpose of acquiring ground observations of the true distribution
of crops.) The error in $\hat{P}$ depends on its sources in a complex
way; thus, it is unrealistic to assume the total error can be written
as a sum of uncorrelated random components. Instead, the effect of
each component is assessed by estimating the reduction in the
prediction error in $\hat{P}$ achieved by removing that error. A major
accuracy assessment effort is devoted to the development of
statistical methodology for estimating the acreage bias and its error
components.

Effect of sampling, classification, and yield variability on the
variance of the production estimate. The effect of a particular error
source is assessed by determining the reduction in the production
variance estimate when the error is eliminated from the computation of
the variance estimate. Suppose $\sigma_P^2$ is the variance of the
country production estimate. Then, as described in the Chhikara and
Feiveson paper, $\sigma_P^2$ can be expressed as

$$\sigma_P^2 = \sum_{i} \sigma_i^2 + \sum_{i \ne j} \sigma_{ij} \tag{16}$$

where $\sigma_i^2$ is the variance of the $i$th pseudozone production
estimate and $\sigma_{ij}$ is the covariance of the $i$th and $j$th
pseudozone production estimates. The variances $\{\sigma_i^2\}$ and
covariances $\{\sigma_{ij}\}$ can further be expressed in terms of
acreage and yield error components, and both the acreage error and the
covariance term can be further subdivided into sampling and
classification error components if the latter can be estimated by
means of the following procedure.

FIGURE 2. LACIE first-order error components.

To assess the effect of an error component on the production variance,
$\sigma_i^2$ and $\sigma_{ij}$ need to be estimated with that error
component omitted. Suppose $s_P^2$, $s_{PY}^2$, $s_{PA}^2$,
$s_{PS}^2$, and $s_{PC}^2$ are the estimates of $\sigma_P^2$ when the
error component omitted is none, yield, acreage, sampling, and
classification, respectively. Then the ratios $s_{PY}^2/s_P^2$,
$s_{PA}^2/s_P^2$, $s_{PS}^2/s_P^2$, and $s_{PC}^2/s_P^2$ are
determined to evaluate the sensitivity of the production variability
to the yield, acreage, sampling, and classification error components,
respectively.

Area error source investigations. Area error source investigations
consist of estimating bias at the regional and segment levels and
determining contributions of sampling and classification error to
segment and regional acreage estimation variability.

Estimating bias at the regional level: The method for estimating bias
described in this section is valid for any area having a sufficient
number of blind sites. In the accuracy assessment of LACIE area
estimates, it is applied at the state and higher levels.

The LACIE estimate of wheat acreage $\hat{A}$ for a given area can be
written

$$\hat{A} = \sum_{i=1}^{n} w_i \hat{x}_i \tag{17}$$

where $\hat{x}_i$ is the wheat proportion estimate in the $i$th LACIE
segment, $n$ is the number of processed LACIE segments, and $w_i$
($i = 1, \ldots, n$) is a known weight based on the size of the
substratum in which the $i$th segment is located, the number of
segments in this substratum, and the historical data of any Group III
substrata the wheat acreages of which are estimated by means of the
Group III ratio involving this substratum.

Corresponding to the estimate $\hat{A}$ is the true acreage $A$, which
may be expressed as

$$A = \sum_{i=1}^{n} w_i' c_i \tag{18}$$

where $c_i$ is the true wheat proportion for the substratum containing
the $i$th segment and $w_i'$ is the value of the weight which would
give perfect Group III estimates of wheat acreage for unsampled areas
using these $n$ acquired segments.

The wheat proportion estimate for the $i$th segment can be expressed
by the identity

$$\hat{x}_i = c_i + (x_i - c_i) + (\hat{x}_i - x_i) = c_i + \epsilon_i + \delta_i \tag{19}$$

where $x_i$ is the true wheat proportion of the $i$th segment,
$\epsilon_i$ is the sampling error, and $\delta_i$ is the
classification error. Since segments are located randomly in
substrata, the sampling is unbiased and $E(\epsilon_i) = 0$. However,
unbiased classification is not assumed and

$$E(\delta_i) = \theta_i \tag{20}$$

where $\theta_i$ is unknown.

The bias in $\hat{A}$, defined by $E(\hat{A} - A)$, is given by

$$E(\hat{A} - A) = \sum_{i=1}^{n} c_i (w_i - w_i') + \sum_{i=1}^{n} w_i \theta_i = B_1 + B_2 \tag{21}$$

Note that the first term, $B_1$, represents a bias caused by the lack
of exactness of the Group III ratios (i.e., $w_i \ne w_i'$), whereas
the second term, $B_2$, is the bias due to classification.

The classification bias component $B_2$ is estimated by

$$\hat{B}_2 = \frac{n}{m} \sum_{j=1}^{m} w_j d_j \tag{22}$$

where $m$ is the number of blind sites in the area containing $n$
processed segments and

$$d_j = \hat{x}_j - x_j \tag{23}$$

for the $j$th blind site, where $x_j$ is the ground-observed
proportion of wheat for that segment. Since the blind sites are a
random subsample, $\hat{B}_2$ is an unbiased estimator of $B_2$; i.e.,

$$E(\hat{B}_2) = B_2 \tag{24}$$

The variance of $\hat{B}_2$ is

$$\mathrm{Var}(\hat{B}_2) = \frac{n^2}{m}\left(1 - \frac{m}{n}\right) S^2 \tag{25}$$

where

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left(w_i d_i - \frac{1}{n} \sum_{k=1}^{n} w_k d_k\right)^2 \tag{26}$$

This variance is estimated by replacing $S^2$ with its estimate

$$s^2 = \frac{1}{m - 1} \sum_{j=1}^{m} \left(w_j d_j - \frac{1}{m} \sum_{k=1}^{m} w_k d_k\right)^2 \tag{27}$$

An approximate 90-percent confidence interval for $B_2$ is constructed
by $(\hat{B}_2 - 1.645\,\hat{\sigma}_{B_2},\ \hat{B}_2 + 1.645\,\hat{\sigma}_{B_2})$,
where $\hat{\sigma}_{B_2}^2$ is the estimate of
$\mathrm{Var}(\hat{B}_2)$.
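
The blind-site estimate of the classification bias and its approximate 90-percent confidence interval can be sketched as follows. This is a minimal illustration that assumes the expansion estimator and finite-population variance form of equations (22) to (27); the function name, argument names, and return layout are hypothetical.

```python
import math


def classification_bias(weights, lacie_props, ground_props, n_processed):
    """Estimate the classification bias B2 from m blind sites.

    weights, lacie_props, and ground_props hold w_j, the LACIE proportion,
    and the ground-observed proportion for each blind site; n_processed is
    the number n of processed segments in the area.
    Returns (b2_hat, var_hat, (lower, upper)).
    """
    m = len(weights)
    n = n_processed
    # Weighted differences w_j * d_j with d_j = LACIE minus ground truth.
    wd = [w * (xh - x) for w, xh, x in zip(weights, lacie_props, ground_props)]
    b2_hat = n / m * sum(wd)
    mean_wd = sum(wd) / m
    s2 = sum((v - mean_wd) ** 2 for v in wd) / (m - 1)
    # Variance with the finite-population factor (1 - m/n).
    var_hat = n ** 2 / m * (1.0 - m / n) * s2
    half_width = 1.645 * math.sqrt(var_hat)
    return b2_hat, var_hat, (b2_hat - half_width, b2_hat + half_width)
```

If the resulting interval excludes zero, the classification bias would be judged significant at roughly the 10-percent level.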

Reliable county-level data are not often available for estimating
$B_1$, the bias due to Group III ratio estimation. Agricultural census
data at the county level are available only every 4 to 5 years, the
latest in 1974. These most recent census data are used to obtain the
Group III ratio estimates in the LACIE aggregation scheme and hence
cannot be used for estimating $B_1$. Therefore, county-level SRS
estimates, which are the only independent estimates available, are
used for estimating $B_1$. It is known that the SRS estimates are not
very reliable at the county level; therefore, the following estimate
of $B_1$ is obtained only at the U.S. Great Plains level and is used
with caution.

Because current SRS county-level estimates are not available during
the crop year, previous-year county-level SRS estimates are used to
obtain the $c_i$ in the equation

$$B_1 = \sum_{i=1}^{n} c_i (w_i - w_i') \tag{28}$$

for each of the processed LACIE segments in the U.S. Great Plains.
Then, $B_1$ is estimated by

$$\hat{B}_1 = \sum_{i=1}^{n} w_i c_i^{SRS} - A_{SRS} \tag{29}$$

where $c_i^{SRS}$ is the SRS wheat proportion for the county
containing the $i$th segment and $A_{SRS}$ is the SRS wheat area
estimate for the U.S. Great Plains. A reliable estimate of the
variance of $\hat{B}_1$ is not available; thus, for practical
purposes, the bias due to Group III ratio estimation is considered
negligible if $\hat{B}_1$ is less than 2 percent of $A_{SRS}$.


Estimating bias at the segment level: In this section, the statistical
methodology for estimating the wheat proportion estimation error
expected for the CAMS-processed segments is described. Let $N$ be the
number of segments acquired in a region (state or higher level) and
let $n$ be the number of blind sites selected randomly from these $N$
segments. For a region, let $\hat{x}_i$ represent the wheat proportion
estimate in the $i$th segment and let $x_i$ represent the
ground-observed proportion of wheat in the $i$th segment, where
$i = 1, \ldots, N$. Then, the average error $\mu_D$ is given by

$$\mu_D = \frac{1}{N} \sum_{i=1}^{N} (\hat{x}_i - x_i) \tag{30}$$

The estimate of $\mu_D$ is given by

$$\bar{D} = \frac{1}{n} \sum_{i=1}^{n} (\hat{x}_i - x_i) \tag{31}$$

Letting $D_i = \hat{x}_i - x_i$, $i = 1, 2, \ldots, n$, the variance
of $\bar{D}$ is estimated by

$$s_{\bar{D}}^2 = \frac{N - n}{Nn} \cdot \frac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n - 1} \tag{32}$$

Lower and upper confidence limits, respectively, for the population
average difference $\mu_D$ are given by

$$\bar{D} - t_{1-\alpha/2}\, s_{\bar{D}} \tag{33a}$$

$$\bar{D} + t_{1-\alpha/2}\, s_{\bar{D}} \tag{33b}$$

where $t_{1-\alpha/2}$ is the value of the $1 - \alpha/2$ percentage
point, from the Student's $t$ distribution with $(n - 1)$ degrees of
freedom, corresponding to the desired confidence level of
$1 - \alpha$. If $\mu_D$ is inferred to be significantly different
from zero, contributions to the bias and mean squared error (MSE) due
to small-grains classification error and wheat-to-small-grains ratio
error are estimated (unless a direct wheat classification procedure
was used).
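
The segment-level calculation can be illustrated with the short Python sketch below. It assumes the finite-population form of the variance in equation (32), and it takes the Student's t critical value as an argument supplied by the caller (for example, from a printed table) rather than computing it. The function and argument names are hypothetical.

```python
import math


def mean_difference_interval(lacie_props, ground_props, n_acquired, t_crit):
    """Mean blind-site difference and its confidence limits.

    lacie_props and ground_props hold the LACIE and ground-observed wheat
    proportions for the n blind sites; n_acquired is N, the number of
    segments acquired in the region; t_crit is the (1 - alpha/2) point of
    Student's t with n - 1 degrees of freedom, supplied by the caller.
    """
    n = len(lacie_props)
    d = [xh - x for xh, x in zip(lacie_props, ground_props)]
    d_bar = sum(d) / n
    s2 = sum((di - d_bar) ** 2 for di in d) / (n - 1)
    # Variance of the mean with the finite-population correction (N - n)/N.
    var_d_bar = (n_acquired - n) / (n_acquired * n) * s2
    half_width = t_crit * math.sqrt(var_d_bar)
    return d_bar, (d_bar - half_width, d_bar + half_width)
```

With five hypothetical blind sites out of 50 acquired segments and a 90-percent confidence level, the critical value for 4 degrees of freedom is about 2.132; an interval that excludes zero would indicate a significant average error.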

Let $\hat{r}_i$ and $\hat{x}_i$ ($i = 1, 2, \ldots, n$) be the
estimates of $r_i$ and $x_i$, respectively, for the $i$th blind site,
where $r_i$ is the ground-observed ratio of wheat to small grains,
$x_i$ is the ground-observed small-grains proportion, and $n$ is the
number of blind sites. In LACIE, $\hat{r}_i$ is the forecast ratio of
wheat to small grains, and $\hat{x}_i$ is the CAMS estimate of the
small-grains proportion.

The bias $B$ and the MSE of the wheat proportion estimate obtained
after ratioing may be estimated by

$$\hat{B} = \frac{1}{n} \sum_{i=1}^{n} (\hat{r}_i \hat{x}_i - r_i x_i) \tag{34}$$

and

$$\widehat{\mathrm{MSE}} = \frac{1}{n} \sum_{i=1}^{n} (\hat{r}_i \hat{x}_i - r_i x_i)^2 \tag{35}$$

It is clear that both these errors are caused by two factors: the CAMS
classification of small grains and the estimated ratio of wheat to
small grains. The contribution of a particular error factor may be
assessed by the reduction in the bias or the MSE which would be
achieved if that error factor were omitted. Specifically, the
following formulas are used in this study.

1. Bias estimate with no ratioing error:

$$\hat{B}' = \frac{1}{n} \sum_{i=1}^{n} (r_i \hat{x}_i - r_i x_i) \tag{36}$$

2. Bias estimate with no classification error:

$$\hat{B}'' = \frac{1}{n} \sum_{i=1}^{n} (\hat{r}_i x_i - r_i x_i) \tag{37}$$

3. MSE estimate with no ratioing error:

$$\widehat{\mathrm{MSE}}' = \frac{1}{n} \sum_{i=1}^{n} (r_i \hat{x}_i - r_i x_i)^2 \tag{38}$$

4. MSE estimate with no classification error:

$$\widehat{\mathrm{MSE}}'' = \frac{1}{n} \sum_{i=1}^{n} (\hat{r}_i x_i - r_i x_i)^2 \tag{39}$$

These quantities are calculated at the state and U.S. Great Plains
levels, and a sensitivity analysis is conducted to measure the effect
of classification and ratio error on the bias and the MSE for ratioed
wheat proportion.

Contributions of sampling and classification error to segment and
regional acreage estimation variability: The variance of the LACIE
acreage estimate $\hat{A}$ for a large area (e.g., a zone) can be
written

$$\mathrm{Var}(\hat{A}) = \sum_{i} w_i \sigma_i^2 \tag{40}$$

where $\sigma_i^2$ is the variance of the acreage estimate for the
$i$th substratum (county) and $w_i$ is a weight which depends on the
size of the substratum, the number of segments in the substratum, etc.
(For a description of the estimation of $\sigma_i^2$, see the paper
entitled “LACIE Sampling Design” by Feiveson et al.)
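
The bias and MSE sensitivity formulas for the ratioed wheat proportion, equations (34) to (39), can be sketched in Python as follows. The sketch assumes that each formula is a simple average over the blind sites of the product of a ratio and a small-grains proportion minus the ground-observed product; the function name and dictionary keys are hypothetical.

```python
def ratio_error_breakdown(r_hat, x_hat, r_true, x_true):
    """Bias and MSE of the ratioed wheat proportion over n blind sites.

    r_hat, x_hat are the forecast wheat-to-small-grains ratios and the CAMS
    small-grains proportions; r_true, x_true are the ground-observed values.
    """
    n = len(r_hat)
    truth = [r * x for r, x in zip(r_true, x_true)]

    def errors(ratios, props):
        # Error of each ratioed estimate against the ground-observed product.
        return [r * x - t for r, x, t in zip(ratios, props, truth)]

    def bias(errs):
        return sum(errs) / n

    def mse(errs):
        return sum(e * e for e in errs) / n

    full = errors(r_hat, x_hat)        # both error factors present
    no_ratio = errors(r_true, x_hat)   # ratioing error removed
    no_class = errors(r_hat, x_true)   # classification error removed
    return {
        "bias": bias(full),
        "bias_no_ratio_error": bias(no_ratio),
        "bias_no_classification_error": bias(no_class),
        "mse": mse(full),
        "mse_no_ratio_error": mse(no_ratio),
        "mse_no_classification_error": mse(no_class),
    }
```

Comparing each reduced quantity with the full bias or MSE indicates how much of the error is attributable to the omitted factor.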

The variance $\sigma_i^2$ represents a mean squared deviation between
the LACIE estimate for the county wheat proportion and the true county
wheat proportion. This variance is caused mainly by two factors:
sampling error and classification error.

It follows from the assumptions in equation (41) that the $i$th
substratum acreage error variance $\sigma_i^2$ can be written
$\sigma_i^2 = \sigma_c^2 + \lambda^2 \sigma_s^2$, where $\sigma_c^2$ is
a contribution resulting from classification, and
$\lambda^2 \sigma_s^2$ is a contribution caused by sampling. To
determine the effect of no classification error, the variance of the
LACIE acreage estimate will be calculated using $\rho \sigma_i^2$
instead of $\sigma_i^2$, where $\rho$ is the ratio
$(\lambda^2 \sigma_s^2)/(\sigma_c^2 + \lambda^2 \sigma_s^2)$.
Similarly, the effect of no sampling error is estimated by replacing
$\sigma_i^2$ by $(1 - \rho)\sigma_i^2$. In the following discussion,
the method employed for estimating sampling and classification
variances and the function $\rho$ is described.

It will be assumed that, for some reasonably large area (e.g., a
zone), the sampling and classification errors $\epsilon_i$ and
$\delta_i$ have the following properties.


$$\begin{aligned}
&\epsilon_i \text{ and } \delta_i \text{ are uncorrelated} \\
&E(\epsilon_i) = 0, \qquad V(\epsilon_i) = \sigma_s^2 \\
&E(\delta_i \mid X_i) = \lambda^* X_i + \theta, \qquad V(\delta_i \mid X_i) = \sigma_c^2
\end{aligned} \tag{41}$$

It is also assumed that there is a linear model relating the
current-year county proportion $C_i$ to the historical proportion,
which will be denoted by $Z_i$; i.e.,

$$C_i = \alpha + \beta Z_i + \xi_i \tag{42}$$

where $E(\xi_i) = 0$, $V(\xi_i) = \sigma_h^2$,
$\mathrm{Cov}(\xi_i, \epsilon_i) = \mathrm{Cov}(\xi_i, \delta_i) = 0$,
and $\alpha$ and $\beta$ are regression coefficients.

From the preceding assumptions and definitions, three basic regression
models are obtained.

1. True segment proportion compared to historical county proportion.
From the definition of $\epsilon_i$,

$$X_i = C_i + \epsilon_i = \alpha + \beta Z_i + \xi_i + \epsilon_i \tag{43}$$

It follows that

$$E(X_i) = \alpha + \beta Z_i \tag{44}$$

$$V(X_i) = \sigma_h^2 + \sigma_s^2 \tag{45}$$

2. LACIE segment proportion compared to ground-truth segment
proportion. From the definition of $\delta_i$,

$$\hat{X}_i = X_i + \delta_i \tag{46}$$

It follows that

$$E(\hat{X}_i \mid X_i) = X_i + \lambda^* X_i + \theta \tag{47}$$

$$V(\hat{X}_i \mid X_i) = \sigma_c^2 \tag{48}$$

Writing $\lambda = 1 + \lambda^*$, one obtains

$$E(\hat{X}_i \mid X_i) = \lambda X_i + \theta \tag{49}$$

3. LACIE segment proportion compared to historical county proportion.
From equations (44) to (49),

$$E(\hat{X}_i) = \lambda(\alpha + \beta Z_i) + \theta \tag{50}$$

$$V(\hat{X}_i) = \sigma_c^2 + \lambda^2(\sigma_h^2 + \sigma_s^2) \tag{51}$$

As stated previously, it is desirable to estimate
$\rho = (\lambda^2 \sigma_s^2)/(\sigma_c^2 + \lambda^2 \sigma_s^2)$.
None of the three regression models enables an estimate of
$\sigma_s^2$ separately from $\sigma_h^2$; i.e., one can only estimate
$\sigma_s^2 + \sigma_h^2$, not $\sigma_s^2$ alone. If current-year
county proportions $C_i$ were available, $\sigma_h^2$ could be
estimated; but, since this is not the case,
$\rho^* = [\lambda^2(\sigma_s^2 + \sigma_h^2)]/[\sigma_c^2 + \lambda^2(\sigma_s^2 + \sigma_h^2)]$
will be estimated instead of $\rho$. If $\sigma_h^2 \ll \sigma_s^2$
(a reasonable assumption), then $\rho^* \approx \rho$.

The procedure for estimating $\rho^*$ is described as follows. Suppose
a given zone has $m$ blind-site segments and $n$ ordinary (i.e., not
blind site) segments, and let the blind-site segments be numbered 1 to
$m$. It is assumed that ground-observed wheat proportions $X_i$,
$i = 1, \ldots, m$ are available for the blind sites and LACIE
estimates $\hat{X}_i$, $i = 1, \ldots, m + n$ are available for all
segments. It is also assumed that historical wheat proportions $Z_i$,
$i = 1, \ldots, m + n$ are available for the counties containing the
segments. If $\sigma_h^2 \ll \sigma_s^2$ so that
$\rho \approx \rho^*$, regression models 1 to 3 are applicable.


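Under these models,

$$E(X_i) = \alpha + \beta z_i, \quad V(X_i) = \sigma_s^2; \quad i = 1, \ldots, m \tag{52}$$

$$E(\hat{X}_i \mid X_i) = \lambda X_i + \theta, \quad V(\hat{X}_i \mid X_i) = \sigma_c^2; \quad i = 1, \ldots, m \tag{53}$$

$$E(\hat{X}_i) = \theta + \lambda\alpha + \lambda\beta z_i, \quad V(\hat{X}_i) = \lambda^2\sigma_s^2 + \sigma_c^2; \quad i = m + 1, \ldots, m + n \tag{54}$$

If there is one segment per county, then the errors $e_i$
and $\delta_i$ are independent for different values of $i$;
hence, the likelihood function of the sample can be
written

$$L = \prod_{i=1}^{m} f(X_i, \hat{X}_i) \prod_{i=m+1}^{m+n} h(\hat{X}_i) \tag{55}$$

where $f(X_i, \hat{X}_i)$ is the joint density of $X_i$ and $\hat{X}_i$ for
$i = 1, \ldots, m$ and $h(\hat{X}_i)$ is the density of $\hat{X}_i$ for
$i = m + 1, \ldots, m + n$.

The function

$$\prod_{i=1}^{m} f(X_i, \hat{X}_i)$$

can be expressed

$$\prod_{i=1}^{m} k(\hat{X}_i \mid X_i)\, g(X_i)$$

where $k(\hat{X}_i \mid X_i)$ is the conditional density of $\hat{X}_i$ given
$X_i$, and $g(X_i)$ is the density function of $X_i$. Assuming
errors to be normally distributed, the likelihood
function $L$ can be specified since

$$\prod_{i=1}^{m} k(\hat{X}_i \mid X_i) = (2\pi\sigma_c^2)^{-m/2} \exp\left[-\frac{1}{2\sigma_c^2} \sum_{i=1}^{m} (\hat{X}_i - \theta - \lambda X_i)^2\right]$$

and

$$\prod_{i=1}^{m} g(X_i) = (2\pi\sigma_s^2)^{-m/2} \exp\left[-\frac{1}{2\sigma_s^2} \sum_{i=1}^{m} (X_i - \alpha - \beta z_i)^2\right] \tag{56}$$

$$\prod_{i=m+1}^{m+n} h(\hat{X}_i) = \left[2\pi(\lambda^2\sigma_s^2 + \sigma_c^2)\right]^{-n/2} \exp\left[-\frac{1}{2(\lambda^2\sigma_s^2 + \sigma_c^2)} \sum_{i=m+1}^{m+n} (\hat{X}_i - \theta - \lambda\alpha - \lambda\beta z_i)^2\right] \tag{57}$$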

Letting $Q = -2 \log L$ and equating to zero the
partial differentials of $Q$ with respect to all unknown
parameters, the maximum likelihood equations are
obtained for $\alpha$, $\beta$, $\theta$, $\lambda$, $\sigma_c^2$, and $\sigma_s^2$. If $\hat{\alpha}$, $\hat{\beta}$, $\hat{\theta}$, $\hat{\lambda}$,
$\hat{\sigma}_c^2$, and $\hat{\sigma}_s^2$ represent the solutions of these equations,
then the invariance theorem for maximum
likelihood estimation can be used to obtain

$$\hat{\rho} = \frac{\hat{\lambda}^2 \hat{\sigma}_s^2}{\hat{\lambda}^2 \hat{\sigma}_s^2 + \hat{\sigma}_c^2} \tag{58}$$

as the maximum likelihood estimate of $\rho$. The maximum
likelihood equations are nonlinear but can be
solved using numerical techniques; e.g., Newton's
method.
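
The objective that a numerical routine would minimize can be written down directly from the normal model. The sketch below is illustrative only: it assumes the blind-site segments are stored first, the function names and data layout are hypothetical, and it evaluates Q without performing the Newton iterations mentioned in the text.

```python
import math


def neg2_log_likelihood(params, X, Xhat, z):
    """Q = -2 log L under the normal model.

    params = (alpha, beta, theta, lam, var_c, var_s)
    X      : ground-observed proportions for the m blind sites
    Xhat   : LACIE estimates for all m + n segments (blind sites first; an assumption)
    z      : historical county proportions for all m + n segments
    """
    alpha, beta, theta, lam, var_c, var_s = params
    m = len(X)
    q = 0.0
    for i in range(m):
        # Blind sites: density of X_i times conditional density of Xhat_i given X_i.
        q += math.log(2 * math.pi * var_s) + (X[i] - alpha - beta * z[i]) ** 2 / var_s
        q += math.log(2 * math.pi * var_c) + (Xhat[i] - theta - lam * X[i]) ** 2 / var_c
    var_total = lam ** 2 * var_s + var_c
    for i in range(m, len(Xhat)):
        # Ordinary segments: marginal density of Xhat_i.
        resid = Xhat[i] - theta - lam * (alpha + beta * z[i])
        q += math.log(2 * math.pi * var_total) + resid ** 2 / var_total
    return q


def rho_hat(params):
    """Plug the parameter estimates into the ratio (invariance property)."""
    _, _, _, lam, var_c, var_s = params
    return lam ** 2 * var_s / (lam ** 2 * var_s + var_c)
```

Any general-purpose minimizer can be applied to `neg2_log_likelihood`; once it converges, `rho_hat` gives the corresponding estimate of the ratio.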

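Since $\hat{\rho}$ is a complicated function of the data, it is
impossible to write down the variance of $\hat{\rho}$ for finite
sample sizes m and n. However, the asymptotic
variance of $\hat{\rho}$ can be estimated using the information
matrix, i.e., if

$$V^{-1} = \left[-E\left(\frac{\partial^2 \log L}{\partial u_i\, \partial u_j}\right)\right] \tag{59}$$

and $g(u)$ is a differentiable function of the parameter
vector $u = (\alpha, \beta, \theta, \lambda, \sigma_c^2, \sigma_s^2)$, then the variance
of $g(\hat{u})$ is asymptotic to

$$[g'(u)]^T\, V\, g'(u) \tag{60}$$

where $g'(u) = (\partial g/\partial u_1, \ldots, \partial g/\partial u_6)^T$ and T
stands for the transpose of a vector or matrix. Thus,
in this case, $g(u) = \lambda^2\sigma_s^2/(\lambda^2\sigma_s^2 + \sigma_c^2)$ and

$$g'(u) = \frac{1}{(\lambda^2\sigma_s^2 + \sigma_c^2)^2} \left(0,\ 0,\ 0,\ 2\lambda\sigma_s^2\sigma_c^2,\ -\lambda^2\sigma_s^2,\ \lambda^2\sigma_c^2\right)^T \tag{61}$$

To estimate $V$, the values $\{X_i\}$, $\{\hat{X}_i\}$, and $\{z_i\}$ and the
estimated parameters ($\hat{\alpha}$, $\hat{\beta}$, $\hat{\theta}$, $\hat{\lambda}$, $\hat{\sigma}_c^2$, and $\hat{\sigma}_s^2$) are
substituted into the matrix $H = (h_{ij}) = (\partial^2 \log L/\partial u_i\, \partial u_j)$.
Then, equation (60) is used to obtain
an approximate variance for $\hat{\rho}$. Assuming that $\hat{\rho}$,
which is the ratio of the within-county sampling
variance estimate to the total within-county area
variance estimate, also applies to a large area, the
estimated variances of the regional area estimate due
to classification ($\hat{V}_C^2$) and sampling ($\hat{V}_S^2$) are given by

$$\hat{V}_C^2 = (1 - \hat{\rho})\hat{V}^2 \tag{62}$$

$$\hat{V}_S^2 = \hat{\rho}\hat{V}^2 \tag{63}$$

where $\hat{V}^2$ denotes the estimated acreage variance for
the large-area estimate. Consequently, the estimated
CV of a large-area estimate $\hat{A}$ due to classification is
given by

$$CV(\hat{A} \mid C) = \frac{\hat{V}_C}{\hat{A}} \tag{64}$$

and that due to sampling is given by

$$CV(\hat{A} \mid S) = \frac{\hat{V}_S}{\hat{A}} \tag{65}$$

where $CV(\hat{A} \mid C)$ and $CV(\hat{A} \mid S)$ are often casually
referred to in LACIE as the classification CV and the
sampling CV, respectively.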
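
The variance approximation and the split of the total variance into classification and sampling parts can be sketched numerically as follows. This is an illustration under stated assumptions: the parameter order is (alpha, beta, theta, lambda, classification variance, sampling variance), the covariance matrix is supplied by the caller as a 6 x 6 list of lists, and all function names are hypothetical.

```python
import math


def grad_rho(lam, var_c, var_s):
    """Gradient of lam^2 var_s / (lam^2 var_s + var_c) with respect to
    (alpha, beta, theta, lam, var_c, var_s); the first three entries are zero."""
    d = (lam ** 2 * var_s + var_c) ** 2
    return [0.0, 0.0, 0.0,
            2 * lam * var_s * var_c / d,
            -lam ** 2 * var_s / d,
            lam ** 2 * var_c / d]


def delta_method_variance(grad, V):
    """Quadratic form grad^T V grad for a square matrix V given as nested lists."""
    n = len(grad)
    return sum(grad[i] * V[i][j] * grad[j] for i in range(n) for j in range(n))


def split_cv(rho, total_var, area):
    """Return (classification CV, sampling CV) of a large-area estimate."""
    var_class = (1.0 - rho) * total_var   # share attributed to classification
    var_samp = rho * total_var            # share attributed to sampling
    return math.sqrt(var_class) / area, math.sqrt(var_samp) / area
```

By construction the squares of the two CVs add up to the squared total CV, so the split never changes the overall precision statement; it only attributes it to the two sources.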


Second-Order  Error  Source  Investigations 
A  major  effort  is  made  in  LACIE  to  study  the 
sources  of  the  errors  that  influence  the  LACIE  pro- 
duction estimation  to  ascertain  the  accuracy  of  the 
procedures  being  used  and  to  devise  ways  of  improv- 
ing these  procedures. 

Yield  estimation  investigations. — The  purpose  of 
yield  estimation  investigations  is  to  determine  fac- 
tors introducing  errors  into  the  Center  for  Climatic 
and  Environmental  Assessment  (CCEA)  yield 
model  predictions  at  the  pseudozone  levels.  For  diag- 
nostic purposes,  the  following  plots  are  developed 
for  each  pseudozone  for  each  yield  truncation. 

1.  Precipitation  as  a function  of  time  of  year  for 
the  current  year  and  for  the  3 maximum  and  the  3 
minimum  yield  years,  as  determined  from  the 
historical  data 

2.  Temperature  as  a function  of  time  of  year  for 
the  current  year  and  for  the  3 maximum  and  the  3 
minimum  yield  years 

3.  Means  and  standard  deviations  of  temperature 
and  precipitation  as  a function  of  time  of  year— The 
monthly  temperature  and  precipitation  for  the  cur- 
rent year  are  plotted  on  these  charts  for  diagnostic 
purposes. 
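
The data preparation behind these plots can be sketched as follows: pick the three highest-yield and three lowest-yield years from the historical record, and compute monthly means and standard deviations against which the current year is compared. The function names and data layout (one list of 12 monthly values per year) are assumptions made for illustration.

```python
from statistics import mean, stdev


def extreme_yield_years(yield_by_year, k=3):
    """Return (k highest-yield years, k lowest-yield years)."""
    ranked = sorted(yield_by_year, key=yield_by_year.get)  # ascending by yield
    return ranked[::-1][:k], ranked[:k]


def monthly_mean_and_std(values_by_year):
    """values_by_year: {year: [12 monthly values]} -> list of (mean, std) per month.

    Requires at least two years, since a sample standard deviation is used.
    """
    stats = []
    for month in range(12):
        column = [values_by_year[year][month] for year in values_by_year]
        stats.append((mean(column), stdev(column)))
    return stats
```

The current-year monthly values can then be overlaid on the mean plus or minus one standard deviation band, next to the curves for the selected extreme years.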

The  following  diagnostic  checks  are  also  made. 

1.  Calculate  sampling  error  by  using  meteorologi- 
cal data  from  the  cooperative  station  network. 

2.  Check  the  data  base  at  the  pseudozone  level  for 
clerical  errors. 

3.  Check  for  episodic  weather  conditions  and 
resultant  impact  on  yield  estimates. 

4.  Evaluate  models: 

a.  Reevaluate  variable  selection  by  adding 
current-year  meteorological  data. 

b.  Perform  latent  root  regression  on  pseudo- 
zone data  to  calculate  the  most  stable  variables  for 
predicting  yield  (without  allowing  for  a trend  term). 

c.  Investigate  trend  term  by  performing  latent 
root  regression  without  allowing  for  trend  and 
calculating  trend  on  the  residuals  from  the  most  sta- 
ble fit. 

During  LACIE  Phases  I,  II,  and  III,  several  in- 
vestigations were  performed  to  evaluate  and  improve 
the  classification  program  for  estimating  segment 
wheat  or  small-grains  proportion.  Analyses  of 
variance  models  were  employed  in  several  of  these 
studies.  (See  references  6 and  7 and  the  paper  by 
Chhikara  and  Feiveson  entitled  “LACIE  Large-Area 
Acreage  Estimation.")  Biostage,  analyst- 
interpreter  (AI),  segment  location,  and  ground- 
observed  wheat  or  small-grains  proportion  were  the 
factors  evaluated  for  their  effect  on  the  variability  of 
segment  wheat  or  small-grains  proportion  estimation 
by CAMS. Studies on omission and commission errors
in labeling of classes by AI's, as well as those
resulting  from  classification  algorithms,  have  also 
been  conducted.  Evaluations  were  often  investiga- 
tive in  nature  and  the  methodology  used  was 
generally  restricted  to  plotting  and  tabulating  data, 
fitting data by regression to examine relationships,
and performing tests of significance for comparative
analysis. 

The  possible  sources  of  error  in  the  classification 
of  a segment  for  estimating  its  wheat  or  small-grains 
proportion  are  outlined  in  figure  3.  Most  of  these  fac- 
tors are  causative  and  are  called  second-order  error 
sources.  Some  of  these  sources  contribute  mainly  to 
the  variation  in  the  segment  proportion  estimate, 
some  sources  introduce  bias,  and  others  are  influen- 
tial in  both  respects.  Brief  descriptions  of  a few 
useful  investigations  are  presented  in  the  following 
paragraphs.  For  the  actual  studies  made  and  the 
scope  of  second-order  error  source  evaluations,  see 
reference  6 and  Chhikara  and  Feiveson's  paper. 

Segment-level  accuracy  investigations. — Accuracy 
of  ground-observed  proportions  obtained  by  dot 
counting,  of  CAMS  proportion  estimation,  and  of 
crop  calendars  comprise  the  segment-level  accuracy 
investigations. 

Accuracy  of  ground-observed  proportions  ob- 
tained by  dot  counting:  Two  methods  are  used  to 
determine  the  true  wheat  and  small-grains  propor- 
tions for  blind  sites;  namely,  dot  counting  and  com- 
puter digitization.  The  first  method  gives  wheat  and 
small-grains proportions by evaluating the ground-observed
labels of a subsample of 400 (or more) random
dots from the 9- by 11-kilometer (5- by 6-nautical
mile) sample segment. This method produces
only  approximate  results.  In  the  second  method,  the 
Bendix  100  system  and  computer  programs  SPATL 
and  MLTCRP  are  used  to  generate  the  wall-to-wall 
digitized  ground-observed  proportions  for  wheat  and 
small  grains.  In  this  task,  the  dot  count  proportions 
are  compared  to  the  wall-to-wall  digitized  propor- 
tions to  determine  the  accuracy  of  proportions  ob- 
tained from  the  first  method.  The  purpose  of  this 
task  is  to  validate  conclusions  about  bias  due  to 
classification that may have been made before the
wall-to-wall ground-observed proportions obtained
using the second method were available.

FIGURE 3.— Sources of error/variation in the classification
error for estimating segment wheat and small-grains proportions.
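
One way to see why the dot-count proportions are only approximate is to attach a binomial standard error to them. The sketch below is illustrative: the binomial formula assumes independent, randomly placed dots and is a standard approximation, not a procedure stated in the text, and the function name is hypothetical.

```python
import math


def dot_count_proportion(labels, target="wheat"):
    """Proportion of dots carrying the target label, with a binomial standard error."""
    n = len(labels)
    p = sum(1 for label in labels if label == target) / n
    # Standard error under the assumption of independent random dots.
    se = math.sqrt(p * (1.0 - p) / n)
    return p, se
```

For 400 dots and a true proportion near 0.25, the standard error is a little over two percentage points, which gives a feel for how far a dot-count value may sit from the wall-to-wall value.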

Accuracy  of  CAMS  proportion  estimation:  For 
the blind sites, the estimation error ($\hat{P} - P$), where $P$
is the ground-observed wheat or small-grains proportion
and $\hat{P}$ is the estimate of $P$ made by CAMS, is
determined.  Two  types  of  studies  are  performed  on 
these  errors:  (1)  analysis  of  variance/covariance  is 
done  on  the  absolute  errors  to  investigate  the  effect 
of  various  factors  likely  to  influence  the  classifica- 
tion performance  (e.g.,  AI,  wheat  biostage,  segment 
location,  wheat  proportion)  and  (2)  a linear  regres- 
sion is  performed  for  the  estimation  errors  on  the 
ground-observed  proportions,  biostage,  field  size, 
crop  type,  etc.,  to  explain  the  variability  in  segment 
wheat  proportion  estimation  errors. 
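
The second type of study can be sketched with a single explanatory variable: compute the estimation errors for the blind sites and regress them on the ground-observed proportions. This is a simplified illustration; the function names are hypothetical, and the studies described above also use biostage, field size, crop type, and other factors.

```python
from statistics import mean


def estimation_errors(p_hat, p_true):
    """Errors (CAMS estimate minus ground-observed proportion), one per blind site."""
    return [est - obs for est, obs in zip(p_hat, p_true)]


def simple_regression(x, y):
    """Least-squares (intercept, slope) of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope
```

A negative slope indicates that wheat tends to be underestimated more strongly in segments with a high ground-observed proportion, while the mean of the errors estimates the overall bias.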

Accuracy  of  crop  calendar:  A major  reference 
used  by  analyst-interpreters  in  their  classification 
procedures  is  the  nominal  (mean  historical)  crop 
calendar  and  the  adjustable  crop  calendar  (ACC). 
Since  the  ACC  provides  the  latest  reference  informa- 
tion on  the  stage  of  development  of  wheat  in  an  area 
being  classified  and  estimated,  it  is  necessary  to 
determine  the  accuracy  of  this  reference  informa- 
tion. 

The  basic  data  set  for  these  evaluations  is  the 
growth-stage  data  acquired  by  USDA/ASCS  person- 
nel from  ITS’s  in  the  United  States.  These  growth- 
stage  data  are  acquired  in  periodic  ground  observa- 
tions of  the  ITS's  over  the  crop  reporting  districts 
(CRD’s). 

Plots  are  made  of  the  ACC  outputs  (for  the  ITS’s), 
the  mean  of  the  ground  observations  of  wheat 
growth  stages,  and  the  nominal  crop  calendar.  Confi- 
dence interval  estimates  are  made  on  the  basis  of  the 
distribution  of  the  ITS  ground-truth  observations, 
and  whether  the  ACC  results  fall  within  these  limits 
is  determined.  The  relationship  of  the  crop  calendar 
information  to  known  episodic  events  of  the  current 
year,  such  as  drought,  is  also  investigated  by  the  ac- 
curacy assessment  group. 
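
The check of whether an ACC output falls within limits derived from the ground observations can be sketched as follows. The text does not state how the limits are computed; this example assumes a confidence interval for the mean growth stage based on a normal approximation, and the function names and sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(observations, level=0.95):
    """Normal-approximation confidence interval for the mean of the observations."""
    n = len(observations)
    centre = mean(observations)
    quantile = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = quantile * stdev(observations) / math.sqrt(n)
    return centre - half_width, centre + half_width


def acc_within_limits(acc_stage, observations, level=0.95):
    """True if the ACC growth stage lies inside the interval from ground observations."""
    low, high = mean_confidence_interval(observations, level)
    return low <= acc_stage <= high


# Hypothetical growth-stage observations for one ITS on one date.
stages = [3.0, 3.2, 3.4, 3.1, 3.3]
print(mean_confidence_interval(stages), acc_within_limits(3.25, stages))
```

With only a handful of observations per site, a t quantile would give somewhat wider limits than the normal quantile used here.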

Pixel-level comparison investigations. — In the pixel-level
investigations, the ground-truth data are compared
with the AI-labeled pixel data and with the
cluster  and  classification  maps  produced  by  CAMS. 
This  procedure  also  enables  a determination  of  the 
actual  composition  (in  terms  of  ground-truth  classes) 
of  each  pixel,  of  each  cluster  on  the  cluster  map,  and 
of  each  class  on  the  class  map. 

Blind-site  data:  Blind-site  ground  observations 
and CAMS data are compared at the pixel level to
evaluate the omission and commission errors and
then  to  develop  a method  of  assessing  the  labeling, 
clustering,  and  classification  performance  in  a quan- 
titative manner.  A ground-truth  data  processing  pro- 
cedure is  used  to  produce  a tape  on  which  the  ground 
observations  are  presented  as  an  image,  similar  to  the 
Landsat  imagery  and  to  the  cluster  and  classification 
maps  generated  by  CAMS.  Details  are  given  in  the 
paper  entitled  “Accuracy  Assessment  System  and 
Operation”  by  Pitts  et  al. 

Each  subclass  in  the  ground-truth  data  has  its  own 
assigned  gray-scale  level  on  the  ground-truth  tape. 
The  subclasses  used  are  shown  in  table  I.  The  image 
on  the  ground-truth  tape  is  registered  to  the  corre- 
sponding Landsat  image.  However,  the  data  on  the 
ground-truth  tape  are  at  a finer  resolution.  There  are 
six  subpixels  on  the  ground-truth  image  for  each  pix- 
el on  the  Landsat  image. 

Analyst-interpreter  dot  (pixel)  labeling  accuracy: 
To  investigate  dot  (pixel)  labeling,  the  composition 
of  each  dot  is  obtained  first.  This  procedure  consists 
of  determining  the  representation  of  the  various 
ground-truth  classes  (table  I)  among  the  six  subpix- 
els for  each  dot  on  the  ground-truth  tape.  Each  dot  is 
then  given  the  label  of  the  subclass  having  the  largest 
representation  among  the  six  subpixels  corre- 
sponding to  that  dot  on  the  ground-truth  tape. 
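
The majority rule described above can be sketched as follows, using gray-scale levels as subclass codes. The function name is hypothetical, and the handling of ties (the subclass seen first among the tied ones wins) is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter


def dot_label(subpixel_codes):
    """Return the subclass code with the largest count among a dot's six subpixels.

    Ties are broken in favor of the code encountered first (an assumption).
    """
    counts = Counter(subpixel_codes)
    return counts.most_common(1)[0][0]


# Four subpixels of code 99 and two of code 101: the dot is labeled 99.
print(dot_label([99, 99, 101, 99, 101, 99]))
```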

Each  dot  is  also  given  a class  name  (as  dis- 
tinguished from  its  subclass  name).  The  classes  are 
those  used  by  the  analyst  to  label  the  dots:  spring 
grains  (SG),  spring  wheat  (SW),  winter  grains  (WG), 
winter  wheat  (WW),  grains  (G),  wheat  (W),  other 
(O),  and  a class  denoted  “X”  which  consists  of  dots 
that  fell  on  clouds  or  cloud  shadows  and  therefore 
were  unidentifiable. 

Dot  labeling  accuracy  is  studied  by  estimating  two 
confusion  matrices— one  for  classes  and  one  for 
subclasses.  The  class  confusion  matrix  consists  of  er- 
rors of  omission  and  commission  by  the  AI  and  indi- 
cates the  degree  of  accuracy  of  the  AI  labeling  with 
respect  to  the  eight  classes  mentioned.  The  subclass 
confusion  matrix  of  AI  omission  and  commission  er- 
rors describes  AI  skill  in  labeling  pixels  with  respect 
to  the  subclasses  listed  in  table  I. 
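
A confusion matrix and the two error rates can be tabulated as in the sketch below. The definitions used here are the conventional ones and are an assumption about the exact denominators: omission is taken relative to the dots that truly belong to a class, and commission relative to the dots the analyst assigned to it. Function names are hypothetical.

```python
from collections import Counter


def confusion_matrix(truth, assigned):
    """Counts keyed by (ground-truth class, analyst label)."""
    return Counter(zip(truth, assigned))


def omission_rate(cm, cls):
    """Fraction of true `cls` dots that were given some other label."""
    total = sum(n for (t, _), n in cm.items() if t == cls)
    missed = sum(n for (t, a), n in cm.items() if t == cls and a != cls)
    return missed / total


def commission_rate(cm, cls):
    """Fraction of dots labeled `cls` that truly belong to another class."""
    total = sum(n for (_, a), n in cm.items() if a == cls)
    wrong = sum(n for (t, a), n in cm.items() if a == cls and t != cls)
    return wrong / total
```

The same two functions apply unchanged to the subclass matrix; only the label vocabulary changes.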

Labeling  accuracy  depends  on  several  factors  (fig. 
3).  The  effect  of  these  factors  is  evaluated  whenever 
feasible  and/or  critical.  Finally,  a study  is  made  to 
determine  whether  the  probability  of  a dot  being  cor- 
rectly labeled  is  higher  if  the  analyst  label  agrees  with 
the  classifier  label  for  that  dot. 

Table I.— Subclasses Used in Accuracy Assessment
and the Corresponding Gray-Scale Levels on the
Ground-Truth Tape

Subclass                      Gray-scale level

Fields 1 to 80                1 to 80
Alfalfa                       90
Beans                         91
Corn                          92
Safflower                     93
Sunflower                     94
Sudan grass                   95
Sorghum                       96
Soybeans                      97
Sugar beets                   98
Winter wheat                  99
Spring wheat                  100
Barley                        101
Rye                           102
Flax                          103
Oats                          104
Grass                         105
Hay                           106
Pasture                       107
Trees                         108
Same as 90 to 108 except:
  Harvested                   115 to 133
  Abandoned                   140 to 158
  Strip fallow                165 to 183
  Strip fallow, harvested     190 to 208
  Strip fallow, abandoned     215 to 233
Water                         240
Homestead                     250
Idle cropland, stubble        251
Idle cropland, cover crop     252
Idle cropland, residue        253
Idle cropland, fallow         254

Clustering accuracy: Three aspects of clustering
are studied: cluster composition and purity, cluster
labeling accuracy, and the cluster confusion matrix.
The data used are blind-site cluster maps.

Cluster  composition  is  the  set  of  percentages  of 
subpixels  in  a given  cluster  that  belong  to  each  of  the 
major  classes.  Class  is  determined  by  comparing  the 
cluster  map  with  the  image  on  the  ground-truth  tape. 
The  major  classes  are  SG,  WG,  G,  O,  and  Y,  where  Y 
consists  of  both  designated  other  (DO)  and  desig- 
nated unidentifiable  (DU)  areas.  The  “purity"  of  a 
cluster  is  the  percentage  of  the  total  number  of  sub- 
pixels in  the  cluster  that  belongs  to  the  class  with  the 
largest  representation.  The  composition  and  purity 
of  clusters  are  of  interest  since  they  indicate  the 
capability  of  the  clustering  algorithm  to  separate  the 
classes  into  relatively  “pure"  clusters.  These  quan- 
tities are  studied  as  a function  of  segment,  stage,  and 
acquisition  history. 
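
Composition and purity follow directly from the ground-truth classes of the subpixels that fall in a cluster. The sketch below is illustrative; the function names and the input layout (one class code per subpixel) are assumptions.

```python
from collections import Counter


def cluster_composition(subpixel_classes):
    """Percentage of the cluster's subpixels belonging to each major class."""
    counts = Counter(subpixel_classes)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}


def cluster_purity(subpixel_classes):
    """Percentage of subpixels in the class with the largest representation."""
    return max(cluster_composition(subpixel_classes).values())
```

A cluster with 30 SG, 15 O, and 5 WG subpixels has composition 60, 30, and 10 percent and a purity of 60 percent.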


Cluster  labeling  accuracy  is  studied  first  by  assign- 
ing each  cluster  the  name  of  the  class  having  the 
largest  representation  of  subpixels.  The  cluster  is 
assumed  correctly  labeled  if  the  label  given  by  the 
labeling  logic  corresponds  to  this  name.  In  the  case  of 
nearest  neighbor  labeling  logic,  an  incorrect  label 
may  result  from  AI  mislabeling  of  the  dot  used  to 
label  the  cluster  or  from  poor  performance  by  the 
labeling  logic.  If  the  identity  of  the  dots  that  were 
used  to  label  each  cluster  can  be  determined,  these 
two  sources  of  error  are  studied  separately.  Cluster 
labeling  accuracy  is  studied  as  a function  of  cluster 
purity,  segment,  state,  and  acquisition  history. 

Two  confusion  matrices  are  estimated  for 
clusters— a class  confusion  matrix  and  a subclass 
confusion  matrix.  The  clustering  confusion  matrices 
are  evaluated  as  a function  of  segment,  region,  and 
acquisition  history. 

Classification  accuracy:  Classification  perform- 
ance is  studied  by  estimating  the  classification  confu- 
sion matrices  for  both  classes  and  subclasses.  The 
classes are SG, WG, G, O, X, and T, where T indicates
pixels which have been thresholded by the
classifier;  subclasses  are  the  same  as  for  dot  labeling 
and  clustering. 

An  important  investigation  is  made  to  determine 
the  effect  of  crop  height  and  ground  cover  on 
classification  accuracy.  In  this  study,  crop  height  and 
ground  cover  data  acquired  every  18  days  for  15 
selected  wheat  fields  in  each  blind  site  are  used.  The 
probability  of  correct  classification  is  computed  for 
each  of  these  fields  and  is  plotted  as  a function  of 
crop  height,  Robertson  biostage,  ground  cover,  and 
“green  number."  Means  and  other  relevant  statistics 
are  calculated  at  the  segment,  state,  and  regional 
levels. 

Intensive  test  site  data:  The  purpose  of  ITS  data  is 
to  determine  the  causes  of  labeling  and  classification 
errors  that  cannot  be  determined  from  blind-site 
data.  For  example,  one  use  for  these  data  is  to  ex- 
amine the  relationship,  if  any,  between  the  labeling 
and  classification  accuracy  and  errors  in  the  adjusta- 
ble crop  calendar.  Studies  of  accuracy  are  made  for 
both  wheat  and  small  grains  if  CAMS  estimates  of 
both  are  available. 

To  evaluate  the  dot  labeling  accuracy,  CAMS  per- 
sonnel analyze  the  imagery  and  attempt  to  determine 
the  dot  labels  by  photointerpretation.  The  labeling 
accuracy  (omission  and  commission)  is  determined 
by  comparison  with  ground-truth  labels  for  each 
classification.  The  "18-day  observation"  fields  in  the 
ITS’s  are  used  to  determine  the  crop  growth  stage 
and  other  agromet  activities  and  hence  the  cause  of 
mislabeling.  Whenever  a dot  falls  on  an  18-day  obser- 
vation field,  CAMS  investigates  the  ancillary  data 
from  the  18-day  observations  such  as  crop  height, 
ground  cover,  stand  quality,  planting  date,  irrigated 
or  dry  land,  farming  practice,  and  growth  stage  for 
correlation  with  the  Landsat  data.  Each  acquisition  is 
processed  so  that  the  accuracies,  as  a function  of 
growth  stage,  may  be  determined;  CAMS  perform- 
ance analysis  then  determines  the  following  for  the 
ITS  labeling. 

1.  Number  of  ground-truth  wheat  and  other  dots 

2.  Number  and  percentage  of  incorrect  labels,  for 
wheat,  other,  and  total 

3.  Cause  of  error  for  each  dot 

a.  Necessary  acquisitions  missing 

b.  Poor  stands 

c.  Late  planting,  emergence,  or  development 

d.  Strip  fields 

e.  Analyst  error 

f.  Confusion  crops 

g.  Border/edge  pixels 

h.  Unknown  cause 


SUMMARY 

The  methodology  described  in  this  paper  for 
assessing  the  accuracy  of  LACIE  estimates  illustrates 
the  detailed  and  extensive  evaluations  performed 
during  the  experiment.  This  methodology  was 
necessary  for  validation  of  the  implemented  wheat 
production  forecasting  technology.  As  intended,  it 
has  allowed  the  identification  and  isolation  of  key 
problems  in  wheat  area  and  yield  estimation,  some  of 
which  have  been  corrected  and  some  of  which  re- 
main to  be  resolved. 


The major unresolved problem in accuracy assessment
is that of precisely estimating the bias of the
LACIE  production  estimator.  This  problem  will  con- 
tinue to  be  an  issue  in  the  United  States  and  more  so 
in  foreign  countries.  In  the  United  States,  reliable 
ground  observations,  like  those  obtained  over  blind 
sites  and  intensive  test  sites  during  LACIE,  can  be 
obtained  for  further  assessment  of  the  bias  in  the 
crop  area  estimation  technology.  In  the  future,  if 
reliable  yield  information  at  the  field  level  can  also 
be  obtained  together  with  the  crop  acreage  informa- 
tion, an  improved  assessment  of  the  bias  in  the  crop 
production  forecasting  technology  can  be  achieved. 
Manual  Interpretation  of  Landsat  Data 

C.  M.  Hay* 


INTRODUCTION 

The  LACIE  analyst  is  required  to  estimate  the 
proportion  of  small  grains  in  a given  sampling  unit. 
These  sampling  units  are  5-  by  6-nautical-mile  areas 
located  in  accordance  with  a statistical  sampling 
design.  The  estimation  process  requires  that  the 
analyst  interpret  a 1-percent  sample  of  the  segment 
area  using  both  Landsat  multispectral  scanner  (MSS) 
imagery  data  and  ancillary  data.  Ancillary  data  in- 
clude crop  calendar  summaries,  cropping  practice  re- 
ports, meteorological  data,  and  other  pertinent 
regional  data. 

This  paper  discusses  the  manual  interpretation 
process  that  has  been  developed  within  LACIE. 
Details  regarding  the  role  of  interpretation  in  the 
machine  processing  approach  are  discussed  in  the 
paper  by  Heydorn  et  al.  entitled  “Classification  and 
Mensuration  of  LACIE  Segments,”  and  the  imple- 
mentation of  this  approach  is  discussed  in  the  paper 
by  Abotteen  and  Bizzell  entitled  “The  Classification 
and  Mensuration  Subsystem.” 

As  will  be  pointed  out  subsequently,  the 
methodology  for  the  interpretation  of  Landsat  and 
ancillary  data  for  inventory  purposes  is  in  a state  of 
heuristic  development  that  has  continued  through 
the 3 years of LACIE. With any heuristic development,
concepts are formed, methods are developed,
resulting  problem  areas  are  analyzed,  and  new 
methods  are  proposed.  Such  cycles  have  certainly 
been  experienced  within  LACIE.  This  paper, 
however,  will  not  attempt  to  document  the  progres- 
sion of  thought  that  occurred  throughout  these  3 
years  but  rather  will  discuss  the  fundamental  con- 
cepts that  evolved.  Still,  no  claim  is  made  that  these 
concepts  are  entirely  satisfactory,  and,  in  fact,  prob- 
lems with  the  methods  will  be  addressed  at  the  end 
of  the  paper. 
HISTORY  OF  MANUAL 
INTERPRETATION  IN  LACIE 


LACIE  Phases  I and  II 

Throughout  LACIE  Phases  I and  II  (1975  and 
1976),  the  analyst  performed  two  main  tasks.  The 
first  task  was  to  outline  representative  areas  (fields) 
for  all  spectral  classes  within  a segment  on  the  basis 
of  their  appearance  on  the  Landsat  image  product. 
The  spectral  statistics  generated  from  these  areas 
were  used  as  training  for  maximum  likelihood 
classification.  The  second  task  was  to  label  the  crop 
type  (wheat/nonwheat)  within  the  selected  training 
areas.  This  process  of  first  selecting  representative 
training  areas  and  then  labeling  the  crop  type  within 
the  areas  comprised  what  is  called  the  “Fields  Pro- 
cedure.” An  analyst  took  approximately  12  hours  to 
process a segment by the Fields Procedure and evaluate
and possibly rework the results. Half of this time
was  spent  selecting  and  recording  training  areas;  only 
one-eighth  of  the  time  was  spent  actually  labeling  the 
areas  as  to  crop  type. 


LACIE  Phase  III 

By  contrast  with  the  Phase  I and  II  procedure,  in 
LACIE  Phase  III  (1977),  a procedure  was  developed 
and  implemented  which  incorporated  clustering  for 
spectral  class  definition  and  training  statistics  genera- 
tion. This  procedure  is  called  Procedure  I.  The 
analyst  was  freed  from  the  time-consuming  task  of 
spectral  class  definition  and  could  concentrate  solely 
on  crop-type  labeling.  A new  within-segment  sam- 
pling strategy  involving  randomly  selected  sample 
dots  (pixels)  was  another  innovation  of  Procedure  I. 
Because  the  analyst  now  has  only  to  label  sample 
dots  as  to  crop  type,  his  segment  processing  time  is 
reduced  to  approximately  3 to  4 hours.  In  Phase  III, 
therefore,  the  analyst  had  only  one  main  analysis 
task— crop-type  identification. 
MANUAL  INTERPRETATION  PROCESS 

Perspective  on  Landsat  Data 

It  must  be  remembered  that  Landsat  data  is  a new 
type  of  data  which  had  not  been  used  extensively 
before  LACIE.  Landsat's  uniqueness,  aside  from  its 
low  resolution,  small  scale,  and  synoptic  coverage,  is 
more  significantly  due  to  the  increased  spectral 
resolution  (compared  to  conventional  photographic 
systems),  its  regular  periodic  coverage,  and  the 
digital  format  of  the  data.  It  is  important  to  recognize 
the  differences  between  Landsat  data  and  conven- 
tional photographic  data.  Although  manual  image  in- 
terpretation procedures  that  were  developed  with 
and  for  conventional  photographic  imagery  still  have 
relevance  to  image-formatted  Landsat  data,  they 
need  to  be  modified  and  restated  within  the  Landsat 
context.  Furthermore,  Landsat  data  provide  addi- 
tional information  not  obtainable  from  photographic 
data.  This  additional  information  is  a function  of  the 
digital  temporal-spectral  response  data  from 
relatively  narrow-band  (compared  to  photographic 
data)  multispectral  sensors. 

The  Analysis  Process:  The  “Art  of 
Probabilities” 

Crop  (feature)  identification  from  Landsat  (or 
any  other  remotely  sensed)  data  and  ancillary  data  is 
basically  the  “art  of  dealing  in  probabilities”  (ref.  1). 
An  analyst  must  (1)  collect  information  from  the 
Landsat  data  about  the  characteristics  of  a feature, 
(2)  factor  in  additional  evidence  from  a priori 
knowledge  and  ancillary  data,  (3)  judge  the  impor- 
tance and  relevance  of  all  the  evidence,  (4)  formulate 
several  reasonable  working  hypotheses,  (5)  test  these 
hypotheses  against  the  evidence,  (6)  select  the  most 
probable  conclusion,  and  (7)  judge  the  degree  of 
probable  correctness  of  this  conclusion.  In  LACIE, 
the  most  probable  conclusion  is  recorded  as  a crop- 
type  label  for  a given  pixel. 

In  simple  terms,  the  interpretation  process  (called 
labeling in LACIE) consists of two main components:
(1) feature detection and physical characteristics
determination and (2) feature evaluation including
identification and condition assessment.
Although  these  processes  may  occur  simultaneously 
and  iteratively,  they  can  be  treated  separately  for  the 
purpose  of  explanation.  Feature  detection  is  the  action 
of  discriminating  a unique  feature  (a  field  in  the 
LACIE case) based on spectral, spatial, and temporal
characteristics observable within Landsat multitemporal-spectral
data. Feature characteristics evaluation
is  the  process  of  assessing  the  available  data  by 
analytical  means  and  then  synthesizing  the  pertinent 
data  for  the  purpose  of  concluding  the  feature's  iden- 
tity and  condition  state.  Feature  identification  is  the 
action  of  assigning  a name  (e.g.,  wheat,  nonwheat)  to 
the  detected  feature.  Feature  condition  assessment  is 
the  more  refined  identification  of  a feature  such  that 
some  quality  or  state  (e.g.,  late  developing,  poor 
stand,  harvested)  is  ascribed  to  the  feature.  Correct 
feature  identification  cannot  proceed  properly  unless 
feature  detection  has  occurred.  Feature  detection, 
however,  does  not  ensure  feature  identification. 
Thus,  errors  in  labeling  may  result  from  either  (1) 
failure  to  detect  a feature  of  interest  or  (2)  failure  to 
identify  correctly  a detected  feature. 


Feature  Detection  and 
Characteristics  Determination 

Within  LACIE,  the  features  that  an  analyst  wishes 
to detect are cropped fields. The feature characteristics
that an analyst must determine are (1) the
size  and  shape,  the  type  of  boundary  elements,  and 
the  spatial  relationships  of  similarly  and  dissimilarly 
appearing  features;  (2)  the  temporal-spectral  pat- 
terns throughout  the  growing  season  for  a given 
region;  and  (3)  the  magnitudes  of  the  actual  spectral 
values  within  specific  time  periods  corresponding  to 
given  crop-type  biostages  (spectral-temporal  charac- 
teristics). Of  these  three  characteristics,  the  second  is 
the  most  important  to  the  analyst  for  the  detection 
and  identification  of  wheat  or  any  other  crop.  The 
other  two  characteristics  are  necessary  when  signifi- 
cant overlap  exists  between  wheat  and  confusion 
crops,  when  key  acquisitions  are  missing  or  of  poor 
quality,  and/or  when  any  ambiguity  exists  in  the 
data.  Obviously,  the  probability  of  correctly  identify- 
ing a crop  within  a given  field  will  be  low  if  a spectral 
response  indicating  vegetation  canopy  cannot  be 
detected  during  any  one  portion  of  the  growing 
season  or  during  particularly  significant  vegetation 
biophases  specific  to  given  crop  types. 


Feature  Characteristics  Evaluation 
for  Crop  Identification 

Although  site-specific  Landsat  data  enable  an 
analyst  to  detect  a feature  and  determine  its  physical 
characteristics,  ancillary  data  and  a priori  knowledge 
from  outside  the  Landsat  data  are  necessary  for  the 
analyst  to  identify  and  label  a detected  feature.  That 
is, nowhere will one find the word "wheatfield" written
across a field as observed on Landsat data. A
priori  knowledge  and  ancillary  data  supply  informa- 
tion about  what  crops  are  grown  within  a region,  the 
rate  and  timing  of  canopy  development  for  specific 
crops,  cropping  and  cultivation  practices  employed 
within  a region  or  specific  to  a given  crop  type,  the 
characteristic appearances of given features on Landsat
data, etc.

A priori  knowledge. — An  analyst  gains  a priori 
knowledge  from  past  experiences,  educational  back- 
ground, specialized  training,  and  specially  prepared 
interpretation  keys.  For  example,  many  LACIE 
analysts  were  geographers  who  had  studied  as  part  of 
their  formal  education  (1)  interrelationships  be- 
tween land  use  and  cultural  patterns  and  (2)  physical 
environmental  parameters  such  as  climate  and  soils. 
Since  some  analysts  grew  up  on  a farm,  they  had  a 
firsthand  understanding  of  agricultural  cropping 
practices.  Still  others  had  many  years  of  photoin- 
terpretation experience  upon  which  to  draw.  With 
such  varied  backgrounds,  there  was  a need  to  bring 
all  the  analysts  to  some  minimum  level  of  common 
experience. Thus, before the start of LACIE, the
analysts  attended  an  intensive  specialized  training 
course  in  Landsat  data  analysis,  wheat  physiological 
development,  and  regional  cropping  practices  rela- 
tive to  small  grains. 

Ancillary data. — In addition to the a priori base information,
additional information is required for
crop-type identification. This additional information
is by convention called ancillary data in that it consists
of data different from and outside the site- and
date-specific Landsat spectral data.

Types  of  ancillary  data  which  have  been  recog- 
nized as  being  necessary  for  crop-type  identification 
are  (1)  crop  calendar  information  including  average- 
normal  and  year-specific  data;  (2)  historical  crop 
proportions  for  several  recent  years;  (3)  regional 
cropping  practice  information  such  as  crop  rotation, 
cultivation  practices,  and  irrigation  practices;  and  (4) 
occurrence  of  meteorological  events  affecting  crop 
development  and/or  crop  spectral  response.  Each  of 
these  data  types  should  contain  quantitative  descrip- 
tions of  mean  normal  conditions,  as  well  as  the 
variability  that  can  be  expected  about  the  normal. 
The  variability  data  should  include  temporal 
variability  (year  to  year)  as  well  as  spatial  variability. 

Crop calendar data.— Crop calendar data compose
the one most important type of ancillary data for crop
identification  from  Landsat  data  and  are  important 
to  all  aspects  of  the  crop  identification  task.  First, 
before  any  imagery  is  acquired,  analysis  of  crop 
calendar  data  will  determine  the  time  periods  during 
which  data  should  be  collected  for  the  particular  crop 
or  crops  of  interest.  Second,  crop  calendar  data  in 
conjunction  with  historical  crop  proportion  informa- 
tion enable  an  analyst  to  predict  possible  confusion 
crops  and  thus  assess  the  need  for  additional  confu- 
sion crop  separability  information.  Third,  crop  calen- 
dar data  serve  to  set  initial  expectations  of  temporal- 
spectral  response  patterns  for  the  major  crops.  Crop 
calendar  data  contain  information  about  the  time 
periods  during  an  average  year  when  significant 
stages  in  the  cultivation  and  development  of  a given 
crop  can  be  expected  to  occur.  Planting  and  harvest- 
ing dates  are  often  given,  and  information  about  the 
timing  of  other  intermediate  phases  such  as  seedbed 
preparation,  crop  emergence,  heading,  or  flowering  is 
frequently  presented. 

Crop  calendar  data  available  to  LACIE  analysts 
include  average-normal  year  data  for  all  major  crops 
within  an  area  and  a year-specific  adjusted  wheat 
crop  calendar.  Figure  1 is  an  example  of  an  average- 
normal  year  crop  calendar.  In  this  example,  the  per- 
centage of  the  given  area  undergoing  an  indicated 
development stage on specified dates is indicated for
each  major  crop.  Figure  2 is  an  example  of  the  year- 
specific  adjusted  wheat  crop  calendar  information. 

Relying  on  past  experience  and  a limited  number 
of  spectral-response-to-ground-data  correlations,  the 
analyst  must  translate  ground  crop  calendar  data  into 
expected  spectral  responses  and  image  charac- 
teristics. For  example,  from  field  experience,  the 
analyst knows that a healthy small-grains crop in the
headed  biostage  has  a 90-  to  100-percent  canopy 
cover.  The  major  part  of  the  spectral  response  from 
such  a field  will  be  from  the  vegetation  canopy;  there 
will  be  very  little  response  from  underlying  soil  or 
surface  litter.  Therefore,  the  analyst  expects  high  in- 
frared reflectance  and  low  red  reflectance  (because  of 
chlorophyll  absorption)  from  this  field.  On  a color- 
infrared  (CIR)  image,  such  a field  should  display  an 
intense  red  color.  On  the  other  hand,  for  earlier 
stages  of  development,  such  as  emergence,  the  soil 
background  reflectance  would  contribute  more  to  the 
overall  spectral  response  from  the  field.  Thus,  a less 
intense  red  color  would  appear  on  the  CIR  image. 
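
The reasoning above can be sketched with a simple linear mixing model, in which the response of a field is an area-weighted average of canopy and soil reflectance. This is only an illustration under stated assumptions: the reflectance values and the function name `field_reflectance` are hypothetical and are not LACIE measurements.

```python
# Illustrative linear mixing of canopy and soil reflectance.
# All reflectance values are hypothetical, chosen only to show the trend.
CANOPY = {"red": 0.05, "near_ir": 0.50}  # assumed green-canopy reflectance
SOIL = {"red": 0.20, "near_ir": 0.25}    # assumed bare-soil reflectance


def field_reflectance(canopy_cover):
    """Area-weighted mix of canopy and soil reflectance (cover from 0 to 1)."""
    if not 0.0 <= canopy_cover <= 1.0:
        raise ValueError("canopy_cover must be between 0 and 1")
    return {
        band: canopy_cover * CANOPY[band] + (1.0 - canopy_cover) * SOIL[band]
        for band in CANOPY
    }


# Emergence, partial cover, and headed small grains (90 to 100 percent cover).
for cover in (0.1, 0.5, 0.95):
    r = field_reflectance(cover)
    ratio = r["near_ir"] / r["red"]
    print(f"cover={cover:.2f} red={r['red']:.3f} "
          f"near_ir={r['near_ir']:.3f} ir/red={ratio:.2f}")
```

Under these assumed values, the infrared-to-red ratio rises steadily with canopy cover, which corresponds to the progressively more intense red described for the CIR image.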

Historical  agricultural  statistics. — A significant  in- 
put for  crop  identification  is  the  most  recent  year's 
crop acreage statistics for the specific region. From
these data, initial probabilities of occurrence of the
principal crop types and possible confusion crops can
be set.

[Figure 1: average crop calendars for North Dakota, showing percent of area in development stage by specified date.]

The  historical  agricultural  statistics  are  an  indica- 
tion of  the  expected  percentage  of  possible  confusion 
crops  (confusion  crops  predicted  from  calendar  data) 
present  within  an  area  relative  to  the  primary  crop  or 
crops  of  interest.  In  western  Kansas,  for  example, 
other  small  grains  (namely,  barley,  rye,  and  oats)  are 
always  possible  confusion  crops  for  wheat. 
Reference  to  published  historical  agricultural 
statistics,  however,  indicates  that  the  combined  per- 
centage of  cropland  occupied  by  these  three  crops 
was  approximately  1 percent.  By  comparing  these 
percentages  with  the  approximately  45  percent  of 
cropland  devoted  to  wheat,  the  analyst  can  have  con- 
fidence that  his  wheat  identifications  will,  in  general, 
contain  a very  low  (less  than  2 percent)  commission 
error  (identifying  these  other  small-grains  confusion 
crops as wheat). Figure 3 is an example of the
agricultural historical statistics provided to LACIE
analysts. 
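
The arithmetic behind this kind of expectation can be sketched as a worst-case bound. The function below is a hypothetical illustration, not a LACIE procedure: it assumes that every confusion-crop field is labeled as wheat and that all percentages refer to the same base area. The western Kansas inputs are the approximate figures quoted in the text; the Cass County inputs are the most-recent-year percentages from figure 3.

```python
def commission_upper_bound(target_pct, confusion_pcts):
    """Worst-case share of target-crop labels that are really confusion crops.

    Assumes every confusion-crop field is labeled as the target crop.
    All inputs are percentages of the same base area.
    """
    confusion_total = sum(confusion_pcts)
    return confusion_total / (target_pct + confusion_total)


# Western Kansas: wheat about 45 percent; barley, rye, and oats
# together about 1 percent (figures quoted in the text).
kansas = commission_upper_bound(45.0, [1.0])

# Cass County, North Dakota (figure 3, most recent year):
# all wheat 41.2; barley 16.0, rye 0.3, oats 1.7 percent of county area.
cass = commission_upper_bound(41.2, [16.0, 0.3, 1.7])

print(f"western Kansas worst case: {kansas:.1%}")
print(f"Cass County worst case:    {cass:.1%}")
```

With these inputs the bound is on the order of 2 percent for western Kansas but roughly 30 percent for Cass County, which is why barley needs explicit attention in North Dakota while the other small grains can be largely ignored in western Kansas.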

Cropping  practice  and  environmental  relationships. — 
Relationships  between  the  physical  environment 
and  the  presence  of  certain  crop  types  can  often  be 
used  effectively  in  crop  identification.  Physical 
parameters  that  exert  significant  influence  on  what 
crops  will  be  planted  within  a region  are  climate,  soil 
type, and availability of irrigation water. For example,
on the dry-farmed sandy-loam and loamy-fine-sand
soils of western Kansas, sorghum is planted
more  often  than  wheat.  Sorghum,  which  is  more 
tolerant  of  water  stress  than  wheat,  grows  well  on 
these  highly  permeable,  low-water-holding-capacity 
soils.  However,  if  irrigation  is  available  and  if  there  is 
supporting  evidence  that  wheat  is  irrigated  within 
this  region,  the  possible  proportion  of  wheat  within 
these  soil  types  must  be  expected  to  be  equivalent  to 
the proportions found on loam soils of the area. Irrigation
also leads to more crop diversity within an
area than will be found in dry-farmed areas. Thus,
the possibility of confusion crops is greater than that
in an adjacent, less diversified area. An example of
one type of cropping practice information available
to LACIE analysts is shown in figure 4.

FIGURE 2.— Example of year-specific wheat crop calendar adjustment data.

Full-frame Landsat imagery. — The usefulness of
the ancillary data described previously can be significantly
enhanced when the data are analyzed in concert
with full-frame Landsat imagery. One factor that
limits the usefulness of historical agricultural statistical
data and environmental relationship data is the
lack of spatial variability information. Two questions
arise: how do the crop proportions reported vary
throughout the reporting unit area, and what significance
does this variability have for crop-type identification
within the area? Full-frame Landsat imagery
(fig. 5) clearly displays the distribution of major
land use types (cropland, rangeland, etc.), as well as
minor land use differences that may affect local
(within-segment) crop proportion mixes.


County/State: CASS CO., N. DAKOTA

Land Use                              Acreage               Percent of Total
                                      (Most Recent Year)    County Area

A. Total County Area                  1,119,296
B. Total Cropland                       947,051             84.6
   1. Cropland Harvested                803,104             71.8
   2. Cropland Pastured                  34,123              3.0
   3. All Other Cropland                109,824              9.8
C. Woodlands, Woodland Pasture            1,184              0.1
D. All Other Land                        82,157              7.3
E. Average Field Size (Acres)
F. Wheat (% Total of County Area)

Conservation Practices (Most Recent Year)

A. Irrigated Land (% County Area)
B. Contour (Acres); Strip (Acres); Terrace (Acres)

Crops and Agricultural Lands (Absolute Acreage and % of Total County Area)

Crops               Most Recent Year (75)             Second Most Recent Year (74)
(Include All)       Acreage   Percent   Percent       Acreage   Percent   Percent
                                        Irrigation                        Irrigation

Wheat, All          460,700   41.2                    447,800   40.0
  Durum              65,100    5.8                     45,300    4.0
  O.S. Wheat        394,500   35.2                    401,000   35.8
  W. Wheat            1,100    0.1                      1,500    0.1
Barley              178,600   16.0                    129,100   11.5
Rye                   3,200    0.3                      2,900    0.3
Oats                 19,000    1.7                     20,200    1.8
Flaxseed             18,100    1.6                     20,200    1.8
Soybeans             87,500    7.8                     89,300    8.0
Sugarbeets           17,000    1.5                     18,900    1.7
Sunflowers           71,800    6.4                     83,500    7.5
All Hay              33,100    3.0                     33,600    3.0
  Alfalfa Hay        22,100    2.0                     21,600    1.9

FIGURE 3. — Example of historical agricultural statistics.

Identification # US-[illegible]

UNIVERSAL STRATA DESCRIPTORS

Segment # ISIS
Strata Overlay
File Location

Country: United States

State, CRD: North Dakota, CRD's 3, 6, 9
            Minnesota, CRD's 1, 4

Full Frame Landsat Imagery Numbers:

ND: 1398, 139[illegible], 146[illegible], 1464, 1473, 1482, 1565, 1584,
    1586, 1618, 1619, 1621, 1624, 1641, 1642, 1645

MN: 1514, 1515, 1518, 1519, 1521, 18[illegible], 1841

AGRICULTURAL LAND USE - General Description

Important cash crops are spring wheat, potatoes, sugar beets, and soybeans.
Legume seed are widely important in the northeast part of the stratum. Fallowing
is practiced mainly for weed control and for accumulating nitrogen. Sweet
clover (green manure) is grown widely for soil improvement.

Ag/Non-Ag Overlay File Location

DSF Texture (High/Low)

Overlay File Location

Field Size:

CLIMATE - General Description

The climate of this stratum is continental. The summer temperatures are
generally comfortable with very few days of hot and humid weather. Nights, with
few exceptions, are comfortably cool. The winter months are cold and dry, with
maximum temperatures rising above freezing only on an average of [illegible] days each
month, and nighttime lows dropping below zero approximately half of the time.
(See representative stations.)

Precipitation is the most important climatic factor in the area. The Red
River Valley lies in an area where lighter amounts fall to the west and heavier
amounts to the east. Seventy-five percent (75%) of the precipitation occurs
during the growing season (April to September) and is often accompanied by
electrical storms and heavy falls in a short time. Winter precipitation is
light and indicates that heavy snowfall is the exception rather than the rule.
The first light snow in the fall occasionally falls in September, but usually
very little, if any, occurs until October or November. The latest fall is
generally in April. (See representative stations.)

With the flat terrain, surface friction has little effect on the wind in
the area, and this fact has led to the legendary Dakota blizzards. Strong winds
with even light snowfall cause much drifting and blowing snow, reducing visibilities
to near zero. Fortunately, these conditions occur only several times
during the winter months.


FIGURE 4.— Example of cropping practice information.
FIGURE 5.— Full-frame Landsat data for part of eastern North Dakota. The small outlined area is the segment discussed in the
multitemporal data analysis example (Aug. 11, 1975; Landsat frame E-2201-6421).


Another situation in which full-frame Landsat imagery
is useful is where cultivated land is thinly interspersed
among rangeland and other wildland
areas. During certain wheat biophases, it is difficult
to distinguish some wheatfields from native grassland
range. When only a small area (such as a sample
segment) is available for viewing and only less-than-optimum
temporal-spectral data are available, a
number of misidentifications may occur. Upon
reference to full-frame imagery for the area,
however, the analyst can obtain a better appreciation
for the distribution of grassland range within the
sample segment. In addition, the evaluation of atmospheric
effects in segment-sized areas is difficult,
and full-frame data have been found helpful here,
also.


Multitemporal  Data  Analysis 
for Crop Identification

As stated previously in the section entitled
"Feature Detection and Characteristics Determination,"
the feature's temporal-spectral pattern
throughout the growing season is the most important
characteristic for detection and identification of crop
type. The procedure used to evaluate this particular
feature  characteristic  is  called  multitemporal  data 
analysis.  Multitemporal  data  analysis  is  based  on  the 
assumption  that,  within  a specific  region,  a given 
crop  type  or  group  of  crops  has  a temporal-spectral 
development  pattern  that  is  relatively  unique. 
Therefore,  by  monitoring  the  spectral  changes  within 
a field throughout the growing season, an analyst can
identify  with  a high  degree  of  accuracy  the  crop  or 
crop  group  grown  within  a field. 

To  demonstrate  the  principle  of  multitemporal 
data  analysis  for  crop-type  identification,  a "walk- 
through" of  the  analysis  of  a segment  in  southeastern 
North  Dakota  will  be  presented.  The  previously  pre- 
sented ancillary  data  examples  will  be  referred  to 
since  they  are  applicable  to  this  segment. 

First,  from  a priori  knowledge  and  ancillary  data, 
a conceptual  spectral  crop  calendar  for  the  area  must 
be  developed.  The  spectral  crop  calendar  describes 
the  expected  Landsat  temporal-spectral  response  pat- 
tern for  various  crop  types  found  within  the  area  of 
interest.  Figure  6 is  a graphic  illustration  of  one  way 
of  portraying  such  a spectral  crop  calendar  for  North 
Dakota.  In  this  particular  spectral  crop  calendar,  the 
ratio  of  2 times  MSS  band  7 divided  by  MSS  band  5 
was  used  as  a green  vegetation  indicator  to  portray 
the  Landsat  spectral-temporal  patterns  correspond- 
ing to  crop  canopy  and  phenological  development 
through  time.  Spectral  crop  calendars  are  not  cur- 
rently available  to  LACIE  analysts  directly. 
However, it must be realized that every analyst consciously
or unconsciously carries the concept of a
spectral  crop  calendar  within  him.  Therefore,  to 
facilitate  the  discussion  of  multitemporal  data 
analysis,  part  of  the  spectral  crop  calendar  for  North 
Dakota  has  been  presented  in  concrete  form. 
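
The green vegetation indicator described above (2 times MSS band 7 divided by MSS band 5) is simple enough to sketch directly. The function name `green_indicator` and the sample band values below are hypothetical and serve only to show how a temporal profile could be built; they are not Landsat measurements.

```python
def green_indicator(band7, band5):
    """Return 2 * (MSS band 7) / (MSS band 5) for one field or pixel."""
    if band5 <= 0:
        raise ValueError("MSS band 5 value must be positive")
    return 2.0 * band7 / band5


# Hypothetical mean field values (label, band 7, band 5) on the six 1976
# acquisition dates of the North Dakota example; not real Landsat data.
field_samples = [
    ("May 7", 14, 28),
    ("May 25", 18, 24),
    ("June 11", 26, 18),
    ("June 30", 30, 16),
    ("July 17", 24, 20),
    ("August 23", 16, 30),
]

series = [(label, green_indicator(b7, b5)) for label, b7, b5 in field_samples]
peak_label, peak_value = max(series, key=lambda item: item[1])

for label, value in series:
    print(f"{label:>9}: {value:.2f}")
print(f"peak indicator on {peak_label}")
```

Plotting such a series against date for each crop type gives the kind of curve that a spectral crop calendar portrays, and the date of the peak is one of the features that separates crop groups.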

From  figure  6,  one  can  see  that  the  maximum  and 
minimum  Landsat  vegetation  indicator  values  occur 
at  different  times  for  each  of  the  four  crop  types 
shown.  It  is  these  temporal-spectral  differences  that 
will  enable  the  identification  of  crop  types  within 
given  fields.  Notice  that  the  temporal-spectral 
differences  between  some  crop  groups  such  as  small 
grains  (wheat  and  barley)  and  large  grains  (corn)  are 
quite  pronounced.  Therefore,  there  is  little  risk  of 
confusing  these  two  crop  groups  within  this  region, 
given  moderately  adequate  timing  of  Landsat  ac- 
quisitions. Thus,  small  grains  (wheat,  barley,  oats, 
rye)  as  a group  are  expected  to  be  fairly  consistently 
identifiable  within  North  Dakota. 

The temporal-spectral differences between closely
related crops, however, such as between small-grains
crops (wheat versus barley), are seen to be more subtle.
Precise timing of Landsat acquisitions will be
critical to the separation of these two crops.

Some  Landsat  imagery  will  now  be  examined  to 
see  how  this  information  can  be  used  for  crop-type 
identification.  Figure  7 is  a multitemporal  sequence 
of  Landsat  imagery.  The  image  product  displayed 
here  is  known  within  LACIE  as  Product  1.  Relative 
feature  temporal-spectral  response  characteristics  are 
determinable  from  this  image  product.  High  near-in- 
frared response  coupled  with  low  visual-red  response 
and  relatively  low  to  medium  visual-green  response, 
as  is  typical  of  green  or  actively  metabolizing  vegeta- 
tion, appears  red  on  the  CIR  composite.  Relatively 
equal  response  in  all  three  bands,  as  may  be  given  by 
bare  soil  fields,  appears  as  various  shades  of  gray,  de- 
pending on  overall  total  reflectance. 

From  past  experience  and  the  ancillary  data,  the 
image  characteristics  for  wheat  are  expected  to  be  as 
follows.  From  the  emergence  to  the  jointing  biostage, 
the  wheat  crop  canopy  cover  increases  from  0 to  100 
percent.  The  Landsat  vegetation  detection  threshold 
appears  to  be  approximately  20-percent  canopy 
cover.  Thus,  after  a 20-percent  canopy  cover  has 
been  achieved  on  through  jointing  (May  19  to  June 
10, 1976), the wheat is expected to appear on Product
1 as  various  shades  of  pink  or  red,  depending  on  the 
amount  of  canopy  cover.  From  jointing  through 
heading  (June  10  to  30),  the  crop  is  actively 
metabolizing  and  should  appear  bright  red.  From  the 
heading through the soft-dough biostage (July 1 to
26),  prior  to  turning  golden,  the  crop  is  increasing  its 
dry  matter  percentage,  and  wheat  fields  tend  to 
decrease  their  near-infrared  reflectance  slightly.  This 
slight  decrease  in  infrared  reflectance  makes  wheat 
in  this  stage  appear  as  a darker  red  (brick  or  blood 
red)  than  in  the  previous  stage.  As  turning  proceeds 
(July  26),  infrared  reflectance  drops  drastically  and 
red reflectance increases slightly. Wheatfields in the
turning stage appear dull yellow, brown, or brownish
green on the CIR image product. After the wheatfield
has been harvested (August 7), it will appear
whitish or yellowish white if stubble remains in the
field. The harvested field may appear in various
shades of gray if disking or plowing has occurred.
The  more  the  field  has  been  plowed,  the  less  stubble 
there  will  be  on  the  surface  of  the  soil  and  the  darker 
gray  the  field  will  appear.  All  small  grains  go  through 
roughly  the  same  color  sequence  on  the  CIR  image 
product. 
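
The color sequence just described can be summarized as a date lookup. The sketch below encodes the 1976 dates given in the text; the function name, the stage labels, and the treatment of boundary dates (a date shared by two stages is assigned to the later one) are assumptions made for illustration.

```python
from bisect import bisect_right
from datetime import date

# (start date, biostage, expected Product 1 appearance), from the text above.
WHEAT_STAGES = [
    (date(1976, 5, 19), "20-percent canopy through jointing", "pink to red"),
    (date(1976, 6, 10), "jointing through heading", "bright red"),
    (date(1976, 7, 1), "heading through soft dough", "darker (brick) red"),
    (date(1976, 7, 26), "turning", "dull yellow, brown, or brownish green"),
    (date(1976, 8, 7), "harvested", "whitish with stubble; gray if plowed"),
]
_STARTS = [stage[0] for stage in WHEAT_STAGES]


def expected_wheat_appearance(day):
    """Return (biostage, expected CIR appearance) for a 1976 acquisition date."""
    index = bisect_right(_STARTS, day) - 1
    if index < 0:
        # Canopy cover below the approximate 20-percent detection threshold.
        return ("before detectable canopy", "bare soil")
    _, stage, appearance = WHEAT_STAGES[index]
    return (stage, appearance)


for day in (date(1976, 5, 7), date(1976, 6, 30), date(1976, 7, 17)):
    stage, appearance = expected_wheat_appearance(day)
    print(f"{day.isoformat()}: {stage} -> {appearance}")
```

A table of this kind makes explicit what the analyst otherwise carries as a mental expectation, and it would need to be shifted when the year-specific crop calendar departs from the nominal one.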

FIGURE 6.— 1976 spectral crop calendar for flax, corn, spring
barley, and spring wheat in Cass County, North Dakota.

Before specific field labeling is attempted, the expectations
for this segment must first be established
by referring to the available ancillary data. From the
nominal  North  Dakota  crop  calendar  (fig.  1),  spring 
wheat,  spring  barley,  and  spring  oats  have  very 
closely  timed  coincident  biostages.  Significant  confu- 
sion among  these  small  grains  can  be  expected  if  they 
all  occur  within  the  same  area.  Remembering  that 
there  is  an  indication  that  barley  matures  and  turns 
golden  a little  sooner  than  wheat,  one  can  look  for  an 
acquisition  around  the  critical  time  period  and,  if  it  is 
available,  attempt  to  separate  wheat  and  barley  on 
the  basis  of  the  expected  subtle  temporal-spectral 
differences.  Also  from  the  crop  calendar  data,  it  is 
observed  that  another  crop,  flax,  has  some  overlap 
with  the  small  grains.  Although  the  calendar  indi- 
cates later  planting  and  turning  for  flax  as  compared 
to  the  small  grains,  there  may  be  confusion  between 
small  grains  and  flax  if  acquisitions  are  missing.  No 
other  summer  season  crops  (corn,  potatoes, 
sunflowers,  beans,  etc.),  however,  are  expected  to  be 
significantly  confused  with  wheat  and  the  other 
small  grains  in  this  area. 

The  historical  agricultural  statistics  (fig.  3)  indi- 
cate that,  of  the  possible  wheat  confusion  crops  iden- 
tified from  crop  calendar  analysis,  barley  is  the  only 
crop  type  that  occurs  in  significant  proportions  along 
with  the  spring  wheat  in  this  area.  Analysis  of  the 
segment position on full-frame imagery (fig. 5) does
not indicate any conditions that would lead to adjustment
of expectations from the countywide statistics.

In  figure  7,  several  fields  of  various  crop  types 
have been interpreted and outlined. Crop calendar
information for the first acquisition (May 7, 1976)
indicates that the spring small grains, wheat and
barley, have just been planted. Spring small-grains
fields,  therefore,  should  appear  as  bare  soil  (dark 
gray  to  black).  Winter  wheat  is  in  the  tillering  (pre- 
jointing) stage  according  to  the  crop  calendar,  and 
significant  vegetation  canopy  is  expected  (definite 
red  color  on  image).  Corn,  flax,  and  other  summer 
crops  are  not  yet  planted  and  therefore  are  still  seen 
as  bare  soil. 

On the second acquisition (May 25), according to
crop  calendar  information,  spring  small  grains  are 
emerged  to  tillering.  They  should  begin  to  show 
some  indication  of  red  (dark  purple  to  pinkish  to 
red)  on  the  Product  1 image.  Winter  wheat  should  be 
jointed  and  will  appear  bright  red.  Corn  and  flax  are 
just  being  planted  so  still  appear  as  bare  soil. 

On acquisition 3 (June 11), spring small grains are
jointed  to  booted  and  should  have  significant  vegeta- 
tion canopy  and  a definite  red  color  on  the  image. 
Winter  wheat  is  headed  and  should  be  bright  red  or 
beginning  to  darken.  Flax  and  corn  are  emerging  but 
may  not  have  sufficient  canopy  cover  to  allow  detec- 
tion of  vegetation  within  these  fields  as  yet. 

On  acquisition  4 (June  30),  spring  small  grains 
should  be  headed  and  bright  red  or  beginning  to 
darken  slightly.  Winter  wheat  is  in  the  soft-dough 
stage  and  should  be  showing  definite  signs  of  darken- 
ing. Corn  and  flax  may  show  signs  of  canopy 
development  but  may  still  not  have  adequate  canopy 
cover  present  to  indicate  vegetation  on  the  image. 

On acquisition 5 (July 17), spring barley is starting
to  turn  (light  greenish  or  yellowish  on  image).  Spring 
wheat,  however,  is  not  yet  to  the  turning  stage  but  is 
still  in  the  soft-dough  stage  and  should  show  darken- 
ing on  the  image.  Winter  wheat  has  turned  and  har- 
vest has  started.  These  fields  will  appear  bright  white 
on  the  image  if  stubble  is  still  present  or  darker  gray 
if  the  field  has  been  plowed  after  harvest.  Corn  and 
flax  are  tasseling  and  blooming,  respectively,  and 
should  appear  as  some  shade  of  red  on  the  image. 

On  the  last  acquisition  (August  23),  all  small 
grains  (winter  and  spring)  have  been  harvested 
(whitish  for  stubble,  gray  for  plowed  field),  flax  is 
starting  to  turn  (darkening  on  image),  and  corn  is 
dented  (still  appears  red  on  the  image). 
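
The six-acquisition walkthrough can be condensed into a small expectation table and an elimination test. This is a simplified sketch, not the LACIE labeling procedure: the four coarse appearance classes ("soil", "red", "turning", "harvested") and the function name are assumptions, and where the text says a crop may or may not yet show canopy, both classes are allowed.

```python
# Expected coarse appearance per acquisition: May 7, May 25, June 11,
# June 30, July 17, August 23 (1976), following the walkthrough above.
EXPECTED = {
    "spring wheat": [{"soil"}, {"red"}, {"red"}, {"red"},
                     {"red"}, {"harvested"}],
    "spring barley": [{"soil"}, {"red"}, {"red"}, {"red"},
                      {"turning"}, {"harvested"}],
    "winter wheat": [{"red"}, {"red"}, {"red"}, {"red"},
                     {"harvested"}, {"harvested"}],
    "flax": [{"soil"}, {"soil"}, {"soil", "red"}, {"soil", "red"},
             {"red"}, {"turning"}],
    "corn": [{"soil"}, {"soil"}, {"soil", "red"}, {"soil", "red"},
             {"red"}, {"red"}],
}


def consistent_crops(observed):
    """Return crops whose expectations match every available acquisition.

    observed: one coarse class per acquisition, or None where the
    acquisition is missing (cloud cover, no overpass, and so on).
    """
    matches = []
    for crop, expected in EXPECTED.items():
        if all(obs is None or obs in allowed
               for obs, allowed in zip(observed, expected)):
            matches.append(crop)
    return matches


full_sequence = ["soil", "red", "red", "red", "red", "harvested"]
missing_july = ["soil", "red", "red", "red", None, "harvested"]

print(consistent_crops(full_sequence))
print(consistent_crops(missing_july))
```

With all six acquisitions the sequence is consistent only with spring wheat; when the July 17 acquisition is missing, spring barley can no longer be eliminated, which illustrates why the timing of that acquisition matters.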

This  example  has  been  presented  in  simplified 
form  to  demonstrate  the  principles  involved  in 
multitemporal  analysis  for  crop-type  identification. 
It  can  be  seen  that  small  grains  have  a unique  pattern 
compared  to  other  crop  types.  The  slight  difference 
in rate of maturation of spring barley and spring
wheat is also demonstrated. Precise optimum timing
of acquisitions at wheat-soft-dough and barley-turning
biostage is critical. An acquisition missing
altogether at this period or less than optimally timed
can make wheat and barley separation almost impossible.
Also, the heavy dependence on crop calendar
data should be noted. As stated earlier, crop
calendar data are the single most important type of
ancillary data in crop-type identification.

FIGURE 7.— 1976 crop year multitemporal set of acquisitions for a segment in Cass County, North Dakota; SW - spring wheat, B -
spring barley, WW - winter wheat, FX - flax, and C - corn. (a) May 7. (b) May 25. (c) June 11. (d) June 30. (e) July 17. (f) August
23.


MANUAL  CROP  IDENTIFICATION 
PROBLEMS 

In  Phase  I and  Phase  II  of  LACIE,  it  was  found 
that  in  some  regions,  the  analyst's  interpretation  er- 
ror was  beyond  the  tolerance  limits.  Several  problem 
areas  associated  with  each  of  the  two  main  in- 
terpretation components,  feature  detection  and 
feature  identification,  were  identified,  and  the  solu- 
tion of  these  problems  was  addressed  by  LACIE 
through  cooperation  between  the  research  com- 
munity and  LACIE  operations  personnel.  A detailed 
description  of  the  manual  interpretation  problems 
encountered  and  the  supporting  research  efforts  on 
these  problems  is  presented  in  the  paper  by  Hay  en- 
titled “Manual  Landsat  Data  Analysis  for  Crop  Type 
Identification.”  A briefer  discussion  of  manual  in- 
terpretation problems  will  be  presented  here  for  the 
purpose  of  completeness  within  this  paper. 


Problems  in  Feature  Detection 
and  Characteristics  Determination 

As  stated  earlier,  Landsat  data  are  used  for  feature 
detection  and  physical  characteristics  determination. 
Thus,  the  capability  of  Landsat  data  products  to 
clearly  and  accurately  represent  spatial  and  spectral 
data  to  the  analyst  is  of  great  concern.  During 
LACIE  Phases  I and  II,  the  only  Landsat  data  prod- 
ucts available  to  analysts  were  the  CIR  image  Prod- 
uct 1 and  positive-negative  image  Product  2.  These 
image  products  are  good,  effective  data  display  for- 
mats for  the  extraction  of  spatial  information,  such 
as  feature  size,  shape,  relationship  to  neighboring 
features,  and  distribution  throughout  an  area. 
However,  image  Product  1 can  provide  only  gross, 
relative  spectral  information  about  a feature. 
Although  gross,  relative  spectral  information  is  often 
sufficient  for  crop-type  identification  using 
multitemporal analysis procedures, numerous situations
were encountered in LACIE Phase I and Phase
II where the image Product 1 did not represent the
Landsat  spectral  data  sufficiently  for  correct  crop- 
type  labeling. 

Another  crop  identification  problem  related  to 
feature  characteristics  determination  is  insufficient 
temporal  sampling.  As  stated  previously,  the  tem- 
poral-spectral pattern  throughout  the  growing  season 
is  the  most  significant  feature  characteristic  for  crop- 
type  identification.  If  this  characteristic  pattern  is 
not  adequately  determined,  there  is  a greater  prob- 
ability of  confusion  between  crop  types.  Two  causes 
of  insufficient  temporal  sampling  which  lead  to  in- 
adequate temporal-spectral  pattern  determination 
are  (1)  missing  Landsat  acquisitions  because  of  cloud 
cover  or  other  reasons  and  (2)  periodicity  of  Landsat 
overpasses. 

Features  below  the  resolution  limit  of  the  Landsat 
sensors  (approximately  1 acre)  cannot  be  detected. 
Thus,  correct  crop  identification  with  Landsat-1  and 
Landsat-2  data  is  impossible  for  fields  less  than  1 
acre and improbable for fields from 5 to 10 acres. The
improbability of correctly identifying fields from 5 to
10  acres  is  a function  of  misregistration  between  ac- 
quisitions and  boundary  (mixed)  pixel  problems.  It 
is  necessary  to  determine  the  spectral  changes  of  a 
field  over  time  fairly  accurately.  If  data  points  repre- 
senting a given  ground  location  cannot  be  overlaid 
from  one  acquisition  to  another  to  a fairly  precise 
degree,  an  accurate  temporal-spectral  pattern  and 
crop  type  cannot  be  determined. 
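
A rough geometric sketch shows why fields of only a few acres are hard to follow through time. It assumes a square field aligned with a grid of square pixels of about 1 acre each (the approximate resolution quoted above), takes the worst-case position of the field relative to the grid, and treats misregistration as a whole-pixel shift that removes one ring of usable pixels per pixel of shift. The function name and these simplifications are assumptions, not part of the LACIE analysis.

```python
import math


def pure_pixels(field_acres, pixel_acres=1.0, misregistration_pixels=0):
    """Worst-case count of pixels lying wholly inside a square field.

    The field is assumed square and aligned with the pixel grid. A field
    side of s pixels contains at least floor(s) - 1 whole pixels per axis;
    each pixel of misregistration removes one more pixel from both edges.
    """
    side = math.sqrt(field_acres / pixel_acres)
    usable = math.floor(side) - 1 - 2 * misregistration_pixels
    return max(usable, 0) ** 2


for acres in (1, 5, 10, 40, 160):
    print(f"{acres:>3} acres: {pure_pixels(acres)} pure pixels, "
          f"{pure_pixels(acres, misregistration_pixels=1)} with a "
          f"1-pixel misregistration")
```

Under these assumptions a 5- to 10-acre field retains only a handful of unmixed pixels at best and none once a one-pixel registration error is allowed, whereas a quarter-section field retains many.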


Problems in Feature Evaluation

Most  of  the  remaining  sources  of  error  in  manual 
crop-type  labeling  in  LACIE  were  a function  of  in- 
sufficient a priori  knowledge  or  ancillary  data  or  of 
nonoptimum  labeling  procedures. 

Insufficient  a priori  knowledge  and  ancillary  data. — 
One  deficiency  in  a priori  knowledge  which  had  an 
effect  on  analyst  labeling  accuracy,  particularly  in  the 
early  phases  of  LACIE,  was  the  lack  of  adequate  in- 
formation concerning  the  variability  in  the  temporal- 
spectral  patterns  of  wheat,  small  grains,  and  other 
crop  types.  A related  deficiency  was  the  lack  of  ade- 
quate crop-type  temporal-spectral  separability  infor- 
mation. No  specific  information  about  the  temporal- 
spectral  patterns  of  crop  types  other  than  wheat  was 
available  to  the  analysts.  These  deficiencies  resulted 
in  omission  errors  for  wheat  and  small  grains. 
Analysts incorrectly assumed less variability in
wheat temporal-spectral patterns than was actually
present.  Thus,  interpreted  labels  were  conservative 
with  respect  to  wheat.  As  the  analysts'  experience  in- 
creased through  LACIE,  they  developed  a better 
feeling  for  the  true  wheat  temporal-spectral  pattern 
variability.  However,  additional  variability  informa- 
tion was  definitely  needed  in  abnormal  situations, 
such  as  the  occurrence  of  drought,  winterkill,  or 
other  episodal  events.  Similarly,  without  specific  in- 
formation about  the  temporal-spectral  patterns  of 
crops  other  than  wheat,  analysts  could  not 
“doublecheck”  their  identifications  by  working  the 
problem  in  reverse.  That  is,  in  addition  to  responding 
to the question, “Is this pixel wheat?” the analyst
could have posed and responded to the questions (1)
“What crop type is this pixel?” and (2) “What crop
types are definitely not represented by this pixel?”
Being  able  to  eliminate  certain  crop  types  often 
forces  the  analyst  to  go  back,  reconsider,  and  change 
his  initial  answer  to  the  limited  question  first  posed. 
However,  since  the  analyst  did  not  have  the  other 
specific  crop-type  data  and  the  needed  temporal- 
spectral  variability  information,  he  could  not 
doublecheck  his  initial  answer.  The  result  was  that 
some  wheat  was  mislabeled  or  omitted. 

Nonoptimum  labeling  procedure. — A large  number 
of  labeling  errors  traced  to  the  analyst  consisted  of 
the  labels  affixed  to  misregistered  and  boundary 
(mixed)  pixels.  Misregistered  pixels  are  those  that 
jump  back  and  forth  between  one  field  and  an  adja- 
cent one  on  successive  acquisitions.  Boundary  pixels 
are  mixtures  of  the  signatures  from  two  adjacent 
fields.  In  LACIE,  the  analyst  affixed  a definite  crop- 
type  label  to  a boundary  or  misregistered  pixel.  To  do 
this,  he  specified  a reference  acquisition  on  which  he 
labeled the pixel. He “guaranteed” the pixel label for
that reference acquisition only, and not for any other
acquisitions. This led to analyst-credited “mislabeling”
when the pixel label was not appropriate for the
majority  of  the  segment's  acquisitions  that  were 
machine  processed  and  subsequently  checked  in  ac- 
curacy assessment. 

SUMMARY AND CONCLUSIONS

This  paper  constitutes  an  attempt  to  analyze  the 
manual  interpretation  process.  Although  manual 
crop  identification  in  LACIE  did  not  achieve  100- 
percent  accuracy,  the  error  in  crop  identification  was 
more  a consequence  of  insufficiencies  in  the  Land- 
sat,  a priori,  and  ancillary  data  than  totally  the  result 
of inaccurate or nonoptimum procedures. Situations
will occur where the “correct” interpretation cannot
be  reached  on  the  basis  of  the  data  available  for  in- 
terpretation. No  procedure,  whether  it  be  manual  or 
automatic,  can  consistently  reach  the  correct  conclu- 
sion if  the  set  of  necessary  and  sufficient  data  is  not 
available.  The  manual  interpretation  procedures  used 
in  LACIE  were  adequate  to  support  the  accuracy 
goal for winter wheat and U.S.S.R. spring wheat (see
the paper by Marquis entitled “LACIE Area, Yield,
and Production Estimate Characteristics: U.S. Great
Plains,” the paper by Hickman entitled “LACIE
Area, Yield, and Production Estimate Characteristics:
U.S.S.R.,” and the paper by Potter et al. entitled
“Accuracy and Performance of LACIE Area
Estimates”). In areas where the accuracy goal was
not met (namely, U.S. and Canadian spring wheat
areas; see the papers by Marquis and Potter et al. and
the paper by Conte et al. entitled “LACIE Area,
Yield, and Production Estimate Characteristics:
Canada”), the failure to provide adequate acreage
estimates was at least as much, if not more directly,
due  to  Landsat  sensor  limitations  (resolution  limit, 
temporal  sampling  rate,  etc.)  as  to  manual  interpreta- 
tion deficiencies.  This  does  not  say  that  better 
manual  interpretation  and  measurement  procedures 
are  not  possible.  Indeed,  much  research  is  in  progress 
toward such improvements. It does say, however,
that  current  procedures  are  adequate  to  support  a 
technology  capable  of  supplying  valuable  agricultural 
resource  information. 

This  paper  has  stressed  only  the  logical  processes 
involved  in  interpretation  rather  than  giving  a highly 
detailed  step-by-step  description  of  analyst  pro- 
cedures. Such  a step-by-step  description  of  LACIE 
analyst  procedures  is  available  elsewhere  (ref.  2).  In 
conclusion,  the  LACIE  experience  has  clarified  the 
perception  of  the  manual  interpretation  process. 
This  clearer  perception  will  significantly  aid  current 
research in improving manual interpretation procedures
and developing automated or semiautomated
crop-type labeling procedures.

System Implementation and Operations


FOREWORD 

The  LACIE  Applications  Evaluation  System 
(AES)  was  composed  of  the  personnel,  procedures, 
and  systems  which  over  a 3-year  period  operated  and 
evaluated  the  LACIE  technology.  This  was  a very 
diverse activity with three participating government
agencies,  utilizing  numerous  data  systems  and 
facilities  located  across  the  United  States.  The  varied 
system  products  were  integrated  to  produce  the  final 
LACIE  output  at  the  NASA  Johnson  Space  Center 
(JSC). 

The  AES  was  functionally  separated  into  data  ac- 
quisition and  management,  area  estimation,  yield 
estimation,  production  estimation,  system  opera- 
tions and  control,  and  efficiency  and  accuracy  assess- 
ment. This  chapter  will  be  directed  toward  the 
phased  implementation  of  the  Large  Area  Crop  In- 
ventory Experiment  and  the  manner  in  which  the 
AES  was  operated  and  evaluated. 

The  acquisition,  preprocessing,  and  storage  of  data 
for  LACIE  was  probably  the  most  diverse  of  the 
functions.  The  collection  and  processing  of  Landsat 
data  was  the  responsibility  of  the  NASA  Goddard 
Space  Flight  Center  (GSFC).  The  existing  Landsat 
ground  processing  system  was  used  to  obtain  the 
Landsat  imagery,  and  the  LACIE  processing  system 
implemented  at  GSFC  provided  custom  processing 
to produce the 5- by 6-nautical-mile segments required
by the project. This activity is described in
“Acquisition  and  Preprocessing  of  Landsat  Data”  by 
Horn  et  al.  The  other  major  source  of  real-time  data 
was  the  existing  worldwide  weather  station  net- 
works. This  weather  information  was  assembled  and 
formatted  for  LACIE  use  by  NOAA  and  is  described 
in  “Operation  of  the  Yield  Estimation  Subsystem” 
by  McCrary  et  al. 

The  reformatting,  storage,  and  retrieval  of  the  data 
required in the LACIE project was a major activity.
A vast  amount  of  digital  Landsat  data  was  acquired 
and  processed  on  a daily  basis  and  maintained  in 
electronic  data  bases  as  described  in  “LACIE  Data- 
Handling  Techniques”  by  Waits.  Ground  inventories 
were obtained for about one-third of the U.S. segments
each year. The collection and handling of this
“ground  truth,”  performed  for  the  accuracy  assess- 
ment program,  is  addressed  in  “Ancillary  Data  Ac- 
quisition for  LACIE”  by  Spiers  and  Patterson.  The 
majority  of  the  nonelectronic  data  used  in  the 
LACIE  project  (e.g.,  maps,  periodicals,  photographic 
products,  reports)  was  stored  in  an  extensive  data 
library  as  detailed  in  “The  Acquisition,  Storage,  and 
Dissemination  of  Landsat  and  Other  LACIE  Support 
Data”  by  Abbotts  and  Nelson. 

The  element  of  the  AES  that  had  the  operational 
responsibility  for  Landsat  data  analysis  was  the 
Classification  and  Mensuration  Subsystem  (CAMS). 
The  implementation  and  operation  of  CAMS  was  the 
responsibility  of  the  JSC  Earth  Observations  Divi- 
sion (EOD).  A discussion  of  the  phased  implementa- 
tion and  operation  of  CAMS  is  presented  in  “The 
Classification  and  Mensuration  Subsystem”  by 
Abotteen and Bizzell. The major portion of the
CAMS  operation  utilized  a batch-run  capability  on  a 
large  computer  in  the  JSC  Mission  Control  Center. 
During  the  latter  part  of  Phase  II,  an  interactive 
capability  was  implemented  on  a small  computer  in 
the  JSC-EOD  as  described  in  “Concepts  Leading  to 
the  IMAGE-100  Hybrid  Interactive  System”  by 
Mackin  and  Sulester.  This  system  was  used  during 
Phase  III  to  develop  procedures  and  operationally 
process 50 segments in the U.S.S.R. This operational
processing  was  performed  by  USDA  analysts  and  is 
addressed  in  “USDA  Analyst  Review  of  the  LACIE 
IMAGE-100/Hybrid  System  Test”  by  Ashburn  et  al. 

The  NOAA  played  a major  role  in  the  project  by 
supplying  real-time  and  historical  meteorological 
data  and  developing  and  operating  yield  models  and 
crop  calendars.  These  activities  are  explained  in  the 
paper  by  McCrary  et  al.  The  yield  model  results  were 
aggregated  with  the  area  estimates  to  produce  the 
production  estimates.  The  crop  calendar  outputs 
were  provided  to  the  CAMS  analysts  to  aid  in  relat- 
ing expected  crop  growth  stage  to  the  signatures  ob- 
served in  the  Landsat  imagery. 

The  area  and  yield  estimates  were  input  to  the 
Crop Assessment Subsystem (CAS). CAS was implemented
by the JSC-EOD during Phase I and operated
by USDA personnel during the remainder of
LACIE.  CAS  periodically,  on  a predefined  schedule, 
aggregated  the  area  and  yield  inputs,  produced  pro- 
duction estimates,  and  generated  detailed  reports  to 
document  the  LACIE  estimates.  The  development 
and  operation  of  the  CAS  system  is  described  in 
“The  Crop  Assessment  Subsystem:  System  Imple- 
mentation and  Approaches  Used  for  the  Generation 
of  Crop  Production  Reports”  by  McAllum  et  al. 

The  various  subsystems  of  the  AES  were  each 
controlled  by  a subsystem  manager  primarily  con- 
cerned with  the  accomplishment  of  the  objectives  of 
that  particular  subsystem.  The  coordination  of  the 
activities of the major functional subsystems and the
collection  of  the  information  to  provide  project  man- 
agement with  insight  into  the  actual  operation  of  the 
AES  as  a system  was  provided  by  a number  of  coor- 
dinators, including  the  LACIE  operations  manager, 
quality assurance manager, facilities manager, the
manager of the Data Acquisition, Preprocessing, and
Transmission  Subsystem,  and  the  manager  of  the  In- 
formation Storage,  Retrieval,  and  Reformatting  Sub- 
system. Descriptions  of  these  coordination  activities 
are presented in “LACIE Status and Tracking” by
Dauphin et al., “LACIE Quality Assurance” by
Gutschewski, “Operations Reporting” by Musgrove
and  Dale  Marquis,  and  “EOD  Facilities  Configura- 
tion Management  Office”  by  Dauphin  and  Palmer. 

The  results  of  the  experiment  operations  were 
evaluated  by  the  Accuracy  Assessment  System, 
which  concentrated  on  the  accuracy  of  the  area, 
yield,  and  production  estimates.  The  approach  and 
the  system  developed  for  this  assessment  is  deline- 
ated in  “Accuracy  Assessment  System  and  Opera- 
tion” by  Pitts  et  al.  The  efficiency  of  the  experiment 
operation  was  monitored  during  the  three  phases  of 
LACIE  and  is  discussed  in  the  “LACIE  Applications 
Evaluation System Efficiency Report” by White.

Acquisition and Preprocessing of Landsat Data

T. N. Horn,ᵃ L. E. Brown,ᵇ and W. H. Anonsenᵇ


INTRODUCTION 

Early in 1974, a development effort was undertaken
at the NASA Goddard Space Flight Center
(GSFC) to establish a data acquisition and processing
system to support the LACIE. Designated the
Data  Acquisition,  Preprocessing,  and  Transmission 
Subsystem  at  GSFC  (DAPTS/GSFC),  this  system 
was  to  provide  Landsat  data  inputs  to  the  LACIE 
system  at  the  NASA  Johnson  Space  Center  (JSC), 
where  a joint  NASA/U.S.  Department  of  Agriculture 
(USDA)/National Oceanic and Atmospheric Administration
(NOAA) team would perform analyses
and  evaluation  in  pursuit  of  LACIE  objectives.  Re- 
quirements imposed  on  the  GSFC  system  included 
the  following. 

1.  Temporal  registration  of  selected  Landsat  data 
to  within  1 pixel  root  mean  square 

2.  Data  acquisition,  processing,  and  transmittal  to 
JSC  within  7 days 

3.  Capacity  to  handle  data  for  4800  site  locations, 
with  multiple  coverage  required  for  960  sites  and 
four-time  coverage  required  for  the  remaining  3840 
sites 
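
The first requirement can be illustrated with a short check. This is a minimal sketch only: the text does not say how the root mean square was evaluated, so the function names, the use of check-point residuals, and the combination of row and column residuals into one radial error are all assumptions made for illustration.

```python
import math


def rms_registration_error(offsets):
    """Root-mean-square misregistration, in pixels.

    offsets -- list of (d_row, d_col) residuals measured at check points
               between two acquisitions of the same site (hypothetical input).
    """
    if not offsets:
        raise ValueError("at least one check point is required")
    # Assumption: row and column residuals are combined into a radial error.
    mean_square = sum(dr * dr + dc * dc for dr, dc in offsets) / len(offsets)
    return math.sqrt(mean_square)


def meets_registration_requirement(offsets, limit_pixels=1.0):
    """True when the rms error is within the 1-pixel requirement."""
    return rms_registration_error(offsets) <= limit_pixels
```

Under these assumptions, a site whose check points are each displaced by one pixel in a single direction sits exactly at the limit.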

In  response  to  these  requirements,  DAPTS/GSFC 
was  configured  to  operate  as  an  integral  part  of  the 
established  Landsat  ground  system  at  GSFC.  This 
system was designed to use existing equipment and
processing capabilities as much as possible in order
to maximize hardware compatibility and minimize
software  development.  An  operation  team  with  con- 
siderable Landsat  experience  was  assembled  to  sup- 
port the  start  of  production  processing  in  January 
1975.  The  development  of  that  initial  system  and  the 
evolutionary changes which followed have consistently
been aimed at providing in a timely fashion
the data required by the LACIE organization. A
review of system performance since early 1975 substantiates
the success this system has achieved.


ᵃGeneral Electric Space Division, Beltsville, Maryland.

LANDSAT OVERVIEW

The  first  Earth  resources  technology  satellite, 
Landsat-1,  was  launched  in  July  1972.  Its  mission 
was  to  orbit  the  Earth  and  return  images  of  the 
Earth's  surface.  A second  Landsat  satellite  was 
launched  in  January  1975,  and  a third  was  launched 
in  March  1978.  Each  satellite  contained  a 
multispectral scanner (MSS) and a return beam
vidicon (RBV) camera system. The MSS (fig. 1) consists
of an oscillating mirror that scans the Earth's
surface horizontally, while the forward motion of the
spacecraft provides vertical displacement. The Landsat-1
and Landsat-2 scanners are four-channel
systems, while the Landsat-3 MSS is a five-channel
scanner. The RBV units (fig. 2) are shuttered camera
systems. On Landsat-1 and Landsat-2, three cameras
were  arranged  to  provide  coincident  images  in  three 
spectral  bands.  The  Landsat-3  system  substitutes  two 
panchromatic  cameras  that  are  alined  to  provide  ad- 
joining images  which  have  improved  resolution.  The 
spectral  characteristics  of  both  sensors  are  listed  in 
table  I. 

The  operation  of  all  Landsat  satellites  in  collecting 
image  data  is  directed  and  controlled  by  the  Landsat 
Operations Control Center, which is located at
GSFC. Once acquired, image data are either relayed
directly to a Landsat ground station or recorded on
the  satellite  for  later  transmission.  The  three  primary 
Landsat  ground  stations  are  located  in  Alaska,  in 
California,  and  at  GSFC;  a portable  ground  station 
was  also  deployed  in  Pakistan  between  October  1976 
and  September  1977  to  assist  in  collecting  Landsat 
image data. The ground station records the data on
wideband  video  tapes,  and  these  tapes  are  then  sent 
to  GSFC  for  processing.  The  overall  Landsat  system 
is  illustrated  in  figure  3. 


At  GSFC,  MSS  and  RBV  data  are  processed  to 
produce  both  photographic  and  digital  data  products. 
The processing involved includes reformatting, annotation,
radiometric calibration, and geometric correction
of various types, depending on data and product
type. The Landsat data processing flow is shown
in  figure  4.  In  its  current  hybrid  configuration,  the 
Landsat  processing  system  produces  a film  archive 
in  70-millimeter  format.  This  archive  is  then  used  to 
generate  photographs  for  Landsat  users.  Upon  re- 
quest, the  original  video  tape  is  used  to  produce 
digital  products.  Copies  of  the  film  archive  are  also 
provided  to  several  other  data  centers  for  use  in 
generating  and  distributing  Landsat  data  products  of 
various  types.  Future  plans  at  GSFC  call  for  conver- 
sion to  a digital  archive,  with  accompanying  im- 
provements in  image  processing  capability.  More 
thorough  descriptions  of  various  Landsat  system  ele- 
ments are  provided  in  the  Landsat  Data  Users 
Handbook  (ref.  1). 


DAPTS/GSFC OVERVIEW

In  order  to  meet  LACIE  requirements  for  Landsat 
MSS  data,  a dedicated  acquisition  and  processing 
flow  path  was  established  at  GSFC  within  the  Land- 
sat ground  system.  Although  many  of  the  processing 
functions  are  similar,  throughput  requirements  and 
time-line  constraints  dictated  that  a separate  end-to- 
end  flow  be  established.  Several  elements  included  in 
this  path  also  perform  other  Landsat  functions  on  a 
shared  basis,  while  other  elements  are  totally  dedi- 
cated to  LACIE  support.  The  DAPTS/GSFC  system 
is  shown  in  figure  5.  Comparison  of  this  figure  to 
figure  4 illustrates  the  similarity  between  the  two 
systems. 

Each block in figure 5 represents a separate processing
subsystem at GSFC. As shown, GSFC
receives  LACIE  requirements  on  computer  tapes  (or 
cards).  These  inputs  are  processed  within  the 
general-purpose  image  processor  subsystem  and  a 
test  site  tape  is  generated.  This  tape  is  provided  to  the 
Control  Center  in  order  to  schedule  Landsat  MSS 
coverage of LACIE test sites. Unlike most other
Landsat coverage requirements, LACIE coverage
scheduling does not include consideration of predicted
cloud cover (although the capability to do so is
available).  Data  are  recorded  on  wideband  video 
tapes  and  shipped  to  GSFC  as  an  integral  part  of  the 
Landsat  data  flow. 

FIGURE 3.— Overall Landsat system.

FIGURE 4.— Landsat data acquisition and processing flow at GSFC.

FIGURE 5.— DAPTS/LACIE data acquisition and processing.

The Control Center generates two computer tapes
to enable processing and tracking of MSS data. After
data  acquisition  has  been  confirmed,  a spacecraft 
location  and  attitude  tape  is  delivered  to  the  Data 
Services  Laboratory  (DSL)  for  processing.  This  tape 
defines the acquisition dates for which processing is
to be performed and contains the ephemeris and
telemetry data required for processing. A status tape
for  inclusion  in  the  LACIE  master  file  is  also  gener- 
ated to  report  on  the  scheduling  and  data  acquisition 
activities  performed  by  the  Control  Center.  The 
Control  Center  uses  XDS-Honeywell  Sigma  3 and 
Sigma 5 computers in performing its functions.

The  DSL  locates  LACIE  data  within  the  MSS  data 
stream,  calculates  geometric  correction  coefficients, 
and  performs  the  image  annotation  processing  re- 
quired for  LACIE  data.  A copy  of  the  test  site  tape 
provided  to  the  Control  Center  is  used  by  the  DSL  to 
identify  LACIE  test  site  data.  Detailed  descriptions 
of  the  algorithms  involved  in  data  location  and  cor- 
rection coefficient  development  are  provided  in 
reference  2.  The  resulting  annotation  data  are 
recorded  on  the  LACIE  image  annotation  tape, 
which  is  then  provided  to  the  digital  subsystem.  This 
annotation  tape  contains  the  information  required  by 
the digital subsystem to extract 10- by 11-nautical-mile
“search areas” from the 100-nautical-mile-wide
swaths  of  MSS  data.  In  addition  to  the  annotation 
tape,  the  DSL  also  produces  a status  report  tape, 
which  provides  processing  activity  inputs  for  the 
LACIE  master  file.  Processing  for  the  Data  Services 
Lab  is  performed  on  an  XDS-Honeywell  Sigma  5 
computer. 

When  the  video  tapes  have  been  received  at 
GSFC,  the  MSS  data  for  LACIE  are  digitized  by  the 
MSS  preprocessor.  Data  from  the  video  tapes  are 
transferred  to  a high-density  digital  tape,  which  can 
be  further  processed  by  the  digital  subsystem.  During 
the  transfer  process,  data  quality  checks  are  per- 
formed, and  poor-quality  data  are  replaced  by  adja- 
cent data  on  a line-by-line  basis.  (Replacements  of 
this  kind  result  in  data  being  flagged  as  marginal 
when  transmitted  to  JSC.)  Information  on  MSS 
preprocessing  is  manually  transferred  to  the  DSL  for 
inclusion  in  the  LACIE  status  report  tape. 
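
The line replacement step described above can be sketched as follows. This is an illustrative example only: the text says that poor-quality data are replaced by adjacent data on a line-by-line basis but does not give the selection rule, so the choice of the nearest good line (with ties going to the earlier line) and the names `replace_bad_lines`, `lines`, and `is_bad` are assumptions.

```python
def replace_bad_lines(lines, is_bad):
    """Replace each poor-quality scan line with the nearest good line.

    lines  -- list of scan lines, each a list of pixel counts
    is_bad -- list of booleans, True where a line failed the quality check
    Returns (repaired_lines, marginal); marginal is True when any line
    was replaced, mirroring the "marginal" flag sent on to JSC.
    """
    good = [i for i, bad in enumerate(is_bad) if not bad]
    if not good:
        raise ValueError("no good lines available for replacement")
    repaired = []
    for i, line in enumerate(lines):
        if is_bad[i]:
            # Assumed rule: closest good line; ties go to the earlier line.
            nearest = min(good, key=lambda g: (abs(g - i), g))
            repaired.append(list(lines[nearest]))
        else:
            repaired.append(list(line))
    return repaired, any(is_bad)
```

A data set that passes through unchanged returns `marginal` as False, so the flag records only that a substitution took place, not how many lines were affected.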

The  digital  subsystem  processes  data  from  the 
high-density  tape,  using  the  LACIE  image  annota- 
tion tape  as  a control.  Data  for  each  LACIE  site  are 
extracted  from  the  high-density  tape  and  transferred 
to  nine-track  computer-compatible  tapes  (CCT’s). 
Radiometric  calibration  is  performed  during  the  ex- 
traction process.  Following  extraction,  each  data  set 
is  reprocessed  to  permit  reformatting,  development 


of  a radiometric  histogram  table,  and  cloud-cover 
screening.  Automatic  cloud-cover  detection  is  per- 
formed within  the  digital  subsystem  using  a simple 
level-slicing  technique.  The  criterion  for  rejection 
was  set  at  10  percent  of  the  pixels  in  the  search  area. 
The  radiance  threshold  was  initially  set  at  an  ab- 
solute count  of  90  in  the  0.5-  to  0.6-micrometer  chan- 
nel (band  4).  However,  after  several  months  of 
operation,  this  level  was  adjusted  to  60  in  order  to 
more  closely  match  the  cloud  sensitivity  level  of  the 
temporal  registration  process.  If  more  than  10  per- 
cent of  the  pixels  have  a count  greater  than  60,  the 
data set is discarded; otherwise, it is transferred to
the search area computer-compatible tape, which is
provided  to  the  general-purpose  image  processing 
subsystem.  In  addition  to  the  search  area  tape,  the 
digital  subsystem  produces  a report  tape  containing 
inputs  to  the  LACIE  master  file. 
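
The level-slicing test described above amounts to counting bright pixels. The sketch below uses the count threshold of 60 and the 10-percent rejection criterion given in the text; the function name, the argument names, and the representation of the search area as nested lists of band 4 counts are assumptions made for illustration.

```python
def is_too_cloudy(band4_counts, count_threshold=60, max_fraction=0.10):
    """Level-slicing cloud screen for one search area.

    band4_counts    -- rows of band 4 (0.5- to 0.6-micrometer) counts
    count_threshold -- counts above this level are treated as cloud
    max_fraction    -- reject when more than this fraction is cloud
    """
    pixels = [count for row in band4_counts for count in row]
    if not pixels:
        raise ValueError("empty search area")
    bright = sum(1 for count in pixels if count > count_threshold)
    return bright / len(pixels) > max_fraction
```

Calling the function with `count_threshold=90` reproduces the initial setting, which passes moderately bright data sets that the adjusted level of 60 rejects.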

FIGURE 6.— Temporal registration process.

The general-purpose image processor performs
the geometric correction and temporal registration
functions required of DAPTS/GSFC. The data within
each search area are first geometrically corrected
through resampling and then registered with previous
data for the same LACIE site. An edge detection
and correlation technique is used to establish
temporal registration between data sets. This technique
involves using radiance gradients within each
data set to identify feature edges, such as field boundaries.
(Feature recognition of this type should not be
affected by seasonal changes in radiance levels.) The
edge patterns developed for two data sets are then
compared statistically to determine a registered
alinement. This technique is illustrated in figure 6,
and the algorithms involved are described elsewhere
in this volume (G. Grebowsky, “LACIE Registration
Processing System”) and in reference 2. The
results of the correlation process are used to identify
and extract from within the search area a 5- by 6-nautical-mile
“sample segment” that precisely
matches previously processed data. If no prior data
have been processed for a site, the 5- by 6-nautical-mile
sample segment to be sent to JSC is extracted
from the center of the search area. The histogram
data provided on the search area tape are used to
calculate film recorder parameters used in JSC processing,
and the extracted sample segment is
transferred  to  an  output  tape  for  transmission  to  JSC. 
As  in  other  subsystems,  a status  report  tape  is  pro- 
duced for  use  in  updating  the  LACIE  master  file. 
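
The registration and extraction steps described above can be sketched in simplified form. The example is not the operational algorithm, which is documented in the Grebowsky paper and in reference 2. It assumes a fixed gradient threshold for edge detection and replaces the statistical comparison with a plain count of coinciding edge pixels, and all function names are hypothetical.

```python
def edge_map(image, gradient_threshold):
    """Mark pixels where the radiance gradient to the right or below is large."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows - 1):
        for c in range(cols - 1):
            gx = abs(image[r][c + 1] - image[r][c])
            gy = abs(image[r + 1][c] - image[r][c])
            if max(gx, gy) >= gradient_threshold:
                edges[r][c] = 1
    return edges


def best_offset(reference_edges, search_edges):
    """Slide the reference edge map over the search-area edge map and return
    the (row, col) offset with the most coinciding edge pixels."""
    ref_rows, ref_cols = len(reference_edges), len(reference_edges[0])
    best, best_score = (0, 0), -1
    for dr in range(len(search_edges) - ref_rows + 1):
        for dc in range(len(search_edges[0]) - ref_cols + 1):
            score = sum(
                reference_edges[r][c] * search_edges[r + dr][c + dc]
                for r in range(ref_rows)
                for c in range(ref_cols)
            )
            if score > best_score:
                best, best_score = (dr, dc), score
    return best


def center_offset(search_area, rows, cols):
    """Offset of a centered segment, used when no prior data exist."""
    return ((len(search_area) - rows) // 2, (len(search_area[0]) - cols) // 2)


def extract_segment(search_area, offset, rows, cols):
    """Cut a rows-by-cols sample segment out of the search area."""
    r0, c0 = offset
    return [row[c0:c0 + cols] for row in search_area[r0:r0 + rows]]
```

Because only gradients enter the edge map, a uniform seasonal shift in radiance between the reference and the new acquisition leaves the matched offset unchanged, which is the property the text attributes to this kind of feature recognition.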

In  order  to  support  the  temporal  registration  proc- 
ess, the  controlling  data  base  for  DAPTS/GSFC  is 
maintained  within  the  image  processor.  Each  re- 
quirement update  received  from  JSC  is  processed 
within  the  image  processor,  and  the  registration 
reference  file  is  also  maintained  in  this  subsystem. 

As  a separate  function,  the  image  processor  up- 
dates the  LACIE  master  file,  processing  status  report 
tapes  from  other  subsystems  and  from  its  own 
registration processing. In addition, requirement updates,
final data transmission reports, and
postprocessing  quality  and  inspection  reports  are 
also  recorded  in  the  master  file,  resulting  in  an  inte- 
grated end-to-end  record  of  DAPTS/GSFC  activities. 
The  image  processing  subsystem  also  produces 
master  file  reports  in  data  list  form,  primarily  for 
analysis  purposes.  General-purpose  image  processing 
involves  the  application  of  an  enhanced  XDS- 
Honeywell  Sigma  3 computer. 

As an addition to the LACIE system shown in
figure 5, a master file retrieval system was developed
to  provide  an  additional  capability  to  produce  status, 
summary,  and  analytical  reports  on  DAPTS/GSFC 
operation.  Master  file  tapes  provide  input  data  to  this 
system, which operates on the UNIVAC 1108 computer
at GSFC (not a part of the Landsat ground
system). 


INITIAL  CONFIGURATION 

In January 1975, DAPTS/GSFC was configured to
support LACIE through the use of Landsat-1. Newly
acquired data were to be processed and transmitted
to JSC within 7 days. At that time, video tape data
were  transferred  to  high-density  tape  by  the  initial 
image  generation  subsystem,  as  the  MSS 


preprocessor  was  not  yet  operational.  Telemetry  and 
ephemeris data were provided by the Control Center
on  separate  tapes,  and  only  a data  list  could  be  used 
in  producing  master  file  reports. 

In  this  initial  configuration,  constraints  were  im- 
posed on  LACIE  site  location  in  order  to  preclude 
search  area  extraction  overloads  in  the  digital  sub- 
system, where  only  one  pass  through  the  Landsat 
data  stream  was  permitted.  The  image  processing  Sig- 
ma 3 computer  system,  which  was  originally  a part  of 
the  Landsat  scene  correction  (i.e.,  precision  process- 
ing) subsystem,  did  not  initially  include  a disk 
storage capability; as a result, a tape-oriented controlling
data base system was implemented. With
only research/development study results to rely on
in implementing the temporal registration process,
an  interactive  registration  system  was  readied  to  sup- 
plement the  automatic  correlator  (or  to  replace  it  if 
necessary). Arrangements for daily data transmissions
to JSC involved courier service to a nearby airport,
airfreight transportation to Houston, and
courier  service  pickup  and  delivery  to  JSC. 

When  production  operations  began  on  January  13, 
1975,  an  evolutionary  sequence  also  began,  even- 
tually leading  to  the  system  currently  in  operation  at 
GSFC.  Some  enhancements  were  actually  Landsat 
system  improvements,  from  which  the  LACIE 
system  also  benefited,  while  others  were  improve- 
ments specifically  intended  to  upgrade  the  GSFC 
LACIE  support  capability  or  to  meet  a newly  estab- 
lished LACIE  requirement. 


SYSTEM  ENHANCEMENTS 

Landsat-2  was  successfully  launched  on  January 
22,  1975,  and,  after  being  declared  operational  in 
early  February,  was  designated  to  replace  Landsat-1 
as the prime source of data for LACIE use. Both
satellites,  however,  remained  in  operation  to  serve 
the  Landsat  user  community,  and  processing  loads  at 
GSFC  increased  accordingly.  In  March,  the  MSS 
preprocessor  became  available  to  support  LACIE 
processing  and  Landsat  digital  data  product  genera- 
tion for  other  users.  The  introduction  of  this  sub- 
system allowed  the  initial  image  generation  sub- 
system to  be  dedicated  to  film  production.  This 
modification  enhanced  the  GSFC  capability  to  sup- 
port two  Landsat  satellites  and  also  eliminated  the 
possibility of conflicts between LACIE processing
and  Landsat  film  archive  generation. 

In November 1976, the LACIE system at GSFC
again benefited from a Landsat system enhancement.
Spacecraft telemetry and ephemeris data,
which had previously been transmitted across the
Control Center/processing system interface independently,
were merged with video tape information
into a spacecraft location and attitude tape. This
operational simplification resulted in time-line improvements
for both Landsat and LACIE processing.

Several enhancements unique to LACIE have also
been implemented since early 1975 and have resulted
in both increased throughput and improved performance.
Early analysis results indicated that the GSFC
system processed data that were “too cloudy” to be
useful at JSC. (Excessive snow cover was also considered
unacceptable.) Accordingly, an interactive
screening  function  was  established  within  the  image 
processor  to  permit  operator  rejection  of  data  not 
meeting  JSC  criteria.  In  order  to  expedite  processing, 
an  overlapping  configuration  was  established,  in 
which  two  data  sets  would  undergo  processing  in  a 
time-shared  fashion.  In  this  configuration,  the  data 
from  one  set  are  displayed  for  operator  examination 
while  the  other  set  is  involved  in  computer  process- 
ing. As  illustrated  in  figure  7,  the  two  data  sets  alter- 
nately undergo  display  and  processing  in  staggered 
fashion  through  the  several  steps  necessary  to  com- 
plete registration  processing.  This  design  allows  early 
rejection  of  data  judged  to  be  too  cloudy,  provides  for 
a later  "second  look"  to  handle  marginal  cases,  and 
maximizes  the  efficiency  of  both  the  operator  and 
the computer system throughout the processing sequence.

After 2 years of operation, it became evident that
cloud/snow-cover  rejection  rates  were  seasonally  de- 
pendent. During  the  spring  and  summer  months, 
lower  data  rejection  rates  often  could  not  justify  the 
operator/display  time  involved  in  the  screening  proc- 
ess. Accordingly,  the  image  processor  configuration 
was  again  modified  to  provide  cloud-cover  screening 
as  a selectable  option.  In  this  present  configuration, 
the  on-line  screening  option  is  selected  when  rejec- 
tion rates  are  high  and  when  the  data  flow  rate  is  low 
(i.e.,  primarily  late  fall  through  early  spring).  When 
screening  is  not  performed  on-line,  data  not  meeting 
JSC  criteria  are  rejected  as  a part  of  the  postprocess- 
ing quality  inspection  function. 
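
The seasonal selection rule described above can be sketched as a small decision function. This is an illustration only: the function name and both numeric thresholds are hypothetical, since the text gives no figures for what counted as a high rejection rate or a low data flow rate.

```python
def use_online_screening(rejection_rate, data_sets_per_day,
                         min_rejection_rate=0.5, max_data_sets_per_day=40):
    """Return True when on-line cloud/snow screening should be selected.

    Both thresholds are hypothetical placeholders, not values from the source.
    """
    high_rejection = rejection_rate >= min_rejection_rate
    low_flow = data_sets_per_day <= max_data_sets_per_day
    # On-line screening is worthwhile only when many data sets would be
    # rejected and the data flow leaves operator/display time available.
    return high_rejection and low_flow
```

When the function returns False, rejection of unusable data would fall to the postprocessing quality inspection step, as the text describes.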

In  a separate  enhancement  relating  to  cloud  cover, 
a revised  correlation  technique  was  implemented  in 
the  summer  of  1976  to  minimize  the  adverse  effects 
of  clouds  and  cloud  shadows.  This  technique  in- 
volved recognition  of  cloud/shadow  areas  within  the 
data  being  processed  and  avoidance  of  these  areas  in 
correlation  processing.  Details  of  this  enhancement 
are  provided  in  the  paper  by  Grebowsky.  In  October 
1976,  GSFC  and  JSC  participated  jointly  in  establish- 
ing a data  link  interface  for  use  in  transmitting  Land- 
sat data  to  JSC.  The  use  of  this  link  has  improved 
data  delivery  time  considerably  and  has  eliminated 
the  inefficiency  and  complexity  of  the  courier  ser- 
vice/airfreight  interface. 

As  the  image  processor  configuration  was  refined 
and  improved,  initial  emphasis  on  tape  storage  of 
data  gave  way  to  a disk-storage  design.  Late  in  the  fall 
of  1976,  the  data  base  that  contained  all  registration 
reference  data  was  established  in  permanent  disk- 
resident  form.  The  reduction  in  reference  data  access 
time  which  resulted  represents  a significant  improve- 
ment in  image  processing  efficiency. 


NEW  CAPABILITIES 

New requirements in supporting LACIE have
been imposed on GSFC at various times over the 3
years LACIE has been underway. Responses to
these  requirements  have  resulted  in  provision  of 
several  new  acquisition  and  processing  capabilities  at 
GSFC.  These  new  capabilities  combine  with  pre- 
viously described  system  enhancements  to  account 
for  the  evolutionary  development  of  the  current 
DAPTS/GSFC  configuration. 

When production processing began in January
1975, the data being processed had actually been acquired
some 3 months earlier, during the fall of 1974.
Although initial DAPTS requirements excluded such
"retrospective" processing, a capability to perform
retrospective  processing  was  added  to  the 
DAPTS/GSFC  configuration  at  JSC  request.  In  order 
to  minimize  the  impact  of  this  processing  mode,  the 
system  was  modified  to  accept  bulk  image  annota- 
tion tapes  generated  during  Landsat  processing  and 
then  to  perform  only  the  additional  calculations  re- 
quired to  produce  the  LACIE  image  annotation 
tapes.  Although  this  approach  helped,  retrospective 
processing  remains  an  expensive  capability,  particu- 
larly in  terms  of  throughput  and  time.  Accordingly, 
GSFC  has  recommended  minimal  use  of  this 
capability. 

Several  of  the  constraints  (ref.  3)  which  were  im- 
posed on  LACIE  site  location  by  the  initial 
DAPTS/GSFC  configuration  were  eliminated  when 
the  DSL  and  the  digital  subsystem  were  modified  to 
allow  more  than  one  pass  through  the  MSS  data 
stream.  As  a result  of  these  changes,  a search  area 
that  overlaps  another  or  that  violates  other  location 
constraints  can  now  be  deferred  and  extracted  in  a 
second  pass  through  the  data  stream.  This  modifica- 
tion has  permitted  JSC  to  obtain  critical  data  that 
were  previously  unavailable.  Although  a second  pass 
involves  rewind  and  reprocessing  delays,  it  does  not 
significantly  reduce  DAPTS/GSFC  throughput. 
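
The deferral of overlapping search areas to a later pass can be sketched as follows. This is a minimal illustration, assuming each search area is described by a hypothetical name and a start and end scan line within the MSS data stream; the actual DSL logic and the other location constraints are not given in the text. The sketch allows more than two passes if needed, which is a generalization of the single second pass described above.

```python
def assign_passes(areas):
    """Assign each search area to the earliest pass in which it does not
    overlap an area already scheduled on that pass.

    areas: list of (name, start_line, end_line) tuples (hypothetical layout).
    Returns a dict mapping name -> pass number (1-based).
    """
    last_end_per_pass = []  # end line of the last area taken on each pass
    result = {}
    for name, start, end in sorted(areas, key=lambda a: a[1]):
        for i, last_end in enumerate(last_end_per_pass):
            if start > last_end:  # no overlap with this pass so far
                last_end_per_pass[i] = end
                result[name] = i + 1
                break
        else:
            # Overlaps on every existing pass: defer to a new pass.
            last_end_per_pass.append(end)
            result[name] = len(last_end_per_pass)
    return result
```

For example, with areas A (lines 0 to 100), B (50 to 150), and C (120 to 200), A and C are extracted on the first pass and B is deferred to the second.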

An  accompanying  enhancement  recently  installed 
in  the  DSL  eliminates  the  redundant  processing  of 
data  located  within  Landsat  frame  overlap  regions. 
Such  data,  although  of  no  value  to  JSC,  were  proc- 
essed and  transmitted  to  JSC  before  this  change  was 
incorporated. The time saved in not processing these
data  is  now  available  to  process  other  data  that  are 
useful,  thereby  increasing  the  throughput  of  the 
system. 

The  initial  LACIE  concept  called  for  a comple- 
ment of  4800  sites,  with  data  for  most  sites  to  be  ac- 
quired on  a selective  basis.  When  JSC  analysis  plan- 
ning revised  the  requirement  to  involve  full-time 
coverage,  data  overloads  both  at  GSFC  and  JSC  were 
immediately projected. Accordingly, a
"pseudocoverage" system was established to permit
the  acquisition  of  all  data  but  the  processing  of  only  a 
selected  subset,  with  the  remainder  archived  for 
possible  later  use.  This  pseudocoverage  capability 
was  placed  in  operation  in  August  1975  and  has  been 
used periodically since then to facilitate LACIE data
requirements adjustments. The pseudocoverage
capability  does  not  involve  any  ground  system  ele- 
ments except  the  Control  Center,  for  acquisition 
scheduling  purposes. 
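
In outline, pseudocoverage amounts to splitting the acquired data into a subset to process and a remainder to archive. The sketch below assumes sites are identified by hypothetical integer numbers; the actual scheduling mechanism in the Control Center is not described in the text.

```python
def split_pseudocoverage(acquired_sites, sites_to_process):
    """Split acquired site data into (process, archive) lists.

    All data are acquired; only the selected subset is processed and the
    remainder is archived for possible later use.
    """
    wanted = set(sites_to_process)
    process = [s for s in acquired_sites if s in wanted]
    archive = [s for s in acquired_sites if s not in wanted]
    return process, archive
```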


After  several  months  of  LACIE  production  proc- 
essing, the  need  to  provide  additional  acquisition  and 
processing status information to JSC became apparent.
In response to this need, a status report interface
was established to report on all data which were
rejected  from  processing  at  GSFC.  These  reports 
were  produced  as  a byproduct  of  the  LACIE  master 
file  update  sequence  and  were  transmitted  to  JSC  on 
a regular  basis.  However,  the  incremental  nature  of 
these reports made them difficult to summarize, and
in  the  fall  of  1976,  a cumulative  report  interface  was 
established  to  supplement,  and  then  replace,  the  in- 
cremental reports.  This  cumulative  report  is  actually 
a historical  record  of  the  postprocessing  quality  in- 
spection activity  at  GSFC  and  has  been  provided  to 
JSC  since  1976  on  a monthly  update  basis. 

As  a part  of  the  registration  process,  the  first  data 
set  processed  for  each  site  becomes  the  reference 
data  for  use  in  subsequent  registration  processing. 
Data  of  poor  quality  occasionally  appeared  as 
reference  data,  leading  to  correspondingly  poor  cor- 
relation results.  In  May  1976,  a new  capability  was 
added  to  the  image  processing  system  which  permits 
replacement  of  such  reference  data  without  losing 
registration.  When  directed  by  JSC  requirement  in- 
puts, the  reference  data  for  specified  sites  are  now 
replaced  by  other  data  for  which  registration  has 
been  successfully  accomplished,  thereby  maintaining 
registration  continuity.  This  capability  has  since  been 
used  to  improve  correlation  results  for  a number  of 
sites  suffering  from  the  effects  of  poor-quality 
reference  data.  An  additional  general-purpose  image 
processor  enhancement  requested  by  JSC  at  the 
same  time  involved  manipulation  of  an  annotation 
flag to indicate the first data set of each "biological
window" (i.e., acquisition time), for use in generating
appropriate film data products at JSC.

In the summer of 1976, analysts at JSC established
a requirement  for  regular  Landsat  imagery  to  com- 
plement the  sample  segment  data  provided  via 
DAPTS/GSFC.  Arrangements  were  made  with  the 
USDA facility in Salt Lake City to provide the required
imagery as a part of their Landsat data dissemination
function. In order to support this arrangement,
the Landsat processing system at GSFC
was  modified  to  produce  a work  order  that  identified 
each  Landsat  scene  from  which  one  or  more  LACIE 
sample  segments  had  been  extracted.  Production  of 
this  work  order  continued  until  late  1977,  by  which 
time  USDA  capabilities  had  been  upgraded  to  elimi- 
nate the  need  for  this  information. 

Landsat-3 was successfully launched on March 5,
1978, making five-channel MSS data available to
Landsat users for the first time. However, extensive
LACIE system modifications would have been
necessary both at GSFC and at JSC in order to handle
the fifth-band data. As a result, the LACIE
system at GSFC was modified to exclude the fifth
channel and to extract and process Landsat-3 data
only from channels 4 through 7. Further modifications
were made to permit two-satellite acquisition
and processing activities, thereby allowing more frequent
coverage over selected LACIE sites. These
modifications were completed in the summer of
1978.


FUTURE  ENHANCEMENTS 

Throughout  1978,  the  Landsat  ground  system  at 
GSFC  has  been  undergoing  conversion  from  the 
familiar  hybrid  processing  system  to  a new  all-digital 
configuration.  A major  element  in  this  conversion 
involves  the  new  master  data  processor,  which  will 
significantly  increase  the  GSFC  capability  to 
geometrically  correct  and  register  image  data.  When 
fully  operational,  this  system  is  expected  to  displace 
the  current  DAPTS/GSFC  configuration  as  the 
source  of  Landsat  data  for  LACIE,  with  standard 
full-frame  data  replacing  the  subframe  sample  seg- 
ments currently  transmitted  to  JSC.  Toward  that  ob- 
jective, efforts  have  been  underway  since  mid-1977 
to  establish  the  ground-control-point  data  base 
needed  to  support  master  data  processor  registration 
in  regions  of  LACIE  interest.  Activities  are  also  un- 
derway to  establish  the  high-density  tape  interface 
through  which  master  data  processor  output  will  be 
transmitted  to  LACIE  users  and  to  accept  and 
preprocess  Landsat  data  in  this  form  at  various  user 
facilities.  Several  longer  range  planning  exercises 
have  also  been  undertaken  jointly  by  GSFC,  JSC, 
and  other  LACIE  participants  to  anticipate  and  pre- 
pare for  LACIE  or  follow-on  activities  in  the  early 
1980's. 


PRODUCTION  SUMMARY 

In the final analysis, a summary of LACIE processing
accomplishments at GSFC best indicates the
success of the DAPTS/GSFC endeavor. Between
January 1975 and May 1978, some 130 000 data sets
have been acquired to support LACIE, with more
than 45 000 sample segments successfully extracted
and transmitted to JSC. The system responsible for
this performance was effectively implemented in a 1-year
period during 1974, at a cost of roughly
$600 000. This effort included the research and
development  of  a heretofore  untried  registration 
technique  which  has  since  performed  beyond  all  ex- 
pectations. Registration  performance  throughout  the 
3-year  period  of  LACIE  operations  has  satisfied  the 
1-pixel  root-mean-square  requirement  established  in 
1974,  with  more  than  two  of  every  three  attempts  at 
data  registration  proving  successful,  notwithstanding 
the  data  cosmetic  faults  or  content  inadequacies  to 
which  the  process  is  inherently  susceptible.  The 
cloud/snow  rejection  rate  experienced  throughout 
the  last  3 years  has  approached  50  percent,  as  ex- 
pected in  most  Landsat  data  use  situations.  A 
detailed  summary  of  production  processing  perform- 
ance in  each  year  of  LACIE  operation  is  provided  in 
table  II. 
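
As a rough cross-check of the figures quoted above, a few ratios can be computed from the counts in table II. The sketch below is illustrative only: the variable names are hypothetical, and treating "sample segments transmitted" divided by "search areas extracted" as the registration success ratio is an assumption about how the table rows relate, not something the text states.

```python
# Counts as read from table II for LACIE Phases I, II, and III.
acquisitions = {"I": 7380, "II": 31296, "III": 63568}
search_areas_extracted = {"I": 3979, "II": 15676, "III": 30438}
segments_transmitted_phase_ii = 10645

# Fraction of acquisitions from which a search area was extracted.
extracted_fraction = {
    phase: search_areas_extracted[phase] / acquisitions[phase]
    for phase in acquisitions
}

# Assumed registration success ratio for Phase II.
registration_success_phase_ii = (
    segments_transmitted_phase_ii / search_areas_extracted["II"]
)
```

Under these assumptions, roughly half of the acquisitions in each phase yielded an extracted search area, and the Phase II ratio is a little above two in three.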


Table II. DAPTS/GSFC Performance Summary

Element                        Jan. to Sept. 1975   Oct. 1975 to Sept. 1976   Oct. 1976 to Sept. 1977
                               LACIE Phase I        LACIE Phase II            LACIE Phase III

Number of acquisitions         7380                 31 296                    63 568

Number of search
areas extracted                3979                 15 676                    30 438

Number of sample
segments transmitted
to JSC                         2h02                 10 645                    22 7)8

Rate of cloud/snow
rejections, percent            14                   52                        49

Rate of correlation
rejections, percent            14                   9                         4

Rate of miscellaneous
rejections, percent            12                   5                         11


Ancillary Data Acquisition for LACIE

INTRODUCTION

The  design,  implementation,  and  operational 
functions  of  the  three  phases  of  LACIE  required 
several  types  of  data  in  addition  to  Landsat 
multispectral  digital  data.  This  paper  will  summarize 
the  types  of  data  required,  cover  the  various  collec- 
tion processes,  and  describe  the  procedures  for  ob- 
taining the  data  for  the  user. 

The  data  required  by  the  users  in  the  project  fell 
into  four  main  categories:  ancillary  data  packets,  full- 
frame  Landsat  imagery,  intensive  test  site  (ITS) 
ground observations, and blind site data.

To  aid  in  the  computerized  classification  process, 
a packet  including  the  following  ancillary  data  was 
needed: 

1. Statistical data for all crops in each of the countries
for a period of at least 15 years at the lowest political
subdivision

2.  Agronomic  data  describing  farming  and  crop 
rotation  practices  in  the  wheat  growing  areas  for  each 
of  the  LACIE  countries 

3.  Soil,  topographic,  political  subdivision,  and 
crop  density  maps  for  areas  of  interest  in  each  coun- 
try 

4.  Yearly  phenological  crop  development  data  for 
each  crop  in  the  area  of  interest  for  each  of  the  eight 
countries  for  at  least  10  years 

5.  Current-year  phenological  crop  development 
reports  plus  periodicals  and  annual  statistical  reports 
for  all  crops 

To  develop  the  sampling  strategy  and  to  support 
crop  assessment,  full-frame  color-infrared  (CIR) 
Landsat  imagery  needed  to  be  taken  throughout  the 
crop  season. 

aUSDA Agricultural Stabilization and Conservation Service,
Houston, Texas.

bNASA Johnson Space Center, Houston, Texas.

To improve LACIE procedures, certain intensive
test sites in the wheat growing regions of the United
States and Canada were selected for which the
following data were collected:

1.  Land  use  inventories 

2.  Periodic  crop  observations 

3.  Solar  radiometer  measurements 

4.  Rainfall 

5.  Wheat  yield  for  selected  fields 

To  assess  the  accuracy  of  the  LACIE  results,  some 
operational  segments  in  the  U.S.  Great  Plains  were 
designated  blind  sites  for  which  ground  truth  was 
collected;  this  ground  truth  consisted  of  land  use  in- 
ventories and  wheat  development  estimates. 


ANCILLARY  DATA  PACKETS 

During  the  design  stage  of  the  project,  it  was 
decided  that  ground-observed  data  would  not  be  used 
by  the  analyst  in  the  computer  training  classification 
process.  Instead,  other  supporting  data  would  be 
used  in  the  identification  of  crops  used  to  train  the 
classifier.  These  ancillary  data  had  to  be  provided  for 
each  segment  of  each  country  being  worked. 

The  requirements  for  ancillary  data  were  estab- 
lished using  U.S.  data  sources  as  a guide.  The  ancil- 
lary data  package  consisted  of  2 years  of  recent 
statistics  on  all  crops  grown  in  the  county  where  the 
segment  was  located,  a summary  of  farming  prac- 
tices and  crop  rotations  for  the  general  area,  a 
description  of  the  general  soil  type  and  productivity, 
a nominal  phenological  crop  calendar  for  all  crops 
grown  in  the  segment,  and  various  large-  to  medium- 
scale  topographic  maps. 

The  ancillary  data  packets  for  the  United  States 
were  developed  in-house  from  data  that  had  been  ob- 
tained through  various  contacts  with  local,  state,  and 
federal  agricultural  agencies.  All  of  the  data  used  in 
the  packet  preparation  had  to  be  extracted  from  the 
reference  sources  and  reformatted  or  summarized  to 
meet  format  requirements.  This  task  was  time  con- 
suming, and  the  work  required  close  coordination 
between the preparers and the data analyst.

During Phase I of LACIE, U.S. segments and a
few scattered segments throughout the other seven
countries were processed. The ancillary data for the
foreign segments were more difficult to assemble. Statistical data
sets  could  not  be  found  to  meet  all  project  needs — 
they  were  either  nonexistent  as  in  China,  incomplete 
as  in  Russia,  or  very  difficult  to  obtain  as  in  India. 
Very  little  had  been  published  about  farming  prac- 
tices in  any  of  the  foreign  countries,  and  no 
phenological  data  were  available  to  develop  segment 
crop  calendars.  Small-scale  maps  were  available  but 
medium-  or  large-scale  maps  were  impossible  to  get 
for  most  of  the  areas  of  interest.  Many  hours  were 
spent  during  Phase  I on  developing  ancillary  data  for 
the  few  foreign  segments  that  were  worked. 

As  LACIE  progressed  from  phase  to  phase,  com- 
promises were  made  on  many  of  the  data  require- 
ments or  substitutes  were  developed  to  replace  im- 
portant items. 

The  collection  of  long-term  detailed  historical  data 
of  the  type  required  for  a sampling  approach  was  a 
paradox.  If  the  data  had  been  readily  available,  the 
need  for  the  new  technology  would  not  have  been  ap- 
parent; without  the  data,  an  optimum  sampling 
strategy  to  produce  an  accurate  production  report 
could  not  be  designed.  Therefore,  any  and  all  types  of 
historical  crop  data  at  any  political  level  for  the  coun- 
tries involved  were  obtained,  hoping  that  from  this 
mixed  assemblage  of  statistics  of  varying  degrees  of 
detail  and  varying  degrees  of  accuracy  a decent  sam- 
pling strategy  could  be  devised. 


FULL-FRAME LANDSAT IMAGERY

Full-frame  Landsat  data  had  been  collected  for 
almost  2 years  prior  to  LACIE  Phase  I.  Landsat  data 
had  been  collected  at  least  once  over  most  of  the  area 
defined  in  the  LACIE  countries.  The  U.S.  Air  Force 
(USAF) operational navigation chart (ONC) maps
(1:1  000  000  scale)  that  were  available  and  useful  did 
not  show  agricultural  areas,  field  sizes,  or  patterns,  so 
a requirement  was  generated  to  use  black  and  white 
9-  by  9-inch  single-band  full-frame  Landsat  imagery 
prints  to  help  delineate  the  areas  of  interest.  Data 
searches  were  made  and  the  imagery  was  produced 
by the Aerial Photography Field Office (APFO) of the
U.S.  Department  of  Agriculture  (USDA)  at  Salt  Lake 
City,  Utah.  The  Cartographic  Section  of  the  Earth 
Observations  Division  (EOD)  of  the  NASA  Johnson 
Space Center (JSC) prepared sectional mosaics using
the ONC's as a base. This product was used in defining
the agricultural areas, delineating the sample
frame,  and  developing  data  to  be  used  in  the  sam- 
pling strategy. 

When  it  was  discovered  that  the  sampling  strategy 
had  to  be  refined  and  the  ancillary  data  could  not  be 
provided to meet specifications, full-frame CIR
transparencies  were  considered  and  used  by  the  proj- 
ect. A set  of  CIR  imagery  was  produced  to  cover 
each  of  the  wheat-growing  areas.  This  imagery  was 
used  to  redefine  the  agricultural  and  nonagricultural 
areas.  This  determination  was  then  factored  into  an 
improved  stratification  and  sampling  frame. 

Since  the  ancillary  data  were  not  complete  or 
satisfactory,  the  CIR  imagery  was  also  incorporated 
into  the  analysis  procedures  and  crop  assessment  ac- 
tivities of  the  experiment.  A requirement  was 
defined  to  obtain  coverage  for  at  least  one  full-frame 
image  with  less  than  20-percent  cloud  cover  for  each 
LACIE  biowindow.  The  APFO  did  not  have  access 
to  a statusing  system  for  full-frame  coverage  that 
would  meet  the  timing  requirements— data  within  14 
to  18  days  of  acquisition.  The  NASA  Goddard  Space 
Flight  Center  (GSFC)  implemented  a work  order 
system whereby they provided APFO with the identity
of each frame containing LACIE segments. This
work  order  was  shipped  to  APFO  with  the  archival 
rolls  of  70-millimeter  film  used  by  APFO  to  generate 
the  LACIE  product.  This  system  provided  a method 
to  identify  in  a timely  manner  the  base  product,  but 
the  turnaround  was  still  21  to  30  days  from  date  of 
acquisition  to  receipt  of  CIR  imagery.  The  system 
still  required  many  manhours  of  manual  labor  at  the 
APFO  to  select  the  frames  to  be  produced.  In  the 
middle  of  Phase  III,  APFO  implemented  a system  to 
generate  LACIE  full-frame  work  orders  directly 
from  an  in-house  data  base  updated  with  a GSFC  up- 
date tape  received  with  the  roll  of  film.  By  the  end  of 
Phase  III,  LACIE  was  receiving  imagery  within  18  to 
25  days  from  the  date  of  acquisition. 
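
The coverage requirement described above (at least one full-frame image under a cloud-cover limit for each biowindow) can be sketched as a simple selection. The record layout, the function name, and the rule of preferring the least cloudy qualifying frame are assumptions made for illustration; the text does not say how APFO chose among qualifying frames. The limit is a parameter because the criterion was relaxed from 20 to 50 percent for special crop assessment orders.

```python
def pick_frames(frames, max_cloud_pct=20):
    """Pick the least cloudy qualifying frame for each biowindow.

    frames: iterable of (frame_id, biowindow, cloud_pct) tuples
            (hypothetical record layout).
    Returns a dict mapping biowindow -> frame_id; biowindows with no frame
    under the cloud-cover limit are absent from the result.
    """
    best = {}
    for frame_id, biowindow, cloud_pct in frames:
        if cloud_pct >= max_cloud_pct:
            continue  # fails the "less than N percent cloud cover" test
        if biowindow not in best or cloud_pct < best[biowindow][1]:
            best[biowindow] = (frame_id, cloud_pct)
    return {bw: frame_id for bw, (frame_id, _) in best.items()}
```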

In  addition  to  the  CIR  data  received  to  support  the 
acreage  analysis,  additional  data  were  also  ordered 
from  APFO  and  GSFC  to  support  crop  assessment 
situations,  such  as  the  drought  in  the  U.S.  Great 
Plains  that  occurred  in  1976.  These  data  orders  were 
included  in  the  work  order  system  by  changing  the 
criterion  from  less  than  20-percent  cloud  cover  to 
less than 50-percent cloud cover and getting the data
for each overpass of both Landsat-1 and Landsat-2.
During  the  three  phases  of  the  experiment,  over  9000 
frames  of  imagery  were  generated  by  APFO  and 
GSFC.


INTENSIVE TEST SITES

Intensive  test  sites  (ITS's)  were  selected  and  data 
collection  requirements  were  established  to  provide 
ground-observed  data  for  procedural  development 
and  accuracy  evaluation  for  LACIE.  Landsat  experi- 
ments conducted  by  EOD  prior  to  LACIE  had 
shown  the  importance  of  acquiring  ground-observed 
data  from  areas  where  Landsat  data  were  being  taken 
for  processing.  The  experience  gained  in  these  proj- 
ects was  incorporated  into  the  development  of  the 
LACIE  requirements  and  collection  procedures.  The 
size,  number,  and  location  of  the  sites  were  estab- 
lished by  the  LACIE  Intensive  Study  Area  Task 
Group. In addition to the U.S. sites, 10 sites in
Canada  were  made  a part  of  the  LACIE  ITS  program 
by  using  an  existing  agreement  between  the  Cana- 
dian Centre  for  Remote  Sensing  (CCRS)  and  the 
USDA. 

Procedures  were  developed  to  handle  the  ITS  data 
in  such  a manner  that  they  could  be  readily 
reproduced and made available to users within 1
week after receipt. The data collected from these
sites were shared by USDA, NASA, and the CCRS.
USDA  entered  the  data  in  a master  data  file  and  pro- 
vided the  project  a tape  file  and  printout  of  the  com- 
plete data  base  at  the  end  of  each  crop  year. 

There were 42 ITS's in Phase I, 37 in Phase II, and
34 in Phase III. Changes were made between phases
because  of  changes  in  workload.  Data  from  30  of  the 
sites  were  received  in  each  of  the  three  phases. 

Methods  of  data  handling  for  the  ITS's  gradually 
changed  throughout  the  development  of  LACIE  pri- 
marily because  of  changing  participants  and  scope  of 
involvement  by  each  group.  During  the  1974-75  crop 
year,  the  USDA  LACIE  Project  Office  was  the 
organization  responsible  for  collecting  all  field  obser- 
vation data;  however,  most  of  the  management  of 
this  function  was  transferred  by  contract  to  the  Earth 
Satellite  Corporation.  This  function  included  prepar- 
ing data  forms,  training  USDA  and  Canadian  field 
personnel,  checking  data  for  inconsistency  and  er- 
rors, and  preparing  a compilation  of  the  field  data  at 
the  end  of  the  crop  year.  JSC  was  responsible  for  ob- 
taining high-altitude  aerial  photography  and  subse- 
quently preparing  all  rectified  prints  and  field  bound- 
ary maps  for  each  ITS.  Copies  of  all  field  observation 
data  were  sent  to  JSC  to  be  logged  and  put  into  a data 
library  for  use  by  LACIE  personnel. 

During  the  next  two  crop  years  (1975-76  and 
1976-77), the USDA elected to manage directly the
field observations program, including preparation of
data forms, production of instruction manuals, and
compilation  of  computerized  field  data.  The  NASA 
functions  continued  to  be  a JSC  responsibility,  with 
the  addition  of  building  and  calibrating  solar 
radiometer  instruments  for  use  at  each  ITS.  The 
types of data reported for each ITS were land use inventories,
periodic crop observations, solar
radiometer measurements, rainfall, and wheat yield for selected
fields.  The  site  inventories  and  periodic  crop  obser- 
vations were  the  most  important  data  obtained,  and 
these  will  be  described  in  more  detail  in  the  following 
paragraphs. 

All  ITS's  received  a complete  “wall-to-wall”  in- 
ventory once  every  crop  year.  This  task  was  carried 
out  by  USDA  or  Canadian  personnel,  usually  in  May 
or June, depending on site location. An additional
fall  inventory  was  taken  at  the  sites  containing 
winter  wheat.  The  fall  inventory  of  the  winter  wheat 
sites  identified  the  content  and  field  boundaries  for 
all fall-planted crops such as winter wheat, barley, or
rye  along  with  the  following  information  for  each 
planted  field:  acreage,  crop  and  variety  being  grown, 
irrigation  method  (if  applicable),  fertilizer  used,  and 
planting  date.  The  annotated  photographs  or  field 
maps  and  tabular  data  were  forwarded  to  JSC  for 
duplication  and  distribution.  The  spring  inventories 
were  scheduled  to  begin  after  spring  planting  was 
complete and before winter wheat harvest began.
The  inventory  included  the  same  type  information 
as  in  the  fall  inventory;  however,  it  was  for  all  crop 
types  and  current  land  use  status  within  the  site. 
During the growing season, high-altitude aerial
photography was acquired over each site, and, after it
was  processed  and  screened,  the  imagery  was  rec- 
tified and  scaled.  These  data  were  then  combined 
with  the  annotated  photographs  or  field  maps  and 
tabular  data  forwarded  to  JSC  by  the  field  personnel, 
and  an  updated  photographic  overlay  containing  cur- 
rent field  boundaries  and  identification  was  pro- 
duced. These  new  photographs,  overlays,  and  field 
maps  constituted  the  data  base  for  the  next  site  in- 
ventory as  well  as  a reference  for  current-year  data 
processing. 

The periodic (18-day cycle) observations within
an ITS were scheduled on the days of the Landsat
overpass  for  that  site.  The  observations  of  approx- 
imately 50  fields  within  the  site  provided  a record  of 
the  crop  changes  for  these  specific  fields  throughout 
the  growing  season.  They  began  with  the  planting  of 
fall  crops  or  with  spring  crop  planting  where  there 
were no fall crops, and they continued through September
or spring wheat harvest. The field personnel
at each site selected approximately 50 fields of which
about half contained wheat and the remainder a
representative sample of the other major crops
grown in that site. The periodic record of crop
development throughout the growing season included
information such as plant growth stage, percent
ground cover, plant height, surface moisture
conditions,  weed  growth,  field  operations  (farming 
activities  in  progress),  disease  or  insect  problems, 
and  estimated  crop  quality  rating. 
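
Because the periodic observations were tied to Landsat overpass days, an observation calendar for a site follows directly from the first overpass date and the satellite repeat cycle. The sketch below assumes a single satellite with an 18-day repeat cycle; the function name and the example dates are hypothetical.

```python
from datetime import date, timedelta


def overpass_dates(first, last, cycle_days=18):
    """List observation dates from `first` through `last` (inclusive),
    one per satellite repeat cycle."""
    dates = []
    current = first
    while current <= last:
        dates.append(current)
        current += timedelta(days=cycle_days)
    return dates
```

For instance, a first overpass on May 1, 1976, with observations through June 30, 1976, would give four observation dates: May 1, May 19, June 6, and June 24.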

During the first two project years, 35-millimeter
color  photographs  and  solar  radiometer  measure- 
ments were  taken.  The  photography  was  taken  at 
each  selected  field  on  the  observation  dates.  The 
solar  radiometer  measurements  were  made  at  the 
scheduled  Landsat  overpass  time  using  equipment 
provided  by  JSC.  This  information  would  permit 
evaluation  of  atmospheric  interference  effects  on  the 
Landsat  data  being  acquired.  Rainfall  was  recorded 
from  a network  of  gauges  spaced  throughout  the  test 
site  and  reported  with  the  next  periodic  observation. 
All  periodic  observation  data  were  forwarded  to  JSC, 
where  they  were  recorded  and  duplicated  for  dis- 
tribution to  the  appropriate  users  and  to  the  LACIE 
data  library. 

Wheat  yield  data  were  reported  at  the  end  of  the 
crop year for a minimum of 10 of the observed wheat
fields. The fields selected were representative of the
yield values within the particular site. The reported
data  were  yield  estimates  by  the  farmers  or  the 
USDA  rather  than  actual  production  figures.  The 
lack  of  specific  values  was  due  to  the  fact  that  the 
harvested  production  was  not  isolated  and  available 
on  a per-field  basis.  The  estimated  yields  were 
therefore  subjective  and  of  varying  accuracy. 


BLIND SITE DATA

The  principal  objective  of  the  LACIE  was  to  as- 
semble, operate,  and  evaluate  the  remote-sensing 
technology  for  providing  country-level  wheat  pro- 
duction estimates.  A necessary  part  of  this  experi- 
ment was  to  evaluate  results  and  assess  errors  in 
order  to  improve  subsequent  operational  systems.  A 
phased  approach  was  chosen  which  involved  expan- 
sion in  two  directions:  in  the  technical  complexity  of 
the  functions  performed  and  in  the  geographic  size 
and  difficulty  of  the  area  being  surveyed.  The  techni- 
cal evaluation  tasks  were  performed  primarily  by  an 
Accuracy Assessment Team.

The LACIE Phase I Accuracy Assessment activity
in the United States concentrated on the
analysis of the ITS data. In order to better assess the
LACIE operations, some regular LACIE segments,
so-called blind sites, were "ground truthed." The expression
"blind site" was merely a designation applied
to selected aggregatable segments for which,
unknown to the analyst, ground-observed data were
acquired for subsequent evaluation purposes. The
implementation of this approach occurred late in the
growing season of LACIE Phase I. Thus, all of the
selected sites fell in the northern spring wheat
regions.

High-resolution CIR aerial photography was acquired
over 29 LACIE segments in North Dakota
and Montana in mid-August 1975. Simultaneously,
field teams were collecting ground information for a
substantial portion of these segments. These data
were combined to obtain both field and total segment
ground-observed data.

The  Phase  I procedure  for  obtaining  these  blind 
site ground data was relatively costly and cumbersome
because it involved using personnel from JSC,
which necessitated travel and field operation
expenses. Because of the usefulness of the data,
however,  a much  larger  number  of  blind  sites,  with 
wider  distribution,  was  proposed  for  Phase  II.  This 
required  that  a new  and  more  cost-effective  pro- 
cedure be  developed  for  obtaining  a field  data  set  of 
this  larger  magnitude. 

The  new  procedure  used  USDA  personnel  in  the 
counties  to  obtain  the  actual  field  data.  This  provided 
field  observers  who  were  more  experienced  and 
readily  available.  The  participating  agency  was  the 
Agricultural  Stabilization  and  Conservation  Service 
(ASCS)  of  the  USDA.  In  addition  to  providing  a 
larger  data  set,  this  method  allowed  some  of  the 
winter  wheat  sites  to  be  surveyed  twice  during  the 
growing  season  to  evaluate  the  LACIE  technology 
for  early-season  as  well  as  at-harvest  wheat  assess- 
ment. 

Color-infrared  aerial  photography  was  taken  for 
the  new  procedure  by  NASA  aircraft  at  an  altitude  of 
20  000  to  25  000  feet  along  two  flight  lines  over  each 
proposed  site.  The  developed  film  was  then  screened 
and  specific  frames  (usually  about  four  per  site)  were 
selected  to  use  for  print  enlargements  of  the  site. 
After  the  prints  were  made,  a frosted  plastic  overlay 
was  attached  to  each  print  for  use  in  recording  field 
identification data. The overlay material was
sufficiently transparent to show the field patterns on the
aerial photographic prints and had a suitable texture
to permit writing on the overlay.

The site boundaries were drawn on the overlays
by  JSC  personnel  and  these  prints,  along  with  ap- 
propriate instructions  and  examples,  were  sent  to  the 
ASCS  county  offices  in  the  areas  where  blind  sites 
had  been  selected.  The  type  of  data  recorded  was  a 
crop  or  land  use  code  for  each  field  or  area  within  the 
site.  A simple  but  uniform  set  of  crop  codes  was  used 
for  all  sites  to  simplify  data  interpretation  when  the 
prints  and  overlays  were  returned  to  JSC  for 
analysis.  In  addition  to  identifying  the  crops  within 
the  site,  each  ASCS  participant  was  asked  to  com- 
plete a two-page  questionnaire  containing  a few  com- 
ments about  weather,  insect,  or  disease  conditions 
affecting  the  wheat  crop  within  the  site.  Other  items 
included  were  an  estimate  of  the  stage  of  wheat 
development  at  that  date  compared  to  “average” 
years  and  a section  to  identify  any  special  crop  or 
land  use  codes  used  on  the  overlay. 

In  Phase  II  (1975-76  crop  year),  there  were  40 
early-season  blind  site  inventories  throughout  the 
Southern  Great  Plains  states  and  168  blind  site  inven- 
tories prior  to  or  at  wheat  harvest.  Thirty-seven  of 
these  late-season  inventories  were  actually  revisits  to 
sites  previously  surveyed.  This  provided  an  indica- 
tion of  the  final  disposition  of  winter  wheat  fields 
identified  in  the  early-season  inventories.  Some  of 
these  fields  were  plowed  under,  grazed  by  livestock, 
replanted  to  other  crops,  or  allowed  to  mature  to  har- 
vest depending  on  many  factors  such  as  stand  quality 
of  the  field,  farming  practices,  weather  growing  con- 
ditions, and  economic  conditions  affecting  wheat 
production. 

For  Phase  III  (1976-77  crop  year),  there  were  67 
early-season  site  inventories  and  202  site  inventories 
near  harvest.  Fifty  of  the  late-season  inventories 
were  repeats  of  the  earlier  set  of  sites  surveyed.  Two 
changes  or  improvements  were  incorporated  into  the 
procedures  for  the  Phase  III  blind  sites.  The  first  was 
to  use  the  NASA  RB57  aircraft  to  photograph  the 
sites  from  an  altitude  of  50  000  to  60  000  feet.  This 
permitted  complete  photographic  coverage  of  a site 
on  a single  frame  of  film.  This  procedure  decreased 
the  number  of  print  enlargements  required  and 
simplified the field survey by having only one print
to annotate. As in Phase II, frosted plastic overlays
were attached to the prints to be used by the field
observers to note crop codes. The second change
incorporated in Phase III was the selection of 15 wheat
fields within each site and periodic reporting of their
development status. These field reports were
scheduled to correspond with the periodic (every 18
days) Landsat passes for the given site. Parameters
such as plant height, percent ground cover, and drill
(or row) spacing were helpful in correlating and
interpreting the Landsat imagery acquired throughout
the growing season.

After  receipt  at  JSC  of  the  annotated  print  over- 
lays and  the  questionnaires,  these  items  were 
transmitted  to  the  Accuracy  Assessment  group  for 
preparation  for  analysis.  This  included  verification 
of  crop  codes  and  outlining  of  the  fields  containing 
wheat  or  other  crops  of  interest.  The  next  major  task 
was  to  planimeter  the  photographic  overlays  to 
determine  the  relative  areas  of  different  crop  types 
within  the  site.  The  change  in  Phase  III  to  the  use  of 
high-altitude  photography  and  the  resulting  single 
print  per  site  greatly  reduced  the  manhours  required 
for preparation and also resulted in a reduction of
computational  errors  in  the  utilization  of  blind  site 
data. 
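
The planimeter step above was a manual area measurement. As a rough digital analogue only, the sketch below computes relative crop areas from field outlines with the shoelace formula. The crop codes ("W", "F"), the coordinates, and the function names are hypothetical and are not taken from the LACIE procedures.

```python
from collections import defaultdict


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula.

    The result is in the squared units of the input coordinates.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def crop_proportions(fields):
    """Map each crop code to its share of the total outlined area.

    fields: list of (crop_code, vertices) pairs, one per outlined field.
    """
    area_by_crop = defaultdict(float)
    for code, vertices in fields:
        area_by_crop[code] += polygon_area(vertices)
    total = sum(area_by_crop.values())
    return {code: area / total for code, area in area_by_crop.items()}


# Hypothetical site with two wheat fields ("W") and one fallow field ("F").
fields = [
    ("W", [(0, 0), (4, 0), (4, 3), (0, 3)]),
    ("W", [(4, 0), (6, 0), (6, 3), (4, 3)]),
    ("F", [(0, 3), (6, 3), (6, 5), (0, 5)]),
]
proportions = crop_proportions(fields)
```

The proportions are relative to the outlined fields only; any part of a site left unoutlined would have to be added as its own entry to obtain a whole-site figure.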


SUMMARY 

The  functions  performed  by  the  data  acquisition 
subsystem  during  the  three  phases  of  LACIE  sup- 
ported the  data  needs  of  all  other  elements  of  the 
project.  The  nonelectronic  data  base  consisted  of 
statistical  data,  printed  reports,  periodicals,  ground- 
observed  data  received  from  intensive  test  sites  and 
operational  segments,  and  full-frame  multispectral 
scanner  CIR  photographs.  Requirements  were 
received  as  part  of  an  overall  project  requirements 
document;  they  were  written  into  an  implementation 
plan;  and  finally  they  were  satisfied  within  the  sub- 
system or  farmed  out  to  be  implemented  by  other 
units  of  the  project. 

Collecting,  statusing,  and  providing  the  data  re- 
quired by  LACIE  was  an  enormous  task.  Require- 
ments were  constantly  changing  because  of  the 
dynamic  nature  of  the  project.  The  volume  of  data 
handled  far  exceeded  initial  estimates.  The  data  pro- 
vided were  used  to  develop  procedures  to  operate  a 
system  and  subsequently  test  the  results  to  deter- 
mine whether  it  was  possible  to  use  remotely  sensed 
data  to  inventory  a crop  (wheat). 

ABSTRACT

The data-handling techniques that were implemented
to facilitate processing of Landsat
multispectral data between 1975 and 1978 are
described in this paper. The data that were handled
during the LACIE and the storage mechanisms used for
the various types of data are defined. The overall
data  flow,  from  placing  the  Landsat  data  orders 
through  the  actual  analysis  of  the  data  set,  is  dis- 
cussed. An  overview  of  the  status  and  tracking 
system  that  was  developed  and  of  the  data  base 
maintenance  and  operational  task  is  provided. 
Finally,  the  archiving  of  the  LACIE  data  is  ex- 
plained. The  perspective  gained  by  the  adoption  of 
this  data-handling  framework  will  be  helpful  in  ad- 
dressing these  specific  areas  of  system  design  in 
future  applications. 


INTRODUCTION 

Until  recently,  far  more  Earth  resources  applica- 
tion data  had  been  collected  than  could  be  practica- 
bly managed  and  utilized  in  a cost-effective  manner. 
During 1977, 17 000 Landsat acquisitions were
arrayed in mass storage for LACIE. It is always
possible to store information randomly or in arrival
sequence and to retrieve it by an exhaustive search;
however, the disadvantages are obvious. It is also
possible to file all information in an orderly fashion
and search for it sequentially. Indeed, there was little
other choice before direct-access memory was
introduced. To
fully  exploit  the  potential  value  of  the  Landsat  data 
collected  every  18  days,  the  most  rapid,  cost-effective 
data-handling  methods  available  must  be  used. 
Therefore,  the  LACIE  data-handling  system  evolved 
from  existing  individual  data-processing  component 
systems  used  in  varying  remote-sensing  disciplines. 


aLockheed  Electronics  Company,  Inc.,  Systems  and  Services 
Division,  Houston,  Texas. 


These  various  information  constituents  were 
modified,  transformed,  and  integrated  into  the 
LACIE  data-handling  methodology. 

The  design  of  the  LACIE  data-handling  system 
was  predicated  on  the  concept  of  man-machine  in- 
teraction. The  objective  of  the  design  was  to  provide 
the  LACIE  analyst  a complete  array  of  analysis  and 
interpretation  tools  to  interact  with  and  operate  on 
the  available  data.  To  fulfill  this  task,  a subsystem 
was  created  to  be  responsible  for  the  active  collec- 
tion, organization,  storage,  statusing,  retrieval,  and 
dissemination  of  remotely  sensed  data.  Although  the 
primary  use  of  these  data  was  in  direct  support  of 
LACIE,  the  preservation  of  such  data  for  future  use 
by  various  secondary  user  groups  has  been  ensured. 
This  paper  briefly  outlines  the  nature  of  both  the 
electronic  and  physical  aids  that  were  utilized  to  pro- 
cess Landsat  data,  the  data-processing  system 
developed  to  process  these  data,  the  interfaces  in  the 
use  of  the  information,  and  the  integrated  informa- 
tion system  to  support  the  use  of  the  data. 

DATA  DEFINITION 

The  LACIE  data-handling  system  was  developed 
to  manage  three  basic  forms  of  remotely  sensed  data: 
electronic  data,  physical  data,  and  derived  data. 

The  electronic  data  entered  the  system  in  Landsat 
multispectral digital format on nine-track computer-
compatible  tapes  (CCT’s).  These  data  are  input 
directly  into  disk  storage  to  provide  interactive  dis- 
play capability  and  mass  storage  of  an  entire  crop 
year’s  Landsat  acquisition  history.  The  electronic 
data  were  primarily  handled  and  statused  automat- 
ically. 

The  physical  data  consisted  of  spacecraft  imagery, 
aircraft  photography,  field  observation  data,  crop 
calendars,  topographic  maps  at  several  scales,  and 
ancillary  summary  data.  These  data  were  manually 
handled and automatically statused. Much of the
physical data was placed into LACIE segment
packets for Classification and Mensuration Subsystem
(CAMS) analyst utilization. The field observation
data and aircraft photography were provided
to the Accuracy Assessment Subsystem for use in
evaluating analyst estimations.

The  derived  data  appeared  as  several  types  of 
computer  printouts,  microfiche,  spectral  aids, 
classification  and  cluster  map  film  products,  and 
CCT’s.  The  final  derived  data  were  the  wheat  propor- 
tion estimates  that  were  forwarded  to  the  Crop 
Assessment  Subsystem  (CAS)  for  the  generation  of 
crop  production  reports.  The  Accuracy  Assessment 
Subsystem  was  provided  the  classification  and 
cluster  tapes,  batch  run  decks,  and  statistical  results 
output  tapes. 


DATA STORAGE

The  principal  function  of  the  LACIE  data- 
handling  system  was  to  provide  a contingent  of  in- 
formation as  required  in  a timely  fashion,  and  to  ex- 
tract that  information  from  the  electronic  and  physi- 
cal data  repositories  in  an  orderly  and  consistent 
manner.  The  implementation  of  this  objective 
resulted  in  the  establishment  of  an  on-line  mass  disk 
data  storage  system  to  accommodate  the  electronic 
Landsat  imagery  and  the  creation  of  a 4000-square- 
foot  LACIE  Physical  Data  Library  (LPDL)  to  man- 
age and  store  the  comprehensive  physical  data  set. 

The  electronic  storage  capability  is  centered 
around  an  IBM  System  360  Model  75J  computer. 
The  system  is  located  in  the  Mission  Control  Center 
(MCC)  and  consists  of  equipment  which  was  origi- 
nally used  for  the  Apollo  lunar  landing  project.  The 
core memory of the 360-75 is supplemented with 42
packs/drives  of  7330  disk  storage,  providing  direct- 
access  storage  for  more  than  4200  megabytes  of  data. 
The  companion  disk  packs  are  removable  and  in- 
terchangeable between  the  7330  disks.  Each  pack 
contains  11  disks  with  20  recording  surfaces,  giving 
more  than  100  megabytes  of  data  storage. 

Initial  requirements  for  data  storage  were  slightly 
shortsighted  in  that  a clean  and  abrupt  transition  was 
expected  for  data  acquisition  and  analysis  from  one 
crop-growing  season  to  the  next.  Additionally,  no 
provisions  were  made  for  maintaining  data  acquired 
from  previous  years  on-line  for  research  and 
development  purposes.  These  data  were  available 
only from stored data tapes, thus increasing the
complexity and time involved in working with previous
years' data sets.

This  situation  was  rectified  by  expanding  the  disk 
storage  space  available  and  building  a separate  data 
base  for  each  LACIE  crop  season.  A research  data 
base  was  also  created  so  that  data  sets  not  maintained 
for  operations  would  be  available  for  analysis.  These 
added  flexibilities  greatly  enhanced  ease  of  access  to 
data,  minimizing  operational  problems  involved 
with  processing  such  large  quantities  of  data. 

A specialized  technical  library,  augmented  by  an 
automated  status  and  tracking  system,  was  estab- 
lished to  store  the  LACIE  physical  data.  The  com- 
plexity of  the  job  to  be  done,  together  with  the  huge 
volume  of  data  to  be  handled  and  processed,  required 
the  adoption  of  a total  systems  approach  and  the 
automation  of  the  LPDL.  Approximately  3000 
operational  segment  packets  were  maintained  during 
the  third  phase  of  LACIE.  Each  segment  packet  con- 
tained Landsat  segment  film,  crop  calendars, 
topographic  maps  at  several  scales,  and  ancillary 
summary  data.  All  packets  were  stored  sequentially 
in  filing  cabinets  with  controlled  access.  The  imple- 
mentation of  this  facility  is  addressed  in  detail  in 
another  LACIE  symposium  paper. 


FLOW  OF  DATA  AND  INFORMATION 

It  was  within  this  framework  that  the  LACIE 
data-handling  system  was  planned  and  developed 
into  its  current  integrated  information  and  data- 
processing  system.  Perhaps  the  best  way  to  examine 
the  composition  of  this  system  is  to  follow  incoming 
electronic  and  physical  data  through  their  processing 
cycle  and  then  describe  the  tracking  mechanism 
designed  to  identify  each  significant  event.  A visual 
depiction  of  the  LACIE  data  flow  is  contained  in 
figure  1. 


Initial  Data  Order 

Following  the  project  selection  and  definition  of 
the  LACIE  sample  segment  location,  electronic  data 
orders  are  placed  with  the  NASA  Goddard  Space 
Flight  Center  (GSFC)  via  the  data  transmission  line. 
The segment locations are defined by geographical
coordinates at the center of the sample area. The
Landsat acquisition date range (the beginning and
the ending of collection periods) is also specified in
the GSFC data order.

FIGURE 1. LACIE data flow.

Concurrently,  the  segment  number  set  is  ran- 
domly distributed  over  the  disk  storage  packs  in  an- 
ticipation of  storing  up  to  16  Landsat  acquisitions  for 
each  segment.  Experience  has  shown  that  an  average 
of  5 to  6 acquisitions  can  be  expected  for  each  Land- 
sat  data  order  placed  with  GSFC  during  one  crop- 
growth  year. 
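
The text says only that the segment numbers were randomly distributed over the disk packs, with room reserved for up to 16 acquisitions per segment. One way this might be sketched is a seeded shuffle followed by round-robin dealing, which keeps the packs evenly loaded. The round-robin step, the pack count used in the example, and all names below are assumptions for illustration, not the documented LACIE allocation method.

```python
import random

MAX_ACQUISITIONS_PER_SEGMENT = 16  # reservation ceiling stated in the text


def distribute_segments(segment_numbers, pack_count, seed=None):
    """Assign each segment to a pack index in random order.

    Shuffling and then dealing round-robin is an assumed strategy that
    spreads segments evenly; the source does not describe the mechanism.
    """
    rng = random.Random(seed)
    shuffled = list(segment_numbers)
    rng.shuffle(shuffled)
    return {segment: position % pack_count
            for position, segment in enumerate(shuffled)}


def reserved_slots_per_pack(assignment, pack_count):
    """Count the acquisition slots reserved on each pack."""
    slots = [0] * pack_count
    for pack in assignment.values():
        slots[pack] += MAX_ACQUISITIONS_PER_SEGMENT
    return slots


# Hypothetical example: 84 segment numbers dealt over 42 packs.
assignment = distribute_segments(range(1000, 1084), pack_count=42, seed=1)
slots = reserved_slots_per_pack(assignment, pack_count=42)
```

Because subsequent acquisitions for a segment stay on the same pack, a random initial placement of this kind would tend to keep any one region's activity from concentrating on a single device.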

The  supportive  products  such  as  topographic 
maps,  crop  calendars,  and  ancillary  data  are  defined 
and  ordered  as  soon  as  the  sample  segment  location 
is  specified.  These  supportive  products  and  the 
Landsat  acquisition  film  are  placed  in  the  LACIE 
analyst  packets,  which  are  stored  in  the  LPDL  until 
required  for  analysis. 

Currently,  the  NASA  Johnson  Space  Center 
(JSC)  does  not  have  the  capability  to  generate  full- 
frame  9-  by  9-inch  imagery  from  Landsat  tapes.  Since 
the  synoptic  view  of  the  area  surrounding  the 
LACIE  segment  is  important  in  making  an  accurate 
analysis,  arrangements  were  made  with  GSFC  to 
send  the  full-frame  negatives  to  Salt  Lake  City  for 
further  processing.  Therefore,  the  initial  segment  list 
was  transmitted  to  Salt  Lake  City  to  support  this 
film-processing  effort  for  development  of  three- 
channel  color  infrared  products. 

Finally, the field observation data collection is
initiated as soon as the “ground-truth” segments are
identified  within  the  total  segment  allocation.  This 
task  is  accomplished  by  Agricultural  Stabilization 
and  Conservation  Service  (ASCS)  and  Canadian  per- 
sonnel. Crop  inventory  data  are  collected  for  these 
segments  and  forwarded  to  JSC. 


Data Receipt

The Landsat data from GSFC are transmitted to
JSC via a communication/image transmission line
and recorded on nine-track magnetic tapes. All
magnetic tapes received by JSC are first entered into the
storage records and assigned a unique accession
number. One to five tapes were received at JSC each
day, with the Landsat data arranged in files, each file
containing  one  sample  segment  acquisition.  Each  of 
these  files  consisted  of  header  identification  data; 
parameters  used  in  film  product  generation;  and  the 
image  data,  which  consisted  of  117  scan  lines  with 
196  pixels  of  data  each  in  four  spectral  bands. 
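
The segment dimensions just given fix the number of image samples in each acquisition file. The arithmetic can be sketched as follows; the storage size of one byte per sample is an assumption (the text does not state it), and the header and film-generation parameters in each file are left out of the count.

```python
SCAN_LINES = 117        # from the text
PIXELS_PER_LINE = 196   # from the text
SPECTRAL_BANDS = 4      # from the text
BYTES_PER_SAMPLE = 1    # assumption: not stated in the source


def samples_per_acquisition():
    """Image samples in one sample-segment acquisition."""
    return SCAN_LINES * PIXELS_PER_LINE * SPECTRAL_BANDS


def image_bytes(acquisitions, bytes_per_sample=BYTES_PER_SAMPLE):
    """Image bytes for a number of acquisitions, excluding header data."""
    return acquisitions * samples_per_acquisition() * bytes_per_sample


one_acquisition = samples_per_acquisition()   # 117 * 196 * 4 samples
full_segment = image_bytes(16)                # a segment at its 16-acquisition ceiling
```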

All acquisitions are entered into the image data
base  on  the  IBM  360-75.  Subsequent  acquisitions  for 
the  sample  segments  are  stored  on  the  same  physical 
disk  device  but  not  necessarily  in  sequential  order. 
Established  indexes  allow  retrieval  and  composition 
of  data  consisting  of  up  to  four  acquisitions  of  the 
same  sample  segment  as  required  for  application  pro- 
cessing. A report  of  the  stored  data  is  automatically 
generated  at  the  time  of  update,  and  queries  concern- 
ing stored  data  may  be  generated  at  any  time. 
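
The index described above, which lets acquisitions stored out of order be pulled together for one segment, might be sketched as a mapping from segment number and acquisition date to a storage location. The class name, the location strings, and the ISO date format are hypothetical; only the limit of four acquisitions per retrieval comes from the text.

```python
from collections import defaultdict

MAX_ACQUISITIONS_PER_RETRIEVAL = 4  # limit stated in the text


class AcquisitionIndex:
    """Toy index: segment number -> {acquisition date: storage location}."""

    def __init__(self):
        self._by_segment = defaultdict(dict)

    def add(self, segment, acquisition_date, location):
        # Arrival order does not matter; the index records where each one is.
        self._by_segment[segment][acquisition_date] = location

    def retrieve(self, segment, dates):
        """Return (date, location) pairs in date order for one segment."""
        if len(dates) > MAX_ACQUISITIONS_PER_RETRIEVAL:
            raise ValueError("at most four acquisitions per retrieval")
        stored = self._by_segment.get(segment, {})
        missing = [d for d in dates if d not in stored]
        if missing:
            raise KeyError(f"segment {segment} has no acquisition for {missing}")
        # ISO-format date strings sort chronologically.
        return [(d, stored[d]) for d in sorted(dates)]


index = AcquisitionIndex()
index.add(1234, "1977-06-18", "pack07/slot3")   # stored out of date order
index.add(1234, "1977-05-13", "pack07/slot1")
index.add(1234, "1977-05-31", "pack07/slot9")
selected = index.retrieve(1234, ["1977-06-18", "1977-05-13"])
```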

The  Landsat  data,  on  magnetic  tape,  are  filmed  on 
the  Production  Film  Converter  (a  digital  tape-to-film 
conversion  device),  which  produces  three  different 
three-channel  color  transparencies  for  development 
by  the  Photographic  Technology  Laboratory  at  JSC. 
After  development,  the  film  roll  is  cut  into  in- 
dividual products  and  packaged.  All  film  processed 
in this manner is forwarded to the LPDL for inclusion
in the LACIE analyst packets.


Preparation  for  Analysis 

A status  and  tracking  system  (discussed  later  in 
this paper) provides a report of all electronic and
physical  data  available  to  support  an  analysis.  When 
a decision  is  made  to  analyze  a particular  region,  a 
packet  order  list  is  generated  requesting  the  LPDL  to 
transfer  analyst  packets  from  storage  to  the  analyst. 
After  the  analyst  has  received  the  packet,  fields  and 
dots  are  labeled  and  batch  run  decks  are  prepared  by 
incorporating  the  labels  into  a sequential  input  pro- 
cessing deck.  These  cards  are  forwarded  to  the  IBM 
360-75  for  interaction  with  the  stored  imagery  and 
statistical  analysis  using  a Staran  array  processor 
which  is  linked  to  the  data  base.  Details  of  the 
classification  and  clustering  processes  are  contained 
in  other  symposium  papers. 


Batch  Processing  and  Results  Distribution 

Data  classification  runs  (from  batch  or  interactive 
processing)  for  area  determination  on  the  IBM 
360-75  result  in  output  tapes  that  are  used  to  produce 
color transparencies of cluster and classification
images. The images are generated on the Production
Film Converter, as were the incoming Landsat image
products in the “Data Receipt” phase. These batch or
interactive jobs on the IBM 360-75 system also result
in statistical report tapes (CAMS/CAS Interface
Tape, CCIT) and microfiche. The CCIT is further
processed on a PDP 11-45 to provide the analyst with
Type 1 and Type 2 dot label classifications, bias
correction classification reports, Type 1 and Type 2 dot
label cluster assignment, bias correction cluster
reports, and separability reports. All these result
products are forwarded to the analyst for evaluation.
The Accuracy Assessment Subsystem receives all
batch input decks and all output tapes after
operations has finished with them.


Analysis  Completion 

It  should  be  clear  by  now  that  the  LACIE  analyst 
spends  a great  amount  of  time  studying  and  working 
with  the  electronic,  physical,  and  derived  data  to 
gather  all  the  information  necessary  to  produce  an 
area  estimate  for  the  CAS  to  utilize.  On  completion 
of  the  analysis  cycle,  area  proportion  estimates  are 
given  to  CAS  and  the  analyst  packet  is  returned  to 
the  LPDL. 


INFORMATION  STATUS  AND  TRACKING 

A description  of  the  information  flow  is  not  com- 
plete without  a mention  of  the  Automatic  Status  and 
Tracking  System  (ASATS).  The  ASATS,  as  the 
centralized  source  of  information,  is  the  hub  around 
which the LACIE data revolve. It is built on the
concept  of  a management  tool  to  trace  the  flow  of 
LACIE  materials  from  collection  and  data  storage 
through  the  various  imagery  interpretation/ 
mensuration  stages  and  finally  to  the  compilation  of 
a crop  area  and  production  estimate. 

As  the  LACIE  data  collection  and  data  bases  in- 
creased in  size,  the  simple  sequential  ordering  of 
units  was  not  adequate  to  effectively  organize  and  re- 
port on  the  information  within  the  system.  A 
custom-built  software  package  was  designed  to  solve 
the problems unique to LACIE. The resulting
system, ASATS, was maintained on a PDP 11-45
computer. This system is discussed in detail in
another paper. The ASATS tracks Landsat data,
from arrival at JSC to completion of the processing
cycle, at various designated stations in the LACIE
data flow previously discussed. Figure 1 indicates the
status points within ASATS that were used to
generate management reports, plan data-processing work,
and track work assignments. In the earlier
development of the system, the ASATS was a valuable aid in
determining whether data were delinquent or lost; it
highlighted problem points in the data flow that were
subsequently improved.
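
The kind of station-by-station tracking described above can be illustrated with a small sketch. It is not the ASATS design, which is documented elsewhere; the station names, the class, and the 14-day threshold in the example are all hypothetical.

```python
from datetime import date

# Hypothetical status points; the real ones are those marked in figure 1.
STATIONS = ["received", "loaded", "filmed", "packet_updated", "analyzed"]


class StatusTracker:
    """Records the date each acquisition reached each station."""

    def __init__(self):
        self._events = {}  # (segment, acquisition) -> {station: date}

    def record(self, segment, acquisition, station, when):
        if station not in STATIONS:
            raise ValueError(f"unknown station: {station}")
        self._events.setdefault((segment, acquisition), {})[station] = when

    def current_station(self, segment, acquisition):
        """Furthest station reached so far, or None if nothing is recorded."""
        reached = self._events.get((segment, acquisition), {})
        last = None
        for station in STATIONS:
            if station in reached:
                last = station
        return last

    def delinquent(self, today, max_days):
        """Unfinished items whose latest event is more than max_days old."""
        late = []
        for key, reached in self._events.items():
            if STATIONS[-1] in reached:
                continue  # processing cycle complete
            if (today - max(reached.values())).days > max_days:
                late.append(key)
        return sorted(late)


tracker = StatusTracker()
tracker.record(1234, "1977-05-01", "received", date(1977, 5, 3))
tracker.record(1234, "1977-05-01", "loaded", date(1977, 5, 4))
tracker.record(5678, "1977-05-02", "received", date(1977, 5, 20))
overdue = tracker.delinquent(today=date(1977, 5, 25), max_days=14)
```

A report of this shape would support the uses the text names: spotting delinquent or lost data and seeing where in the flow items tend to stall.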


DATA  BASE  MAINTENANCE  AND 
OPERATION 

The receipt of all incoming Landsat data is
monitored, all data updates are verified, and data
storage levels on the imagery data bases are observed
to maintain sufficient space for additional data. This
monitoring is required since the storage system is
configured to allow an average of 5 acquisitions per
segment (up to a maximum of 16) to be stored.
Overflow is captured on an overflow pack; however, the
system monitoring is intended to prevent this
occurrence. Data base restructuring or data purges are
sometimes necessary as corrective measures. Data
are periodically deleted from the system when
quality is questionable. The entire data bases are
checkpointed weekly to minimize recovery
procedures in the event of data base failures. Thus, data
base integrity is maintained at all times.
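
The storage-level check described above might look something like the sketch below. It assumes that a pack's capacity equals the planned average of 5 acquisitions times the number of segments assigned to it, which is one reading of the text rather than a documented rule; the warning fraction and the function name are hypothetical.

```python
PLANNED_AVERAGE = 5     # acquisitions per segment the storage is sized for (from the text)
MAX_PER_SEGMENT = 16    # per-segment ceiling (from the text)


def pack_status(counts_by_segment, warn_fraction=0.9):
    """Summarize fill level for one pack.

    counts_by_segment: segment number -> acquisitions currently stored.
    warn_fraction: hypothetical threshold at which to plan a purge or
    restructuring before data would spill to the overflow pack.
    """
    if not counts_by_segment:
        raise ValueError("pack has no segments assigned")
    capacity = PLANNED_AVERAGE * len(counts_by_segment)
    used = sum(counts_by_segment.values())
    over_limit = sorted(segment for segment, count in counts_by_segment.items()
                        if count > MAX_PER_SEGMENT)
    return {
        "capacity": capacity,
        "used": used,
        "fill": used / capacity,
        "over_limit": over_limit,
        "needs_attention": used >= warn_fraction * capacity or bool(over_limit),
    }


# Hypothetical pack holding three segments.
status = pack_status({101: 4, 102: 7, 103: 3})
```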


DATA  ARCHIVES 

At  the  completion  of  a project  crop  year,  all 
electronic  data  resident  in  the  operational  data  bases 
are  unloaded  on  computer-compatible  magnetic 
tapes.  Copies  of  these  tapes  are  sent  to  the  Federal 
Archives and Records Center in Washington, D.C.
Working  copies  are  retained  at  JSC  for  further 
research  and  evaluation.  A directory  of  all  archived 
data  is  maintained  for  each  LACIE  phase  or  crop 
year. 

The  physical  data  remains  in  the  LACIE  segment 
packet  as  a historical  reference  tool  for  the  analyst  to 
use  in  subsequent  crop  years.  If  a particular  segment 
is  not  used  in  the  next  crop  year,  the  segment  packet 
is  removed  from  the  operational  data  library  and 
placed  in  an  inactive  data  repository. 

INTRODUCTION


Background 

The  Data  Research  and  Control  (DR&C)  Section 
of  the  Earth  Observations  Division  (EOD)  had  been 
in  existence  for  several  years  prior  to  the  LACIE 
program  but  was  configured  to  provide  remote- 
sensing data  support  to  projects  of  much  lesser  mag- 
nitude. Data  available  at  the  onset  included  aerial 
photography  (mainly  domestic,  but  serving  as  a good 
asset  throughout  the  project  to  support  research 
efforts); a full-frame Landsat and tape file started in
1972 with the launch of Landsat-1; a visual aid file;
remote-sensing/Earth sciences reference collections;
a map/chart acquisition and storage facility; and data
to support projects such as Cornblight and the Crop
Identification Technology Assessment for Remote
Sensing (CITARS) compilations.

The  LACIE  support  requirements,  initially  out- 
lined in  a number  of  baseline  requirement  docu- 
ments, were  eventually  consolidated  as  part  of  a plan 
adjusted  throughout  the  experiment.  Because  DR&C 
also  supported  other  EOD  projects,  the  LACIE 
Physical  Data  Library  (LPDL)  was  established  to 
separate  LACIE-type  tasks  from  other  division  sup- 
port requirements. 


Premise 

With 4800 sample segment study sites located
throughout the eight LACIE countries, it became
apparent that the total volume of data for Landsat and
correlated Intensive Test Site (ITS) areas would be
greater than it had been for any other remote-sensing
project undertaken by EOD. With Landsat data
being collected a number of times per site throughout
the wheat growth year, it was apparent that such
imagery could be overwhelming and could easily
overtax the existing manual data-handling systems.

However,  since  the  overall  LACIE  effort  was  in 
three  phases,  ranging  from  about  600  active  segments 
in  Phase  I to  nearly  3000  in  Phase  III,  it  was  believed 
there  would  be  ample  time  to  adequately  train  per- 
sonnel and  develop  efficient  support  systems.  Adap- 
tations were  made  without  an  excessive  expenditure 
of  manpower,  time,  or  facilities  as  the  experiment 
progressed. 

In  addition,  field  measurements  data— consisting 
of  aircraft  remote-sensing  and  ground  data  collected 
during  Landsat  overpasses  of  ITS  areas  for  research, 
test,  and  evaluation  (RT&E)  purposes — would  be 
variable  and  intermittent.  It  was  also  expected  that 
the  rate  of  data  input  to  RT&E  (and  subsequently  to 
LPDL)  would  vary  with  the  needs  of  their  programs, 
including  material  for  their  supporting  research  and 
technology  (SR&T)  contracts. 


LACIE  DATA  SUPPORT  REQUIREMENTS 

The  overall  LACIE  requirement  for  physical  data 
handling  performed  by  LPDL  is  presented  in  figure 
1.  Many  of  these  data  are  sample-segment  dependent 
and  require  extensive  handling  and  storage.  Since  the 
data  were  produced  by  other  organizations,  a basic 
function  of  the  LPDL  was  to  interface  with  these 
organizations,  to  serve  as  a central  repository  of  data, 
and  to  transmit  the  data  to  LACIE  users  as  needed. 
The  general  LPDL  function  was  to  research,  acquire, 
index,  maintain,  distribute,  track,  and  control 
LACIE  operational  data  and  documents.  To  execute 
this  function,  the  LPDL  was  required  to  interface 
with  LACIE  users  and  NASA  support  organizations. 
It  was  represented  during  the  daily  LACIE  Opera- 
tions Coordination  Center  (OCC)  meetings,  where 
operations  problems  could  be  resolved  in  real  time. 


FIGURE 1. LPDL requirements tasks. (a) Operations data handling. (b) Maps, charts, and mosaics. (c) Status and control systems.


SYSTEMS AND OPERATIONS

Small-size remote-sensing experiments often entail
relatively small amounts of physical data that can
be stored or filed by the experimenter in a couple of
filing cabinet drawers. Large-size remote-sensing
experiments such as LACIE, however, with its repeated
coverage of many parts of the world extending
over several years, produce a huge amount of
data. Such systems need a formal management
method and centralized data repositories with controlled
input/output mechanisms to make the
material valuable to many users without excessive
data duplication or loss of time. The data support
systems developed for LACIE were designed to
satisfy this need, in addition to meeting the other
requirements.

During the Phase III peak level, when approximately
3000 actual segments were being acquired by
Landsat, a staff of 20 people was required to maintain
the functions in support of the data depicted in
figure 1. Housing that many people and providing
facilities for staging and storing the data required
more than 4000 square feet of office/warehouse
space.


ANALYSIS PACKET PREPARATION AND
STATUSING

Packet  Development 

The  data  required  for  the  area  mensuration 
analysis  of  a LACIE  sample  segment  needed  to  be 
assembled,  coordinated,  stored,  tracked,  and 
retrieved  conveniently.  The  method  developed  to 
solve  the  requirement  was  the  Sample  Segment 
Packet,  one  of  which  was  prepared  for  each  of  the 
4800  LACIE  sample  segment  sites. 

Data  inserted  in  each  packet  by  LPDL  included 
LACIE  sample  segment  film  from  Landsat,  crop 
calendars,  topographic  maps  at  several  scales,  and 
ancillary  summary  data.  The  LPDL  was  also  respon- 
sible for  direct  acquisition  of  available  topographic 
maps  that  covered  the  LACIE  sites. 


Whenever  there  is  a large  volume  of  data— such  as 
that  associated  with  the  4800  sample  segment 
packets  (including  the  receiving,  sorting,  and 
organizing  of  thousands  of  pieces  of  paper  and 
film)— there  are  bound  to  be  problems.  Therefore, 
documented  procedures  were  established  to  main- 
tain order,  foster  longevity,  and  handle  problem 
areas.  The  packet  materials  were  placed  in  sturdy, 
large  envelope  folders  that  contained  sample  seg- 
ment numbers  and  indexing  cards.  The  packets  were 
stored  sequentially  in  filing  cabinets  with  controlled 
access  and  only  checked  out  and  in  by  authorized 
personnel. 

All  incoming  production  film  converter  imagery 
was  screened  on  receipt  and  checked  for  problems. 
Problem  photography  was  referred  back  to  the  pro- 
duction supervisors  for  refilming.  Other  data  prob- 
lems were  referenced  back  to  the  sources  for  correc- 
tion. The  LPDL  handled  any  map  problems,  and  the 
Data  Acquisition,  Preprocessing,  and  Transmission 
Subsystem  (DAPTS)  was  responsible  for  correcting 
problems with crop calendars and ancillary summary
data.

Sample Segment Packet Status and Tracking
System

The  LPDL  provided  status  and  tracking  data  to 
the  Automatic  Status  and  Tracking  System 
(ASATS).  During  packet  development,  a coded  key- 
punched statusing  card  for  each  type  of  data  (maps, 
crop  calendars,  and  ancillary  data)  was  submitted  to 
ASATS  immediately  after  such  material  was  inserted 
in  each  packet.  When  a Landsat  acquisition  had  been 
received  and  entered  into  the  IBM  360-75  imagery 
data  bases  for  storage,  a statusing  card  indicating  data 
availability  at  the  NASA  Johnson  Space  Center 
(JSC)  was  prepared  and  forwarded  to  ASATS. 

After Landsat film had been received and inserted
in the packet, a film-ready status card was entered in
ASATS. When the packet was released from ancillary
hold and film was received, LPDL prepared
another coded status card indicating that all necessary
data had been placed within the packet and the
sample segment was ready for analysis. These status
and tracking steps were repeated throughout the
LACIE program.
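
The statusing sequence described above can be illustrated with a small sketch. It is not the ASATS implementation; the class name, item names, and card format are hypothetical and were chosen only to mirror the steps in the text (one status card per inserted data type, then a ready-for-analysis card once everything is present).

```python
from dataclasses import dataclass, field

# Hypothetical item names for the data types the text says went into a packet.
REQUIRED_ITEMS = ("maps", "crop_calendar", "ancillary_data", "film")


@dataclass
class SegmentPacket:
    """Illustrative model of one sample segment packet and its status cards."""
    segment_id: int
    received: set = field(default_factory=set)
    status_cards: list = field(default_factory=list)

    def ready_for_analysis(self):
        # A packet is ready only when every required item has been inserted.
        return self.received == set(REQUIRED_ITEMS)

    def insert(self, item):
        """Record an inserted item and emit the corresponding status card."""
        if item not in REQUIRED_ITEMS:
            raise ValueError(f"unknown packet item: {item}")
        was_ready = self.ready_for_analysis()
        self.received.add(item)
        self.status_cards.append((self.segment_id, item))
        # Emit the ready card once, at the moment the packet becomes complete.
        if self.ready_for_analysis() and not was_ready:
            self.status_cards.append((self.segment_id, "ready_for_analysis"))


packet = SegmentPacket(segment_id=1234)
for item in ("maps", "crop_calendar", "ancillary_data", "film"):
    packet.insert(item)
```

In this sketch the final card in `packet.status_cards` is the ready-for-analysis card, which corresponds to the last status card LPDL prepared for each packet.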


Data  Coordination  and  Reporting 

The  LPDL  coordinated  the  preparation  and  track- 
ing of  Data  Product  Requests  (DPR's)  for  LACIE. 
DPR's  prepared  by  the  analysts  were  checked  for 
completeness,  recorded,  approved,  and  forwarded. 
The  returned  data  was  checked  for  completeness, 
recorded,  and  forwarded  to  the  requester.  DPR's 
ranged  from  requests  for  Landsat  film  products  to 
data  base  queries,  keypunch  requests,  and  special  re- 
quests for  data  search.  During  Phase  III,  these  re- 
quests averaged  over  30  per  week. 


Packet Operations and Statusing

Prior to the start of Phase I analysis, sample segment
lists were used to create the segment packets
and to generate the process of ordering maps. As
maps, crop calendars, ancillary summary data, and
film were received and filed, the sample segment
information was statused through manual reports and
inputs to the Interim Status and Tracking System
(ISATS).

During Phase I, 1033 segments were analyzed,
while 2649 Landsat film sample segment acquisition
sets were received. Each initial film data set consisted
of two color composite film transparencies and
four black and white film transparencies, each of
which had to be filed in the packets and statused
ready for analysis. Numerous problems emerged
during Phase I. While most were solved, several persisted
due to time pressures and the need for rapid
responses. These problems are listed below.

1. Sample segment lists (based on the sampling
strategy output) were often not received in sufficient
time to search, order, status, and file maps in the
packets without a “crash” program. For example, it
required 1 to 3 months after placing an order to
receive maps from Canada; 1 to 2 months to get
maps from the Defense Mapping Agency
Topographic Center; 2 months from Australia; and 1
month for large map quantities from the U.S.
Geological Survey. These time intervals include first-
class/airmail delivery in the United States and
airfreight shipments from foreign countries. This
situation persisted and was aggravated each time
there was a major relocation of segments.

2.  Initial  packet  handling  required  that  packets 
ready  for  analysis  be  removed  from  the  sequential 
files and placed in separate cabinets in order to
satisfy  user  needs  to  examine  packet  contents  and  to 
plan  daily  workloads  prior  to  checkout.  In  addition, 
packets  with  data  problems  were  placed  in  another 
separate  cabinet.  This  method  was  operationally 
satisfactory  to  users  but  was  disruptive  to  the  regular 
data  handling  process,  causing  a high  expenditure  of 
manpower  time. 

3.  All  types  and  formats  of  data  (e.g.,  computer 
printouts,  detailed  processing  results,  etc.)  were 
saved,  causing  bulky  packets  and  file  crowding. 
Direction  regarding  retention  time  of  such  data  was 
difficult  to  obtain  as  there  was  often  a lack  of  consen- 
sus concerning  the  longevity  value  of  the  data.  This 
problem  persisted  and  in  fact  may  be  the  unfortunate 
byproduct  of  any  large-scale  experimental/quasi- 
operational  system  that  handles  large  and  varied 
quantities  of  data. 

4.  At  the  start  of  Phase  I,  input  data  cards  on  the 
required  data  and  packet  status  were  created  through 
manual  coding  and  punching.  Batch  card  inputs  to 
overnight  ASATS  updates  on  status  had  operational 
problems  and  breakdowns.  Therefore,  critical  reports 
on  packet  data  availability  and  statistics  on  data  flow 
and  time  lines  often  were  prepared  through  a time- 
consuming  manual  process.  Since  LPDL  provided  all 
these  data  and  essentially  performed  most  of  the 
tasks  for  ASATS,  as  well  as  maintaining  the  manual 
systems and records, manpower requirements for
reporting were exceptionally high. This situation was
eventually solved as the ASATS reporting
capabilities became reliable.

In  Phase  II,  the  major  emphasis  was  to  increase 
and  diversify  the  number  of  segments  and  Landsat 
data  acquisitions  to  be  analyzed.  Segments  increased 
to  1437  and  Landsat  data  acquisitions  to  9211.  The 
color  composite  transparencies  and  black  and  white 
products remained the same.

By  Phase  II,  computer-generated  packet-ready 
lists  were  provided  through  ASATS,  thus  eliminating 
the  need  to  physically  shift  packets  from  cabinet  to 
cabinet.  Film  was  sorted  and  placed  in  annotated  en- 
velopes, making  it  easier  for  Classification  and  Men- 
suration Subsystem  (CAMS)  analysts  to  organize 
these products sequentially. Early in Phase II, data
and  packet  availability  cards  were  still  prepared 
manually.  Initially,  the  batch  card  input  to  perform 
overnight  update,  statusing,  and  daily  output  had  low 
dependability,  and  manually  recorded  data  were 
utilized  as  necessary.  However,  as  problems  with 
system  dependability  were  solved,  the  ASATS 
became  more  stable. 

In  Phase  III,  there  was  a significant  increase  in 
segments  to  be  analyzed.  The  number  of  color  com- 
posites per  site  increased  to  three  but  the  black  and 
white  transparencies  were  dropped.  In  addition, 
some  sample  segments  were  dropped;  some  were 
relocated;  and  new  segments  were  added  in  keeping 
with  a new  sampling  strategy.  Analyzed  segments  in- 
creased to  3006  and  Landsat  sample  segment  acquisi- 
tion sets  to  19000.  The  changes  and  the  additional 
volume  began  to  test  the  full  data  handling  capability 
of  the  entire  LACIE  system,  including  the  LPDL. 

Every  effort  was  made  to  obtain  the  sample  seg- 
ment relocations  lists  as  soon  as  possible,  since  each 
relocated  or  added  segment  had  to  be  plotted  on  a 
map  index  to  determine  the  new  map  coverage  (for 
ordering  purposes).  This  effort  required  the  full 
resources  of  LPDL  to  prepare  the  requisitions  and 
provide expeditious handling on arrival so that maps
could  be  available  to  assist  in  the  analysis  effort. 

The  CAMS  operational  procedures  changed,  in- 
corporating Procedure  1.  Bulky  computer  printout 
materials  retained  in  the  packets  caused  packet 
storage  problems,  so  that  file  drawers  normally  stor- 
ing 25  packets  now  could  only  accommodate  10  to 
12.  This  problem  was  solved  by  removing  an  opera- 
tional packet  from  the  active  staging  area  as  it  was 
completed  for  the  season  and  storing  it  with  other 
cyclical  records. 

Machine-generated film labels received via
ASATS for the film product envelopes were also
placed  on  the  plastic  sleeve  used  to  protect  each  piece 
of  film.  The  use  of  the  computer-generated 
stickyback  labels  saved  considerable  time  in  the 
process  of  labeling  and  statusing  incoming  film 
products. 

Prepunched  packet  data-statusing  cards  were  now 
prepared  in  advance  for  all  new  or  relocated  seg- 
ments. As  the  segment  listings  were  entered  into 
ASATS,  these  cards  were  prepunched  except  for 
date.  The  date  was  entered  automatically  as  the  cards 
were  forwarded  by  LPDL  to  indicate  that  the  data 
had  been  received.  Thus,  while  the  data  rates 
nearly  doubled  from  Phase  II  to  Phase  III,  the  ever- 
increasing  automation  of  the  status  and  tracking 
system  often  offset  to  some  degree  the  impact  of  the 
increased  workload. 
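
The film-handling load implied by the phase figures above can be tallied with a short calculation. This is a sketch under a stated assumption: every acquisition set in a phase is taken to have the film composition the text gives for that phase (two color composites plus four black and white transparencies in Phases I and II, three color composites in Phase III). The text states the composition only for the initial Phase I sets and says the products "remained the same" in Phase II, so the totals are rough estimates, not reported figures.

```python
# (acquisition sets received, color composites per set, black-and-white per set)
# Counts are taken from the text; the per-set composition is assumed uniform.
PHASES = {
    "Phase I": (2649, 2, 4),
    "Phase II": (9211, 2, 4),
    "Phase III": (19000, 3, 0),
}


def film_pieces(sets, color, black_and_white):
    """Estimated pieces of film to file for a phase."""
    return sets * (color + black_and_white)


totals = {name: film_pieces(*values) for name, values in PHASES.items()}

# Ratio of Phase III to Phase II acquisition sets.
growth = PHASES["Phase III"][0] / PHASES["Phase II"][0]

for name, pieces in totals.items():
    print(f"{name}: about {pieces} pieces of film")
print(f"Phase II to Phase III acquisition growth: {growth:.2f}x")
```

Under this assumption, the estimated number of film pieces rises only slightly from Phase II to Phase III even though the acquisition count roughly doubles, because the black and white transparencies were dropped.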


LANDSAT FULL-FRAME SYSTEM

In 1972, a Landsat full-frame data acquisition and
storage system was initiated by EOD to support
remote-sensing projects. The film was retained for
operational use, while the tapes (after initial
analysis) were placed within a centralized tape library
for operational storage and retrieval.

Early in the program, LACIE management stated
the need for a coordinated Landsat full-frame film
file with adequate status and tracking procedures.
This file was established as part of LPDL, which
developed a method for coordinating full-frame data
acquisition and management.

Corrected full-frame 9.5-inch-format film imagery
was stored from both Landsat-1 and Landsat-2. The
imagery from the multispectral scanner (MSS) often
comprised all four bands in black and white
transparencies and/or prints and in various color
combinations. Since a limited quantity of return
beam vidicon (RBV) data was acquired, there were
only a few selected frames in the file. The digital full-
frame Landsat data was received in the form of nine-
track computer-compatible tapes.

A Landsat  path  and  row  (footprint)  indexing 
system  was  adopted,  with  data  being  filed  in  standard 
filing  cabinets  and  grouped  sequentially  by  acquisi- 
tion date.  An  index  card  was  prepared  for  each  im- 
age, and  film  could  be  checked  out  only  by 
authorized  personnel.  Manual  reporting  procedures 
were  developed  to  provide  the  LACIE  OCC  informa- 
tion on  the  status  of  full-frame  film. 
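
As an illustration of the path-and-row filing scheme just described, the following sketch keeps frames grouped by footprint and ordered by acquisition date, and restricts checkout to authorized personnel. The class, method names, and sample values are hypothetical; the original file was a manual card index, not software.

```python
from collections import defaultdict
from datetime import date


class FullFrameIndex:
    """Illustrative index of full-frame film keyed by Landsat (path, row)."""

    def __init__(self, authorized_personnel):
        self._frames = defaultdict(list)   # (path, row) -> acquisition dates
        self._authorized = set(authorized_personnel)
        self._checked_out = {}             # (path, row, date) -> person

    def file_frame(self, path, row, acquired):
        """File a frame under its footprint, keeping dates in sequence."""
        dates = self._frames[(path, row)]
        dates.append(acquired)
        dates.sort()

    def frames_for(self, path, row):
        """Return acquisition dates on file for one footprint."""
        return list(self._frames.get((path, row), []))

    def check_out(self, path, row, acquired, person):
        """Check out one frame; only authorized personnel may do so."""
        if person not in self._authorized:
            raise PermissionError(f"{person} is not authorized")
        if acquired not in self._frames.get((path, row), []):
            raise KeyError("frame not on file")
        key = (path, row, acquired)
        if key in self._checked_out:
            raise ValueError("frame already checked out")
        self._checked_out[key] = person


index = FullFrameIndex(authorized_personnel={"analyst_a"})
index.file_frame(30, 33, date(1976, 6, 19))
index.file_frame(30, 33, date(1976, 6, 1))
index.check_out(30, 33, date(1976, 6, 1), "analyst_a")
```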

When LACIE I became operational, the file was
composed of about 15 000 pieces of film, mostly
black and white transparencies of the four MSS
bands. During LACIE, approximately 9200 color
composite acquisitions were received, producing a
total of 25 000 pieces of film in the full-frame inventory.


MAP  SEARCH  AND  ACQUISITION 

Maps  and  charts  depicting  most  of  the  basic  land- 
associated  themes  that  covered  LACIE  areas  in  the 
eight  countries  were  researched  and  acquired,  where 
possible;  see  figure  1 for  a summary  list.  These  maps 
were  obtained  from  U.S.  government  agencies  and 
commercial publishing organizations. In addition,
LPDL was the repository for specialized LACIE
maps,  mosaics,  and  overlays  created  by  the  EOD 
Cartographic  Laboratory. 

However, a major effort was necessary to search
and acquire topographic maps to cover the 4800
LACIE sites in eight countries. The map management
process is illustrated in figure 2. Map orders
were sent to a U.S. Geological Survey map distribution
center for large- and medium-scale U.S. maps;
the Defense Mapping Agency Topographic Center
for other domestic and foreign maps; and a foreign
cartographic agency for nationally printed maps.

Each segment packet was provided a 1:1 000 000
Operational Navigation Chart (ONC), a 1:250 000
topographic map, and large-scale maps. U.S. large-scale
maps were 1:62 500 (15-minute quadrangle) and
1:24 000 (7.5-minute quadrangle) scale maps, as available.
In foreign areas, they were 1:100 000 and
1:50 000 scale maps, as available. LPDL created a
map statusing sheet as a device to ensure that when
the maps were received the correct combination
would be assembled, numbered, folded, and inserted
in the appropriate packet.

As described in the Landsat systems design paper,
the best the Goddard Space Flight Center
(GSFC)/Landsat acquisition system could guarantee
was that the sample segments were located within a
nominal 10- by 11-mile area. Therefore, to ensure
adequate map coverage, each segment was provided
maps that covered a 10-mile-diameter circle centered
at the sample segment coordinate point, as plotted on
a map index. Consequently, if the coordinates of a
U.S. segment were located near map boundaries, the
packet would require up to four 1:1 000 000 scale
maps, four 1:250 000, four 1:62 500, and nine
1:24 000. After the segments were sited, the unused
maps were returned and stored. Later in the program,
the 1:1 000 000 scale ONC maps were removed
from the packets and used in a separate file for sample
segment plots.
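
The sheet counts quoted above can be checked with a rough geometric sketch. The function below counts the map sheets overlapped by the bounding box of a 10-mile-diameter circle. It is an illustration only, under stated assumptions: sheets are taken to tile a regular latitude/longitude grid aligned to multiples of the sheet size, one degree of latitude is approximated as 69 statute miles, and the bounding box is used in place of the true circle, so the result is an upper bound. The function name and test coordinates are hypothetical.

```python
import math

MILES_PER_DEG_LAT = 69.0  # approximate statute miles per degree of latitude


def sheets_needed(lat, lon, sheet_deg_lat, sheet_deg_lon, diameter_miles=10.0):
    """Return (row, col) indices of grid sheets touched by the bounding box
    of a circle of the given diameter centered at (lat, lon)."""
    radius = diameter_miles / 2.0
    dlat = radius / MILES_PER_DEG_LAT
    # Degrees of longitude shrink with the cosine of the latitude.
    dlon = radius / (MILES_PER_DEG_LAT * math.cos(math.radians(lat)))
    rows = range(math.floor((lat - dlat) / sheet_deg_lat),
                 math.floor((lat + dlat) / sheet_deg_lat) + 1)
    cols = range(math.floor((lon - dlon) / sheet_deg_lon),
                 math.floor((lon + dlon) / sheet_deg_lon) + 1)
    return {(r, c) for r in rows for c in cols}


# 7.5-minute quadrangles (0.125 degree), point at the center of a sheet.
small_quads = sheets_needed(38.0625, -98.0625, 0.125, 0.125)

# 15-minute quadrangles (0.25 degree), point on a sheet corner.
large_quads = sheets_needed(38.0, -98.0, 0.25, 0.25)

print(len(small_quads), len(large_quads))
```

A 7.5-minute quadrangle is smaller than the 10-mile circle in both directions at this latitude, so a centered point already touches a 3-by-3 block of sheets, while a 15-minute quadrangle is larger than the circle, so at most a 2-by-2 block is touched. This is consistent with the "up to four" and "nine" figures in the text.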

It was necessary to acquire a sufficient number of
these maps so that not only did each packet have one
copy of the appropriate map sheet but there
were also a few spares in the map storage facility. In
time, maps were cut up or lost and replacements
were needed; because a few spares were available, a
great deal of time and labor was saved.

The Map Acquisition and Storage Facility prepared
an index card for each map, mosaic, and overlay
received; card data included the number of copies
and where the map was used. During the LACIE
period, the facility increased from approximately
11 000 index cards and 30 000 maps to 34 000 index
cards and 90 000 maps, charts, mosaics, and overlays.
A semiautomated information retrieval system was
utilized as a manual assist, thus permitting one person
to manage the indexing process. The system
reduced the time required to record data and retrieve
information, but it did not have the capability of
producing map listings.


LACIE  REFERENCE/PROJECT  DATA  AND 
REPORTS 

The LACIE Reference/Project Data and Reports
was subdivided into four areas: (a) LACIE reference
and report collections, (b) LACIE ITS data, (c) contingency
data, and (d) records storage. These areas
are shown in greater detail in figure 3.

The  LPDL  interfaced  with  the  users  and  received 
data  and  documents  on  a daily  basis.  It  also  provided 
data,  reference  materials,  bibliographical  informa- 
tion, and  reports  for  project  use.  Over  60  000  items 
(excluding  maps)  were  received,  filed,  and  managed 
in  this  part  of  the  LPDL  operation  during  the  LACIE 
program. Information requests for this data varied
from  50  to  100  per  week. 

Reference and Report Collections for LACIE

Reference information was composed of
agricultural and other Earth/environmental science
texts, statistics, reports, remote-sensing documents,
and data. Most agricultural reference material, apart
from basic information sources, was received either
from the statistical service offices of the federal
government or directly from the states. Specific
LACIE project-required data was received via
DAPTS transmittal reports.

The report collection consisted principally of reports
generated by LACIE management and the subsystems.
In addition, data and information were supplied
to offsite technical investigation centers under
NASA SR&T contracts. Reports from these investigations
also became part of the LACIE report
collection.

The DR&C/LPDL regularly used a terminal connected
to a REmote CONsole (RECON) that provided
a capability to research bibliographical data from the
several library systems at GSFC. Additional
bibliographical data were obtained from other library
sources across the country.

Special  LACIE  Data 

Data covering LACIE ITS and three “supersites”
were received throughout the LACIE program; the
data input reached a peak during Phase III. These
data were composed of two basic types: field
measurements data collected over the test sites
(comprising only basic preprocessing, such as film
development), and data produced from secondary or
tertiary processing or development (such as color
land use map overlays). See figure 3. Suitable storage
and retrieval systems were established for each type
of data. One of the larger files was created for the 35-
mm slides (10 325 images) depicting crop phenology
in the various test fields during all the LACIE
phases.

FIGURE 3. LACIE Reference/Project Data and Reports.


Contingency Data

Contingency  data  comprised  all  classes  of  material 
for  which  there  was  a rapid  response  requirement. 
Such data was located, organized (often copied), and
filed so that short-time retrieval action could be
instituted. This concept was based on providing rapid
support to the basic data flow of the LACIE subsystems,
such as CAMS, the Crop Assessment Subsystem
(CAS), and the Yield Estimation Subsystem
(YES), and to the support elements, such as ASATS.
The  specific  methodology  varied  with  data  base  size 
and  allowable  response  delay.  For  example,  copies  of 
crop  calendars  and  ancillary  data  were  located  in  a 
room  next  to  the  packet  operations  so  that  a replace- 
ment was  almost  instantly  available  if  a crop  calen- 
dar or  ancillary  summary  were  lost.  However,  the 
map  storage  facility,  because  of  its  size  and  space  re- 
quirements, was  located  offsite.  To  replace  a map 
(assuming  a duplicate  was  available)  required  less 
than  half  a day.  Most  data,  such  as  LACIE  reference 
data,  was  stored  so  that  retrieval  could  be  completed 
within  24  hours. 


Records 

Records storage and management was created
separately for data having a cyclical or long-term
need and for data that had completed their usefulness
and were archived. Cyclical or long-term data
could be made readily available based on the advanced
planning and timing need per project; e.g., for
the start of each LACIE phase, for a particular time
period within a phase, or to support an SR&T contract.
Depending on priorities, available personnel,
and data volume, most records-type data could be
obtained from the files in 1 to 2 days.

Most  ITS  data  were  categorized  as  records  and 
stored offsite due to lack of onsite space. Retrieval
action  for  some  of  the  data,  such  as  the  35-mm  slides 
stored  in  LPDL,  was  almost  instantaneous. 
However,  for  data  that  required  processing,  such  as 
copies  of  aerial  photography,  the  timing  varied  with 
the  processing  cycle. 

Data placed in an archive were basically “used”
data, such as a card deck for a no-longer-used computer
program. Such data were resurrected only on
special request and had a low retrieval priority. The
data, however, were stored for the life of the project.


OTHER  SUPPORT  SERVICES 

The DR&C performed other standard support
services throughout LACIE. Of particular value was
a complete worldwide microfilm file of Landsat data
utilized intermittently in conjunction with changing
priorities or countries. The expertise gained in day-
to-day use was employed as required throughout the
experiment.


CONCLUSIONS  AND  RECOMMENDATIONS 

The  management  of  data  is  an  important  element 
in  any  remote-sensing  experiment  or  operational 
program.  It  requires  appropriate  considerations  in 
system  planning  and  development,  and  suitable  deci- 
sions should  be  made  early  in  a program  so  that  data 
will  be  readily  available  when  needed. 

During  the  course  of  the  LACIE  program,  several 
persistent  data  handling  problems  were  evident. 
These  problems  were  as  follows. 

1. Delays in receiving requirements, which led to
a lack of sufficient implementation time to supply
data when needed (e.g., delays in getting sample segment
lists so that maps could be ordered and
received in time for analysis without a crash program)

2.  Inadequate  user  interface  plans  and  require- 
ments, so  that  when  data  were  received  they  could  be 
indexed,  stored,  and  retrieved  properly  to  satisfy 
specific  users 

3.  Difficulty  in  obtaining  system  management 
plans  and  requirements  for  the  retention  and  dis- 
tribution of  data.  LACIE  follow-on  planning,  with 
the  definition  of  supporting  data  successive  systems 
requirements,  has  demonstrated  the  need  for  initial 
and  updated  planning  for  data  retention.  For  exam- 
ple, the  blind  site  data  for  Phase  II  operations,  which 
was  utilized  later  in  the  program,  was  essentially  lost 
because  of  inadequate  plans.  However,  the  interface 
between  requirements  and  what  can  be  afforded  in 
the  way  of  physical  storage  will  continue  to  present 
problems to program planners. The volume of data
generated by a project the size of LACIE is tremendous,
and storage space and associated personnel requirements
exceeded the capability of providing complete storage
for all data.


Such problems can be traced back in part to the
lack of initial input into LACIE baseline requirements
documents or into a LACIE integrated implementation
plan, where not only qualitative tasks but
also data volume, rate of input, diversification, and
timing are significant considerations. Therefore, it is
recommended that for future programs, provisions
for physical data management be made an early integral
function of system development.


THE  FUTURE 

The  data  collected  and  stored  as  a result  of  the 
LACIE program, coupled with the data initially
available  to  support  LACIE,  constitute  a valuable 
data  collection  structured  to  support  operational  and 
experimental  remote-sensing  programs.  Much  of  the 
data  can  be  incorporated  into  an  Earth  resources  data 
base.  Elements  of  the  data  collection  that  could  sup- 
port future  remote-sensing  programs  include  the 
following. 

1.  The  Landsat  full-frame  image  files 

2.  The  microfilm  file  of  aerial  and  space  photo- 
graphic and  multispectral  scanner  data  that  encom- 
passes a large  portion  of  the  Earth's  surface 

3.  The  map/chart  collection  that  includes  various 
scale  maps  and  charts  for  a good  portion  of  the 
United  States  and  the  LACIE  area  in  foreign  coun- 
tries 

4.  Computer-compatible  tapes  of  good  quality 
Landsat  scenes  particularly  adaptable  to  agricultural 
application  site  research 

5.  A collection  of  basic  remote-sensing  data,  proj- 
ect data,  reference  material,  and  associated  publica- 
tions 

6.  Visual  aids  that  can  be  used  in  part  to  support 
presentations  on  remote-sensing  projects 

7.  Research  acquisition  and  handling  procedures 
for  managing  data  for  a high-density  remote-sensing 
program that will be applicable to future programs

The Classification and Mensuration Subsystem

INTRODUCTION

The Classification and Mensuration Subsystem
(CAMS) was responsible for the acreage component
of the wheat production estimates produced by
LACIE. The wheat acreage for a region or a country
was produced from the individual wheat proportion
estimates of 5- by 6-nautical-mile sample segments
using Landsat imagery and supporting historical
data. To accomplish this task, CAMS implemented a
processing system to respond to both the accuracy
and throughput requirements of LACIE.

From an operational standpoint, the most significant
item CAMS had to overcome was the scope; i.e.,
segment volume processing requirements. A review
of the requirements, listed in the following chart,
shows that a significant increase in data handling
and processing was necessary.


                          Phase I    Phase II   Phase III

Total segments            700        1700       3 000
Acquisitions received     2000       9000       18 000
Peak processing
  requirements per day    16 to 20   35 to 40   75 to 80

In Phase I, the state-of-the-art classification technology
was assembled into a machine-processing
system capable of handling the large volume of data
required to evaluate and improve the technology.
The design of the initial system was simplified to
allow for subsequent modifications with minimal impact.
A significant portion of the operational elements
(e.g., data handling, computer card deck
generation, etc.) was accomplished manually. The
classification technology implemented consisted of
Gaussian maximum likelihood classification of each
picture element (pixel), based on defined training fields and
their associated statistics. However, one major exception
existed: the definition, identification, and
labeling of training fields were accomplished without
the benefit of ground observations. Never before had
such a task been attempted on the basis of analyst-
labeled satellite imagery. Thus, a key element of
CAMS was the development of consistent and accurate
labeling and analysis procedures that used Landsat
and supporting data in a high-volume, high-
throughput environment.

aLockheed Electronics Company, Houston, Texas.
bNASA Johnson Space Center, Houston, Texas.
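
The per-pixel Gaussian maximum likelihood idea mentioned above can be sketched in a few lines. This is not the CAMS software. It is a simplified illustration that assumes the spectral bands are independent within each class (a diagonal covariance matrix), whereas a full Gaussian maximum likelihood classifier would use the complete class covariance matrix. The function names, class names, and pixel values are hypothetical.

```python
import math


def train_class_stats(training_pixels):
    """Per-band mean and sample variance from one class's training pixels."""
    n = len(training_pixels)
    bands = len(training_pixels[0])
    means = [sum(p[b] for p in training_pixels) / n for b in range(bands)]
    variances = [
        # Floor the variance so a constant band cannot cause division by zero.
        max(sum((p[b] - means[b]) ** 2 for p in training_pixels) / (n - 1), 1e-6)
        for b in range(bands)
    ]
    return means, variances


def log_likelihood(pixel, stats):
    """Log of the Gaussian density of a pixel, assuming independent bands."""
    means, variances = stats
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)
        for x, m, v in zip(pixel, means, variances)
    )


def classify(pixel, class_stats):
    """Assign the pixel to the class with the highest log-likelihood."""
    return max(class_stats, key=lambda name: log_likelihood(pixel, class_stats[name]))


# Hypothetical four-band training fields labeled by an analyst.
wheat_field = [(30, 25, 60, 65), (32, 27, 62, 66), (29, 24, 58, 63)]
other_field = [(40, 45, 50, 48), (42, 47, 52, 50), (41, 44, 49, 47)]

stats = {
    "wheat": train_class_stats(wheat_field),
    "other": train_class_stats(other_field),
}
label = classify((31, 26, 61, 64), stats)
```

The quality of the result depends entirely on the training-field labels, which is why the text stresses consistent analyst labeling in the absence of ground observations.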

By  exercising  the  first-generation  technology  in 
Phase I, CAMS personnel identified several key
issues.  During  Phase  II,  answers  to  many  of  these 
technical  questions  evolved.  The  interrelationships 
between  man  and  machine,  technology  and  opera- 
tions, and  accuracy  and  throughput  started  to 
become  clearer.  Thus,  a significant  design  effort  was 
initiated  in  parallel  with  the  Phase  II  operations  to 
define  an  improved  technology.  The  result  of  these 
efforts  was  the  design  of  an  analysis  approach  called 
Procedure  1.  (Procedure  1 is  the  subject  of  the  paper 
by  Heydorn  entitled  “Classification  and  Mensura- 
tion Approach  of  LACIE  Segments.’’) 

An  experimental  design  to  test  and  evaluate  Pro- 
cedure 1 was  conducted  during  the  latter  stages  of 
Phase  II.  When  these  tests  showed  positive  results, 
the  tasks  necessary  to  implement  Procedure  1 (e.g., 
software  modification,  procedures  development, 
analyst  training,  etc.)  were  initiated  and  continued 
through  the  initial  Phase  III  processing  period  for 
winter  wheat  (fall-winter,  1976-77). 

The implementation of Procedure 1 into Phase III
operations  was  accomplished  in  two  stages.  First,  a 
concept  implemented  through  analyst  procedures 
with  minimal  software  changes  was  utilized  opera- 
tionally in  the  processing  of  spring  1977  winter 
wheat.  This  period  was  used  to  accelerate  analyst 
training,  final  system  debugging,  and  on-line  testing. 
Finally, beginning with Phase III spring wheat processing
(June 1977), the “full-up” Procedure 1 was implemented
operationally.
CAMS OPERATIONS

The  training  backgrounds  of  the  analysts,  the 
available  data,  the  labeling  logic,  the  analysis  pro- 
cedures, and  the  overall  integration  of  these  factors 
into  the  large-scale  LACIE  environment  are  de- 
scribed in  the  following  sections. 


Analyst  Training  Background 

Varied backgrounds among the analysts were required
to ensure that all aspects of LACIE were adequately
covered: photograph interpretation, geography,
agronomy, mathematics, statistics, and computer
science. The analysts were extensively trained
in  image  interpretation  techniques,  photographic 
film  production,  pattern  recognition  theory,  applied 
statistical  techniques,  and  available  data  analysis 
systems.  Figure  1 depicts  the  concept  of  the  opera- 
tional system  utilized  by  the  analyst  and  the  interac- 
tions of  the  various  functions. 


Available  Reference  Data 

Reference  data  (e.g.,  data  on  what  crops  to  expect 
in  a given  area  and  what  growth  stages  to  expect  for 
the  particular  acquisition(s)  being  analyzed)  were 
available  to  the  analyst.  Imagery  (film  products)  pro- 
vided the  analyst  with  spatial  and  spectral  informa- 
tion. A machine-processing  system  took  the  analyst's 
input,  classified  the  total  segment,  and  generated  out- 
put products  (classification  maps,  cluster  maps, 
classification  summaries,  etc.)  for  evaluating  the 
processing  results. 

For  every  sample  segment,  the  analyst  had  a 
packet  that  contained  imagery  (film  products),  maps, 
ancillary  data,  and  previous  machine  classification 
data.  Available  reference  materials  not  included  in 
the  packet  were  weekly  meteorological  summaries, 
full-frame  imagery,  and  analyst  interpretation  keys. 

Film products. - Imagery for the segment includes
product 1 of all acquisitions during the past crop
year, if collected, and products 1, 2, and 3 for each
acquisition in the current crop year. LACIE product 1
(or simulated false-color imagery), shown in figure
2(a), is created from digital values in channels 1, 2,
and 4; color assignments are blue, green, and red,
respectively. LACIE product 2 (or positive-negative
imagery), shown in figure 2(b), is created from
values in channels 2, 3, and 4 to make information in
channel 3 available to the analyst. Color assignments
in this product are red for channel 2, blue for channel
3, and green for channel 4; polarities are reversed for
channels 3 and 4. Products 1 and 2, generated to
emphasize contrast, are excellent for field delineation
and enhanced spatial features. However, depending
on the data in the scene, contrast is sometimes
achieved at the expense of consistent color
depiction of spectral values. LACIE product 3,
shown in figure 2(c), is imagery developed for Phase
III specifically for a more consistent color display of
spectral signatures. (The magnitude of color distortion
varies from scene to scene; the range of possible
distortion is displayed in figure 3.) Product 4
(black-and-white images of each channel) is used for
detecting data dropout problems, etc.
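The channel-to-color assignments for products 1 and 2 can be illustrated with a short sketch. It assumes a simple linear contrast stretch and treats "polarity reversed" as inverting the stretched value; the function names, the stretch, and the digital-count ranges are assumptions for illustration and are not taken from the LACIE film-generation software.

```python
def stretch(value, lo, hi):
    """Linear contrast stretch of a digital count to the range 0-255."""
    if hi == lo:
        return 0
    scaled = round(255 * (value - lo) / (hi - lo))
    return min(255, max(0, scaled))

def product1(pixel, ranges):
    """Channels 1, 2, 4 assigned to blue, green, red. Returns (R, G, B)."""
    ch1, ch2, _ch3, ch4 = (stretch(v, *r) for v, r in zip(pixel, ranges))
    return (ch4, ch2, ch1)

def product2(pixel, ranges):
    """Channel 2 -> red, 3 -> blue, 4 -> green; channels 3 and 4 inverted."""
    _ch1, ch2, ch3, ch4 = (stretch(v, *r) for v, r in zip(pixel, ranges))
    return (ch2, 255 - ch4, 255 - ch3)

# Assumed count ranges: 0-127 for channels 1-3 and 0-63 for channel 4.
ranges = [(0, 127), (0, 127), (0, 127), (0, 63)]
vegetated_pixel = (20, 15, 60, 50)  # hypothetical: low visible, high infrared

print(product1(vegetated_pixel, ranges))
print(product2(vegetated_pixel, ranges))
```

In the product 1 rendering, the high channel 4 value dominates the red component, which is why vigorous vegetation appears red on this imagery.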

Maps. - Maps at scales of 1:24 000 and 1:250 000
were  useful  in  identifying  topographic  features, 
unusual  signatures,  and  natural  vegetation;  they  also 
ensured  the  proper  location  of  segments. 

Ancillary  data. — Ancillary  data  include  informa- 
tion on  cropping  practices  and  soils  for  the  crop-re- 
porting district  (CRD),  historical  crop  percentages 
for the political subdivision for the preceding 4 or 5
years,  and  a nominal  crop  calendar  for  the  CRD 
based  on  10-year  averages.  Sample  nominal  and  ad- 
justed nominal  crop  calendars  for  CRD  20  (Mon- 
tana) are  given  in  figures  4(a)  and  4(b),  respectively. 
Average  length  of  crop  development  stages,  the 
nominal dates when these stages occur, and the relative
growth stages of other crops in the area are presented.

Machine  classification  data.— Machine  classifica- 
tion data  include  previous  wheat  acreage  estimates 
on  the  segment  and  classification  products  for  the 
most  recent  estimate. 

Available  reference  materials  not  included  in  the 
analyst's  packet  are  the  following. 

Weekly meteorological summaries. - Weekly meteorological
summaries¹ provide current data on temperature
and precipitation, a crop calendar adjustment
reflecting the current Robertson biometeorological
time scale (BMTS) discussed in reference 2, the
growth stage of wheat, and a summary of
statewide crop and weather assessments.

¹A weekly meteorological summary of the United States is
published as an internal memorandum by the National Oceanic
and Atmospheric Administration (NOAA), the U.S. Department
of Agriculture (USDA), and the National Aeronautics and Space
Administration (NASA). LACIE personnel had limited access to
these summaries.

FIGURE 2. - Landsat film products for Blaine County, Montana,
acquired on July 3, 1977 (from ref. 1). (a) Product 1 created
from digital values in channels 1, 2, and 4. (b) Product 2
created from digital values in channels 2, 3, and 4. (c) Product 3
created from digital values in channels 1, 2, and 4.

FIGURE 4. - Nominal and adjusted nominal crop calendars for CRD 20 (Montana) (from ref. 1). (a) Nominal (historical) crop
calendar. (b) Nominal crop calendar adjusted for crop year 1976-77 (segment 1528).

By correlating the current year's specific growth
stages found on the crop calendar adjustment (fig. 5)
from the weekly meteorological summary, the
analyst adjusts the nominal crop calendar to be more
specific for the current year, as shown in figure 4(b).
This information can be very useful because episodic
events, such as recent rainfall, alter crop signatures.
Soil reflectivity contributes an indeterminate component
to the average reflectance value for an acre
recorded by Landsat, and the reflectivity of wet soil
differs from that of dry soil. Long-term events, such
as drought, affect the expected Robertson BMTS for
wheat and the expected spectral signatures.

Landsat full-frame imagery. - Full-frame imagery
of all areas in which LACIE segments exist is
provided to the NASA Johnson Space Center (JSC) four
times during the growing season for analyst use. To
fulfill this requirement, cloud cover must be less
than 20 percent when Landsat passes over an area in
order to acquire usable imagery. Figure 6 is a
full-frame image of sample segment 1528 in Blaine
County, Montana, acquired on July 3, 1977.

The coverage of the full frame can be used for better
distinction between agricultural and nonagricultural
patterns and signatures within the segment
area and in the area surrounding the segment.
Drainage patterns, streams, and areas of natural
vegetation, such as rangeland in the U.S. Great
Plains, are frequently easier to identify on full-frame
imagery. Knowing the relationship of the segment to
the county through analysis of the full-frame image
and agricultural statistics can be very important to
the analyst. The agricultural statistics in the ancillary
summary can be understood better when the political
subdivisions to which they apply (e.g., counties in
the United States) and the segment are viewed
together after plotting on the full-frame image. In
this manner, the analyst can observe how the segment
compares to the remainder of the county with
respect to the proportion of agricultural land in the
county and in the segment. Certain crops may be
grown in some areas of a county and not in others.
For example, different crops may be grown along
rivers and streams rather than on drier hillsides with
poorer soil.

Analyst  Interpretation  Keys. — Two  volumes  of 
Analyst Interpretation Keys (ref. 3) were available to


[Figure 5 note: Each growth stage begins when approximately 50 percent of the
crop has reached one of the following numbered stages.]

FIGURE 5. - Example of crop calendar adjustment from the
weekly meteorological summary (from ref. 1). The numbers
refer to the Robertson BMTS growth stages for wheat.


the analysts in Phases II and III. Volume I is an
operational overview of wheat and nonwheat signatures.
Examples are relatively general, and
nominal photophenology for wheat is illustrated. In
addition, volume I gives examples of, and causes for,
some of the variations in wheat signatures seen on
Landsat imagery. It is used as a general training and
information aid. Volume II is a regional key of
Canada and the U.S. Great Plains and is used by
production analysts as a guide in operations. It is
designed to lead to the correct identification of wheat
and small-grains areas.
FIGURE 6. - Full-frame image of sample segment 1528 in Blaine
County, Montana, acquired on July 3, 1977.


Labeling  Logic 

Plant phenology is indicated by a temporal change
in the infrared reflectance relative to the reflectance
in the visible range. Figure 7 shows a typical curve
for green growing vegetation plotted in wavelength
versus percentage of reflectance. Since channel 4 (0.8
to 1.1 micrometers) is assigned red in LACIE product
1 and 3 imagery, a red spectral signature on this
imagery reflects healthy green vegetation. The
analyst reviews and interprets the available imagery
and reference data to determine the potential
small-grains areas in a segment by correlating the temporal
growth stages of small grains and the expected
sequence of changes in spectral signatures to the
sequence of spectral signatures evident on the imagery.
Table I shows the expected sequence of small-grains
signatures for the acquisition dates shown in figure 8.
The field labeled “W” on the sequence of imagery in
figure 8 is clearly a winter grain field. The field
labeled “S” follows a spring grain temporal growth
pattern. The field labeled “N” does not follow the
temporal growth pattern of wheat. A detailed
discussion of the interpretation process is contained in the
paper by Hay entitled “Manual Interpretation of
Landsat Data.”
[Table I, expected sequence of small-grains signatures; only part of the table is recoverable.]

Acquisition date | Winter small grains | Spring small grains
[not recovered] | [not recovered] | Preplanting: no red
[not recovered] | [...] stage: red | Planting: no red
May 2[?] | Vigorous growth stage, perhaps some change preparatory to turning: red or brick red | Emergent: red or pink, depending on plant canopy
July 3 | Ripe stage: orange, yellow, brown | Vigorous growth stage: red
Aug. 11 | Harvest stage: yellow, white, tan | Harvest or ready for harvest: yellow, brown, olive green, [...]
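The temporal labeling logic can be sketched as a comparison of a field's observed red/no-red sequence with the expected sequence for each small-grains type. The patterns below paraphrase the recoverable part of Table I for four successive acquisitions; reducing each signature to a red/no-red flag, the pattern values, and the function name are simplifying assumptions for illustration only.

```python
# Expected "red" (vigorous green vegetation) flag at four successive
# acquisitions, paraphrased from Table I; the reduction to booleans is assumed.
EXPECTED = {
    "W": (True, True, False, False),   # winter small grains
    "S": (False, True, True, False),   # spring small grains
}

def candidate_labels(observed):
    """Labels whose expected sequence is compatible with the observations.

    None marks a missing acquisition and is compatible with any pattern.
    A field that matches no small-grains pattern is labeled "N".
    """
    matches = [
        label
        for label, pattern in EXPECTED.items()
        if all(o is None or o == p for o, p in zip(observed, pattern))
    ]
    return matches or ["N"]

print(candidate_labels((True, True, False, False)))   # complete history
print(candidate_labels((None, True, None, False)))    # two acquisitions missing
```

With the first and third acquisitions missing, both small-grains patterns remain compatible, which illustrates why acquisition history matters to the analyst.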
A problem detected early in LACIE was that the
labeling logic should be more sophisticated. The
within-segment variability was very large, sometimes
as great as the within-state variability. Thus, some
knowledge about the parameters that affected the
spectral signatures and labeling logic had to be
developed and implemented operationally.

The  analyst  must  consider  interpretation  variables 
every  time  he  works  a segment. 

1. Knowledge of regional effects, such as soils,
cropping practices, and climatic conditions, is an
important aspect of the interpretation process.

2. Field size and registration in small- and/or
strip-field segments need to be considered because a
signature becomes more difficult to label with the
inherent increase in the number of field boundaries
and, subsequently, at the resolution of Landsat, with
an increase in the number of mixture pixels. Figure 9
shows examples of large, typical, and small fields in
the USSR and the United States.

3. Acquisition history is important in preventing
confusion between crops and ensuring the
accurate identification of spectral signatures. Figure
10 is an example of an acquisition history for a
segment in Richland County, North Dakota. The
dashed lines on the partial crop calendar at the
bottom of the figure indicate the adjusted growth stages
for the available acquisitions for spring wheat and
sunflowers, the latter a potential confusion crop.
Acquisitions 1 and 4 are essential for the separation of
trees and natural vegetation. Because the spring
wheat has already begun to emerge on acquisition 1,
crop confusion can occur; however, used with
acquisition 4, the separation would be complete. With
just acquisitions 1 and 3, spring wheat would be
confused with sunflowers. Note the similarity of the
fields on both dates.

4. Episodic events are any phenomena that cause
the spectral signatures to deviate significantly from
the nominal. The most pronounced episodic event
monitored during LACIE was severe drought (fig.
11(a)).

The use of ancillary data, such as meteorological
data and full-frame imagery, was incorporated into
the CAMS labeling logic several times during
LACIE. Other parameters which varied regionally to
affect signatures were irrigation (fig. 11(b)),
fertilization, plant varieties, planting dates (fig. 11(c)),
planting densities, etc. In the presence of any or all of
these factors, the labeling logic had to be flexible
enough to account for them when they were
significant, yet general enough not to bog down the labeling
process.


Analysis  Procedures 

To ensure that consistent output for subsequent
evaluations came from CAMS, a controlled set of
analysis procedures had to be developed,
implemented, and maintained (refs. 4 to 6). These
procedures had to be adaptable to new technology
whenever deemed necessary, yet provide consistent
results to meet the LACIE accuracy and throughput
goals.

Phase I was primarily a learning experience. No
one knew how well the state-of-the-art technology,
current at the time, would work in an operational
environment in which all segments were processed
alike and high throughput was important.
Mechanical procedures were available for using all the
technology, but no documented decision logic and
analysis procedures existed.

Two types of analysts were thought necessary in
Phase I. The analyst-interpreter (AI), expert in
image interpretation techniques, was responsible for
identifying potential small-grains areas on Landsat
film products, delineating “training” fields for all
spectral signatures, and inputting the training
information for computer processing. The data processing
analyst (DPA), expert in pattern recognition theory
and available data analysis systems, was responsible
for processing the data and evaluating the results.

The AI had many obstacles to overcome. The
quality of the film products was poor, and color
representation was inconsistent. The acquisition
coverage was poor mainly because of resource
limitations at the NASA Goddard Space Flight Center
(GSFC). More acquisition coverage would have been
beneficial at this stage of LACIE because no
documented wheat key was available. No current-year
growth stage adjustments were available during this
phase either; thus, the signature variability problem
was compounded.

The inadequacy of the information available to
the AI made his labeling task very difficult. In
addition, the AI had to assume the manual drudgery of
many of the data handling tasks (fig. 12). The
analyst's major functions and his interaction with
the operational system are further discussed in the
following paragraphs. This discussion concentrates
on the basic initial characteristics of the system in
Phase I. Improvements made in Phases II and III are
discussed subsequently.

Screening. - Not all the acquisitions received at
JSC were acceptable for further processing. The
initial system implemented at GSFC to extract LACIE
segments, though improved in later phases of
LACIE, allowed a substantial number of segments
with excessive cloud cover, data dropouts (especially
with Landsat-1), and misregistration. The analyst
had to prescreen all the imagery and select for
further processing only those acquisitions that
satisfied a prespecified criterion.

Definition of training areas. - The initial system
used the typical classification approaches; i.e., it
required that the analyst identify, select, and label
training fields for the classifier and test fields for
evaluation. Each spectral signature had to be
represented in the training fields selected. Figure 13
exemplifies a typical selection of fields. Sufficient
samples of each signature also had to be defined so that
sufficient statistics could be calculated for the
classifier. These steps had to be repeated for the test
fields. Then, the analyst labeled all the selected fields
as wheat or nonwheat using the labeling logic.

The problems with this approach were numerous.
It was very difficult for the analyst to identify all the
spectral signatures. If sufficient samples were not
allocated in the correct proportion to the classes
identified, then subsequent classifications were
found to be affected. These tasks were very tedious
and time-consuming and distracted the analyst from
his most important task, labeling the signatures.
These problems had the additional undesirable effect
of indirectly making multitemporal processing
operationally difficult, if not impossible.

Data base updating. - Data base updating in the
CAMS functional flow was a series of manual
functions to go from signature labeling to segment
classification. The analyst manually had to delineate
on the imagery the vertices of all the selected fields
(from 40 to more than 100). In a separate operation,
the vertices were read with a grid-encoded magnifier,
recorded on keypunch forms, and subsequently
entered into the processing data base. About halfway
through Phase I, a technique that used a
semiautomated digitizer for field delineation was developed.
The output was then reformatted for compatibility
with the data base. This task was less tedious but still
somewhat time-consuming. The next task was to
complete all the forms, card decks, etc., necessary to
submit the job for machine processing.

Machine processing. - Once the fields were
successfully loaded into the data base, the DPA was
notified. All machine processing was interactive.
Many techniques were attempted: class statistics and
cluster statistics (obtained by clustering the fields of
each category) for maximum-likelihood
classification, multitemporal classifications, and signature
extension runs. Halfway through the phase, batch
capabilities and statistical manipulations for
signature extension were added to the system. The
rework rate in Phase I was 250 percent; signature
extension attempts were unsuccessful and therefore
abandoned in Phase II; and the majority of the
multitemporal classifications were unsatisfactory
because of misregistration and AI procedural
labeling deficiencies.

Evaluation of results. - Before the submission of
the resultant segment wheat proportion estimate
from the machine classification, a final review of the
analysis output was conducted. This was
accomplished by examining the probabilities of correct
classification (PCC's) of the analyst-selected training
and test fields. If the PCC's were above preset values,
the proportion estimate was deemed satisfactory.
However, in addition, the classification map was
manually correlated with the color-infrared imagery
to ascertain whether the wheat areas and nonwheat
areas appeared to be in good agreement with the
classification map.
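The acceptance check on the training and test fields can be sketched as below. The text says only that the probabilities of correct classification had to exceed preset values; the threshold values, the function names, and the sample labels here are hypothetical.

```python
def pcc(analyst_labels, classified_labels):
    """Fraction of field pixels whose classification matches the analyst label."""
    if not analyst_labels or len(analyst_labels) != len(classified_labels):
        raise ValueError("need equally sized, non-empty label lists")
    correct = sum(a == c for a, c in zip(analyst_labels, classified_labels))
    return correct / len(analyst_labels)

def estimate_is_acceptable(train_pcc, test_pcc, train_min=0.90, test_min=0.80):
    """True when both probabilities reach their (assumed) preset values."""
    return train_pcc >= train_min and test_pcc >= test_min

# Hypothetical pixel labels: "W" for wheat, "N" for nonwheat.
training_pcc = pcc(["W", "W", "N", "N"], ["W", "W", "N", "N"])
test_pcc = pcc(["W", "W", "N", "N"], ["W", "N", "N", "N"])
print(training_pcc, test_pcc, estimate_is_acceptable(training_pcc, test_pcc))
```

A check of this kind measures only agreement with the analyst's own labels, so it cannot detect a labeling error; that is one reason the classification map was also compared with the imagery.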
FIGURE 12. - Initial data processing tasks.


Overall  Integration  of  CAMS 
Operations  Into  LACIE 

Phase I. - At the conclusion of Phase I, several
problems  became  obvious.  Most  of  these  were  due  to 
the  operational  problems  inherent  with  the  large 
throughput  requirements.  Technology  that  was 
straightforward  when  performing  remote-sensing  ap- 
plications in  a low-key  environment  became  very 
cumbersome  when  applied  to  a large  number  of  seg- 
ments. Naturally,  these  problems  were  compounded 
by  the  constraint  of  there  being  no  ground  data  for 
the  generation  of  training  statistics.  More  impor- 
tantly, no  ground  data  were  available  to  evaluate  the 
performance  of  the  system  during  operations.  This 
mode  was  new  but  understandably  necessary  for  the 
overall  goals  and  objectives  of  LACIE. 


Phase II. - Because of the problems identified in
Phase I, it became apparent that new approaches to
the LACIE environment would have to be
developed during Phase II. One of the obvious
problems in Phase I was the segregation of the AI and the
DPA. It was very difficult for each to perform his job
satisfactorily without a clear understanding of the
other half of the analysis. A successful attempt to
resolve this problem was made in Phase II when
two-man teams were formed in a cross-training effort to
produce end-to-end analysts. Increased acquisition
coverage (all clear acquisitions) and improved film
products were also aids to the analyst. Documented
decision logic procedures came into existence, and
Robertson growth stage adjustments were available
to refine the crop calendars for the current crop year.

Training  fields  were  still  used  to  train  the 
classifier, but more emphasis was placed on
multitemporal  signature  selection  when  processing 
was  multitemporal.  Because  this  task  was  very 
tedious  and  time-consuming,  multitemporal  proc- 
essing was  avoided  as  much  as  possible.  Another 
deterrent  for  multitemporal  processing  was  a 
machine  requirement  that  when  more  channels  were 
used  for  processing,  the  training  sample  had  to  be 
larger  to  prevent  excessive  thresholding.  One  of  the 
major  problems  with  Phase  II  was  the  increased 
processing  load  (from  700  to  1700  segments).  Some 
operational  efficiencies  had  to  be  developed  and  im- 
plemented to  allow  the  analyst  to  increase  his 
throughput. 

A “no  significant  change”  procedure  was  in- 
troduced in  Phase  II  to  expedite  processing.  If  the 
analyst  thought  that  processing  a new  acquisition 
would  not  significantly  change  a previously  satisfac- 
tory estimate  for  the  segment,  the  code  was  assigned 
to the new acquisition, no machine processing occurred,
and the last satisfactory estimate was used for
the  aggregation.  In  addition,  because  of  a lack  of 
sufficient  samples,  segments  subjectively  thought  to 
contain  less  than  5 percent  wheat  were  processed  by 
handcounting  the  individually  labeled  wheat  pixels. 
Although  these  processes  helped  reduce  the  backlog, 
the  subjectivity  involved  made  them  technically  un- 
desirable. 

During  Phase  II,  a data  base  management  system 
was  implemented  to  relieve  the  analyst  of  a signifi- 
cant portion  of  the  data  base  updating  task.  This 
system performed all of the reformatting routines,
quality checks, and status and tracking of job
submissions. A batch processing capability was also
implemented; it eliminated all the interactive processing
except for reworks (approximately 25 percent).

In  addition  to  the  analyst's  functions,  an  indepen- 
dent group  of  personnel  performed  a final  quality 
assurance  review  of  each  segment  analysis  before 
submittal  for  subsequent  aggregations.  The  group's 
primary  function  was  to  ensure  that  established  pro- 
cedures were  followed  during  the  analysis  so  that 
subsequent  evaluations  and  problem-solving  tasks 
could  be  conducted  on  a consistent  and  controlled 
data  set.  In  addition,  the  group  reviewed  the  seg- 
ments in  a regional  framework  to  ascertain  whether 
specific  spectral  signatures  were  consistently  labeled 
and  whether  the  signatures  interfaced  with  the  ag- 
gregation and  sample  allocation  elements  of  the  pro- 
ject. The  findings  were  reported  to  aid  in  problems 
particular  to  those  areas. 

Many  problems  were  resolved  in  Phase  II,  but 
some  needed  more  attention.  Labeling  was  a major 
problem  in  Phase  I;  however,  with  a year's  ex- 
perience, the  analyst  gained  confidence  in  labeling  by 
having  a mental  wheat  key.  The  decision  logic  was 
quite  general,  however,  and  did  not  cover  the 
variability of wheat signatures under different growing
conditions (i.e., drought, winterkill, dryland
cropping, irrigation).

Another  problem  area  was  analyst  bias.  The  con- 
tinued tendency  to  underestimate  caused  further 
questioning  of  the  assumption  that  the  analyst  could 
accurately  sample  and  label  the  segment  with  only 
imagery for spectral information. The classifier
depended on training sample sizes in proportion to the
spectral signature across the entire segment. This
was a difficult task for the analyst to achieve for all
the possible combinations of multitemporal classes
from imagery alone. Consistency between analysts
was impossible to maintain with this type of
procedure. If two analysts were given the same segment
to process, they would use training fields of different
quantity and population, and their proportion
estimates could vary considerably. To begin to develop
solutions to the underestimation problem, it was
obvious that an opportunity to assess the Landsat data
at a more detailed level was necessary.

Phase III. - Solutions to the labeling and bias
problems were the major goals of Phase III.
Interpretation teams were formed so that the
interpretation process could be a consensus of a group
of three to five people with diverse backgrounds and
levels of experience. A supplemental film product,
product 3 (fig. 3), and Analyst Interpretation Keys, as
discussed previously, were available to the analyst
during this phase as labeling aids. However, a more
serious signature masking effect due to the film
generation process was corrected for Phase III. Note
the fields outlined on the color-infrared product in
figure 14(a). In film space, little or no signature
difference is apparent. Note the position of these
same fields in figure 14(b), which is a spectral scatter
plot of the actual digital data from Landsat channels
2 and 3. Thus, to improve the analyst's labeling
capability, spectral aids (scatter and trajectory plots
of the Landsat digital data) were made available to
the analyst for the first time, but they were
initially used only as an evaluation tool to check dot label
consistency. The spectral data were transformed into
two variables, green number and brightness (refs. 7
and 8). Figure 15 shows scatter plots of analyst-
labeled dots for segment 1528. Figure 16 shows
sample trajectory plots for winter wheat, spring wheat,
and nonwheat pixels.
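The transformation to green number and brightness can be illustrated with a linear combination of the four Landsat channels. The text does not give the coefficients; the sketch below assumes the commonly cited Kauth-Thomas ("tasseled cap") coefficients for Landsat MSS data, and the pixel values are hypothetical.

```python
# Assumed Kauth-Thomas coefficients for MSS channels 1-4 (not given in the text).
BRIGHTNESS = (0.433, 0.632, 0.586, 0.264)
GREENNESS = (-0.290, -0.562, 0.600, 0.491)

def brightness_greenness(pixel):
    """Project a 4-channel pixel onto the brightness and greenness axes."""
    brightness = sum(c * x for c, x in zip(BRIGHTNESS, pixel))
    greenness = sum(c * x for c, x in zip(GREENNESS, pixel))
    return brightness, greenness

def trajectory(acquisitions):
    """Green number of one pixel for each acquisition, in date order."""
    return [brightness_greenness(pixel)[1] for pixel in acquisitions]

vegetated = (20, 15, 60, 50)   # hypothetical green-canopy pixel
bare_soil = (40, 45, 50, 20)   # hypothetical bare-soil pixel
print(brightness_greenness(vegetated))
print(trajectory([bare_soil, vegetated, bare_soil]))
```

Plotting the green number against acquisition date for a labeled dot gives a trajectory plot of the kind described for figure 16, rising as the crop greens up and falling as it ripens.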

The implementation of Procedure 1 during Phase
III originated from a cluster-based procedure
designed for small-fields areas. This procedure was
refined and improved by considering analyst bias,
machine bias, and efficiency. A systematic grid (fig.
17) was developed for dot labeling so that sampling
among analysts would be more consistent and
signature sampling more statistically accurate across the
whole segment. Thirty to fifty dots are identified to
start and label the clusters, and another 40 to 60 dots
are labeled and used as a stratified areal sample after
classification to correct for machine bias and to
reduce reworks.

FIGURE 14. - Relationship of Landsat imagery to the spectral
scatter plot. (a) Product 1. (b) Scatter plot.
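The systematic dot grid and the stratified areal sample can be sketched as follows. The sketch assumes a segment of 117 lines by 196 pixels (which is consistent with the 22 932 pixels per segment mentioned in the text) and a grid spacing of 10 pixels; the estimator, stratum names, and dot counts are illustrative assumptions rather than the documented Procedure 1 implementation.

```python
def systematic_grid(rows=117, cols=196, spacing=10):
    """Grid intersections (line, pixel) at a fixed spacing across the segment."""
    return [
        (r, c)
        for r in range(spacing, rows + 1, spacing)
        for c in range(spacing, cols + 1, spacing)
    ]

def stratified_estimate(stratum_weights, dot_labels):
    """Wheat proportion from labeled dots, weighted by classified stratum size.

    stratum_weights: {stratum: fraction of segment pixels in that stratum}
    dot_labels: {stratum: analyst labels of the sample dots falling in it}
    """
    estimate = 0.0
    for stratum, weight in stratum_weights.items():
        labels = dot_labels[stratum]
        wheat_fraction = sum(label == "wheat" for label in labels) / len(labels)
        estimate += weight * wheat_fraction
    return estimate

grid = systematic_grid()
print(len(grid))

# Hypothetical result: the classifier called 30 percent of the segment wheat.
weights = {"classified_wheat": 0.30, "classified_other": 0.70}
dots = {
    "classified_wheat": ["wheat"] * 18 + ["other"] * 2,
    "classified_other": ["wheat"] * 4 + ["other"] * 36,
}
print(stratified_estimate(weights, dots))
```

In this example the dot sample raises the estimate above the machine's 30 percent because some wheat dots fall in the stratum classified as nonwheat, which is the kind of machine bias the second set of dots is meant to correct.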


Clustering is the process of grouping pixels
according to some distance measure (ref. 4). The pixel
vector $X = (x_1, x_2, \ldots, x_n)$ ($n$ = number of channels) of
each of the 22 932 pixels in the segment was
compared with the pixel vector $Y = (y_1, y_2, \ldots, y_n)$ of each of
the 20 starting dots. Each pixel was assigned to the
closest starting dot according to the distance measure
[equation not recovered], where m is the number of
channels used in clustering.
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A lor  all  pixels  were  assigned,  ihe  mean  and  the 
standard  deviation  of  each  cluster  were  computed  for 
each  channel,  Figure  18(a)  is  a color-coded  cluster 
map  ot  segment  1528.  Each  cluster  is  assigned  a 
unique  color,  hgurc  18(h)  is  a conditional  cluster 
map.  II  ihe  distance  between  the  cluster  and  the  dot 
that  labeled  it  was  within  a specified  threshold  of  a 
do.  labeled  "S."  "W.«  or  "N"  (spring  wheat,  winter 
wheat,  or  nonwheat),  the  cluster  was  color-coded 
green,  cyan,  or  yellow,  respectively.  If  it  was  not 
»Mhm  the  threshold,  the  cluster  was  assigned  a 
unique  color  The  cluster  was  then  automatically 
labeled  by  the  closest  dot.  and  the  cluster  statistics 
were  used  to  classify  the  segment.  In  classification 
the  probability  of  each  pixel  in  the  segment  belong- 
ing to  each  cluster  is  computed.  Values  are  summed 
to  the  category  level.  Pixel  assignment  is  made  to  the 
most  likely  category  and  (hen  to  ihe  most  likely 
cluster  or  subclass  within  that  category.  Alter 
classification,  thresholding  is  applied  10  remove  from 
the  final  classification  results  pixels  with  a low  pro- 
bability o!  belonging  to  the  assigned  subclass  (ref. 
lOM ^gure  18(c)  is  a classification  map  of  segment 
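
The clustering, cluster-labeling, and classification steps described here can be sketched as follows. This is an illustration under stated assumptions, not the LACIE code: the distance is taken to be Euclidean, the per-cluster probability model is a diagonal Gaussian built from the channel means and standard deviations, and all function names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance over the channels (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_to_seeds(pixels, seeds):
    """Index of the closest starting dot for every pixel."""
    return [min(range(len(seeds)), key=lambda k: distance(p, seeds[k]))
            for p in pixels]


def cluster_stats(pixels, assignment, n_clusters):
    """Per-cluster (channel means, channel standard deviations)."""
    stats = {}
    for k in range(n_clusters):
        members = [p for p, a in zip(pixels, assignment) if a == k]
        if not members:
            continue  # a seed that attracted no pixels has no statistics
        n = len(members)
        channels = list(zip(*members))
        means = [sum(ch) / n for ch in channels]
        stds = [math.sqrt(sum((v - m) ** 2 for v in ch) / n)
                for ch, m in zip(channels, means)]
        stats[k] = (means, stds)
    return stats


def label_clusters(stats, labeled_dots, threshold):
    """Label each cluster by its closest labeled dot.

    Returns {cluster: (label, within_threshold)}. The flag corresponds to
    the conditional cluster map: label color only when it is True.
    """
    labels = {}
    for k, (means, _) in stats.items():
        vec, label = min(labeled_dots, key=lambda d: distance(means, d[0]))
        labels[k] = (label, distance(means, vec) <= threshold)
    return labels


def gaussian_density(p, means, stds, floor=1.0):
    """Diagonal Gaussian density (assumed model; floor avoids zero spread)."""
    d = 1.0
    for v, m, s in zip(p, means, stds):
        s = max(s, floor)
        d *= math.exp(-0.5 * ((v - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return d


def classify(p, stats, labels, min_density=0.0):
    """Sum cluster densities per category, choose the most likely category,
    then the most likely cluster within it; None if thresholded out."""
    dens = {k: gaussian_density(p, *stats[k]) for k in stats}
    by_category = {}
    for k, d in dens.items():
        category = labels[k][0]
        by_category[category] = by_category.get(category, 0.0) + d
    category = max(by_category, key=by_category.get)
    best = max((k for k in dens if labels[k][0] == category), key=dens.get)
    return None if dens[best] < min_density else (category, best)
```

Summing to the category level before choosing a cluster means that several modest wheat clusters can together outweigh a single nonwheat cluster, which is the behavior the text describes.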

The analyst time line decreased by 75 percent (fig.
19). He no longer had to delineate and digitize fields.
He simply labeled the prescribed number of dots; filled
out a process request form; and, after the
classification was complete, evaluated the results
(fig. 20). The evaluation process was also designed to
be more objective. The percent of correctly classified
(PCC) labeled dots and the variance between the machine
estimate and bias-corrected estimate were used to
determine whether the results were satisfactory or
unsatisfactory. Less than 10 percent of the
acquisitions processed required rework. Acquisitions
with low PCC's were machine-reworked by relabeling the
clusters and reclassifying. When the PCC's were marginal
and the variance was large, a dot rework was performed
by manually labeling more bias correction dots and
recomputing the bias correction estimate and variance.
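
The two evaluation quantities can be sketched as below. The paper does not give the estimator, so this example assumes a standard stratified-sample form in which the machine classes act as strata and the analyst-labeled bias correction dots estimate the true wheat share within each stratum. The labels are simplified to "W" (wheat) and "N" (nonwheat), and the function names are hypothetical.

```python
def percent_correctly_classified(dots):
    """dots: list of (analyst_label, machine_label) pairs."""
    agree = sum(1 for analyst, machine in dots if analyst == machine)
    return 100.0 * agree / len(dots)


def bias_corrected_estimate(machine_wheat_fraction, bias_dots):
    """Stratified estimate of the wheat proportion and its variance.

    machine_wheat_fraction: share of the segment the machine called wheat.
    bias_dots: (analyst_label, machine_label) pairs for the correction dots.
    A stratum with no dots contributes nothing, which is a limitation of
    this sketch rather than a documented LACIE rule.
    """
    weights = {"W": machine_wheat_fraction,
               "N": 1.0 - machine_wheat_fraction}
    estimate = 0.0
    variance = 0.0
    for stratum, weight in weights.items():
        labels = [a for a, m in bias_dots if m == stratum]
        if not labels:
            continue
        n = len(labels)
        p = sum(1 for a in labels if a == "W") / n
        estimate += weight * p
        if n > 1:
            variance += weight ** 2 * p * (1.0 - p) / (n - 1)
    return estimate, variance
```

A low PCC points to cluster labels that disagree with the analyst, while a large variance points to too few bias correction dots, which matches the two kinds of rework the text distinguishes.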

The implementation of Procedure 1 for Phase III and
significant improvements in the data management system
were major breakthroughs and account for the biggest
success stories in LACIE. This vast improvement in the
allocation of tasks to man and machine allowed the
accomplishment of the increased scope of Phase III with
no reduction in performance. In fact, because more
tasks were transferred to the machine (fig. 21), the
analyst could now concentrate on the labeling function
and, with the improved products, do a better job.


SUMMARY

The key accomplishments of the CAMS operations are
related to the accuracy and throughput goals of the
basic LACIE output requirements.


Accuracy  Goals 

The accuracy goals achieved are best described in the
paper by Potter entitled "Accuracy and Performance
Characteristics of LACIE Area Estimates." However, it
should be emphasized that the per-segment errors in
bias and variance measured against ground truth
decreased significantly from Phase I to Phase III. The
decrease in variance, noted during Phase II and
continued into Phase III, is attributed to the
stabilization of repeatable analyst procedures. The
bias component of error, which was manifested in Phase
I and somewhat in Phase II by underestimates of wheat
proportion, especially in the spring wheat regions, is
attributed to weaknesses in the labeling decision
logic. In Phase I, the decision logic was fairly
simple; i.e., there was very little allowance for the
total within-segment signature variability. This logic
tended to result in errors of omission (off-nominal
signature of wheat called nonwheat) and thus in the
underestimation of the true wheat proportion. However,
the process matured during the final phases of LACIE
and the bias was reduced to insignificant levels,
except for the highly concentrated regions of spring
wheat and spring barley. The primary factors that
helped support this decrease in bias were overall
improvements in the decision logic, the compilation of
the Analyst Interpretation Keys, the incorporation of
digital spectral aids, and the implementation of
Procedure 1 into the CAMS operation.


Throughput  Goals 

The significance of Procedure 1 to the CAMS accuracy
goals can best be appreciated when one recalls that the
initial motivation leading to its design and subsequent
implementation was to increase the CAMS throughput. In
Phase I, the scope for CAMS was 700 segments and 2000
acquisitions. In Phase II, it increased to 1700
segments and 9000 acquisitions. To accommodate this
increased load, questionable schemes, such as
handcounts and "no significant change" procedures, had
to be used to keep the CAMS operation from bogging
down. It became apparent very quickly that to meet the
Phase III scope of [illegible] segments and 18 000
acquisitions, a more technically palatable technique
was necessary; thus, Procedure 1 was developed.

During this period, other elements of CAMS, concerned
with the systems aspects of the processing problem,
noted the substantial number of functions being
performed by the analyst which dealt more with data
handling than with data information extraction. In
fact, the majority of his segment time line was spent
on data-handling functions instead of data analysis.
The implementation of a data management system to
relieve the analyst of the data handling, statusing,
and tracking was the significant contributor to the
accomplishment of the segment loading and throughput
goals.

The development and implementation of Procedure 1
represents one of the first really significant
approaches to solving some of the major man-machine
interactions involved in a large-scale classification
application. Comparison of figure 12 to figure 21
readily shows the improvement in the man-machine
distribution of tasks as realized by the Procedure 1
concept. In addition, as shown in figure 19, the
decrease in the analyst's time spent per segment
between Phase I and Phase III using Procedure 1 is
significant. The payoff comes not only in more
throughput realized per analyst but also in probable
improvements in accuracy because of better allocation
of analyst tasks. For example, although the analyst's
total time was reduced from 12 to 14 hours to 3 to 4
hours per segment, his actual interpretation time was
increased from approximately 1 hour to 2 hours per
segment. Thus, the analyst spent less time on clerical
tasks and more time on interpretation and labeling.

These accomplishments indicate that LACIE has indeed
been successful. The mistakes made within CAMS were
many; however, weighed against the stage CAMS personnel
were in 3 or 4 years ago, without the state-of-the-art
technology that exists because of LACIE, they now seem
trivial or at least worthwhile as experience gained.

This paper has attempted to capture the significant
highlights of the total CAMS experience during LACIE.
It has been very difficult for CAMS personnel,
intensely involved in the daily workings of CAMS with
its broad spectrum of technological and operational
activities, to "broadbrush" a happening that dealt its
full measure of emotion on their professional and
personal lives for 3.5 years. Most hopefully, the
contributions of the CAMS analysts to the success of
CAMS in LACIE have been sufficiently evident, for these
contributions represent the backbone of LACIE and the
builders of the programs to follow.


Concepts Leading to the IMAGE-100 Hybrid
Interactive System

T. F. Mackin and J. M. Sulester


SUMMARY 

As LACIE Procedure 1 evolved from the Classification
and Mensuration Subsystem small-fields procedures, it
became evident that two computational systems would
have merit: the LACIE/Earth Resources Interactive
Processing System, based on a large IBM-360 computer
oriented for operational use with high computational
throughput, and a smaller, highly interactive system
based on a PDP 11-45 minicomputer and its display
system, the IMAGE-100. The latter had advantages for
certain phases; notably, interactive spectral aids
could be implemented quite rapidly. This would allow
testing and development of Procedure 1 before its
implementation on the LACIE/Earth Resources Interactive
Processing System. The resulting minicomputer system,
called the Classification and Mensuration Subsystem
IMAGE-100 Hybrid System, allowed Procedure 1 operations
to be performed interactively, except for clustering,
classification, and automatic selection of “best”
acquisitions, which were offloaded to the LACIE/Earth
Resources Interactive Processing System. This article
comments on the development and use of this new system.


BACKGROUND 

As Procedure 1 (P-1) evolved from the Classification
and Mensuration Subsystem (CAMS) small-fields
procedures, it became evident that most of its major
elements could be easily incorporated into the
LACIE/Earth Resources Interactive Processing System
(LACIE/ERIPS). These included the use of dots, bias
corrections, selection of best acquisitions, clustering
from starting vectors, automatic cluster labeling, and
classification with labeled cluster statistics.

It  was  also  apparent  that  some  functions  could  not 
be  implemented  as  quickly  as  desired.  These  in- 
cluded the  complete  set  of  spectral  aids  to  assist  in 
labeling  dots  in  real  time.  In  addition,  an  easily  used 
rework  capability  would  not  be  implemented  on  a 
timely  basis  on  the  LACIE/ERIPS,  where  segments 
were  required  to  be  reclassified  by  reworking. 

The overall throughput time line of the system was
also an issue. It was thought that a more directly
interactive system would allow an analyst to complete
real-time analysis on the system in a convenient way
with a shorter data turnaround time. It was also
apparent that a fast, more direct system would allow
more experimentation with P-1. Much testing and
development would be necessary to refine the concepts
of P-1; to choose appropriate parameters; and, in
particular, to define the specific spectral aids and
procedures to be used in analysis.


IMAGE-100 

At the time P-1 was being developed, the Interactive
Multispectral Image Analysis System model 100
(IMAGE-100), coupled with a Programmed Data Processor
model 11-45 (PDP 11-45) minicomputer, was used for
supporting research in several activities in the Earth
Observations Division (EOD) at the NASA Johnson Space
Center (JSC). The IMAGE-100, built by the General
Electric Corporation, was widely considered to be an
advanced system when the EOD had acquired it only 2
years earlier, and it had been continuously upgraded
since that time.

Several unique features of the IMAGE-100 made it ideal
for implementing P-1. Its display system was remarkably
flexible and efficient; it provided complete
flexibility in color presentation of images; it
furnished  a movable  cursor  whose  shape  could  be 
defined  in  several  ways;  and  it  allowed  eight  themes 
(single-bit  images),  all  of  which  could  be  used  to 
store  and  display  results  of  image  operations  such  as 
classifications  and  cluster  maps.  These  bit  images 
could  be  combined  in  Boolean  operations  to  make 
still  others,  and  the  results  could  also  be  stored  and 
displayed  as  bit  images.  All  bit  images  could  be  dis- 
played alone,  in  combination  with  each  other,  or  as 
overlays  to  a regular  image. 
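
The theme operations described above can be illustrated with a short sketch. It models a theme as a grid of 0/1 values and is only an analogy for what the IMAGE-100 did in hardware; the function names are hypothetical.

```python
from operator import and_, or_


def combine(theme_a, theme_b, op):
    """Apply a bitwise operator cell by cell to two same-sized themes."""
    return [[op(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(theme_a, theme_b)]


def invert(theme):
    """Logical NOT of a single-bit theme."""
    return [[1 - bit for bit in row] for row in theme]


def overlay(image, theme, color):
    """Replace image pixels with `color` wherever the theme bit is set."""
    return [[color if bit else px for px, bit in zip(img_row, th_row)]
            for img_row, th_row in zip(image, theme)]


# Example: pixels classified as wheat that are not under cloud.
wheat = [[1, 0], [1, 1]]
cloud = [[0, 0], [1, 0]]
clear_wheat = combine(wheat, invert(cloud), and_)
either = combine(wheat, cloud, or_)
```

The result of each combination is itself a theme, so it can be stored, combined again, or overlaid on the regular image.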

In  addition  to  these  capabilities,  which  probably 
did  not  exist  on  any  other  commercial  machine  at 
the  time,  there  was  a “pixel-alarming”  capability. 
This  unique  feature  allowed  a pixel  (picture  element) 
or  a group  of  pixels  to  be  identified  in  one  of  many 
ways  and  then  all  other  identical  pixels  to  be  auto- 
matically identified.  Pixels  could  be  chosen  by  the 
cursor  or  identified  manually  on  a histogram  dis- 
played on  its  graphics  screen.  Once  identified,  all 
identical  pixels  in  the  scene  were  alarmed  or 
“flashed” in an operation that was apparently
instantaneous because it depended on hard-wired
programs. This capability was especially promising for
the  spectral  aids  visualized  for  P-1. 
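
In software terms, alarming amounts to building a single-bit mask of all pixels that match a chosen pixel or a chosen range of values. The sketch below is a plain illustration of that idea; the IMAGE-100 performed it in hard-wired logic, and the range form is an assumed reading of selection from a histogram. The function names are hypothetical.

```python
def alarm(image, line, sample):
    """Mark every pixel whose channel values equal those of the chosen one."""
    target = image[line][sample]
    return [[1 if px == target else 0 for px in row] for row in image]


def alarm_range(image, lows, highs):
    """Mark pixels whose every channel lies within [low, high]."""
    def inside(px):
        return all(lo <= v <= hi for v, lo, hi in zip(px, lows, highs))
    return [[1 if inside(px) else 0 for px in row] for row in image]


# A tiny two-channel image; each pixel is a tuple of channel values.
image = [[(1, 2), (3, 4)],
         [(1, 2), (5, 6)]]
same_as_first = alarm(image, 0, 0)
in_window = alarm_range(image, (1, 2), (3, 4))
```

The returned mask has the same form as a theme, so it can be flashed over the image or combined with other themes.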

The  IMAGE-100  had  other  advantages.  It  already 
existed  in  the  Data  Techniques  Laboratory  of  EOD 
and  hence  required  no  lengthy  procurement. 
Although  used  for  other  projects,  it  could  be  dedi- 
cated to  the  needs  of  LACIE;  and,  quite  importantly, 
in  the  work  force  were  applications  programers  with 
the  depth  and  experience  to  design  changes  and  pre- 
pare the  programs  in  a very  short  time. 


Designing  the  Hybrid  System 

The  decision  was  made  to  implement  P-1  on  both 
systems — the  ERIPS  for  production  and  the  IM- 
AGE-100 mainly  for  accuracy — and  a large  number 
of  personnel  began  working  out  details  for  both 
systems.  Some  of  the  personnel  were  involved  in  the 
finalization  of  the  concepts  of  P-1  without  regard  to 
implementation  on  any  one  specific  hard- 
ware/software system.  Others  worked  at  the  func- 
tional design  of  P-1  as  it  could  be  implemented  on 
the  IMAGE-100  (concurrently,  the  same  was  being 
done  for  the  LACIE/ERIPS). 

Consideration of total run time, admittedly based only
on educated guesses, suggested that the clustering
algorithms in particular would take an inordinate
amount of time on the PDP 11-45 computer. Although this
was never directly confirmed, an early decision was
made to offload to the LACIE/ERIPS the three most
lengthy computations: clustering, classifying, and
selecting the best acquisitions. That is, these three
functions would be performed on the LACIE/ERIPS in
batch runs, whereas all other operations in P-1 were to
be run on the IMAGE-100 system. In this sense, the
IMAGE-100 form would be known as the CAMS IMAGE-100
Hybrid System.

The most notable feature of the final design was the
spectral aids to be made available to the analyst,
including both trajectory plots and spectral plots.
Both would be displayed on the IMAGE-100 screen and be
available in real time. They would allow complete
flexibility of display colors, placement on screen,
etc.

The  spectral  plots,  including  the  Kauth  coordi- 
nates, were  to  be  made  available  for  any  two  chan- 
nels. As  an  example,  an  analyst  might  choose  to  dis- 
play all  pixels  in  clusters  labeled  as  wheat,  with  green 
number  and  brightness  as  coordinates.  Full  use  was 
made  also  of  alarming — the  analyst  could  encircle  a 
dot  on  the  spectral  plot  with  the  cursor  and  it  would 
be  alarmed  on  the  color-infrared  (CIR)  image  of  the 
acquisition,  along  with  all  other  identical  pixels.  In 
addition,  dots  could  be  displayed  in  arbitrary  colors; 
however,  the  most  useful  and  most  generally  used 
display  of  the  dot  was  in  the  same  colors  as  the  actual 
image  being  displayed. 


DEVELOPMENT  OF  THE  SYSTEM 

The baseline document on requirements for the system
became available in January 1977. For the next month,
an implementation design team worked to define the
configuration of the hybrid system and to develop a
specification document, which was published prior to
the design review held on February 1, 1977. The system
was completed and delivered on June 2, 1977.

Although multitemporal analysis had existed
previously, it was now more convenient. In the new
system, the labels and types of all dots were retained
from a given analysis; therefore, analysis of
subsequent acquisitions could be made with the same
set, perhaps with minor modifications. This further
reduced analyst contact time on the IMAGE-100 system.
Because the computer required to do the large
computation loads was no longer actively involved in
interactive displays, higher volume could be realized,
if required.

The goal of P-1 on the hybrid system was to provide
for more accurate labeling by use of the interactive
aids. The interactive terminal of the IMAGE-100/PDP
11-45 system allows the display of conditional and
unconditional cluster maps and classification maps and
a summary of the dot data base with labels for all dots
(pixels which fall on the grid overlay). The analyst
can then display the CIR image and the conditional
clusters and alarm or flash all pixels which fall in a
cluster on the CIR. He may select additional pixels to
label, elect to relabel pixels already labeled, or
relabel clusters. He may choose to display a trajectory
plot from the grid for all acquisitions in an arbitrary
area (“window”) on the screen, with one pixel's
trajectory per window. He might relabel bias correction
dots, recompute any classification error, and finally
update the data base for use in the analysis of the
next acquisition.


IMAGE-100 HYBRID SYSTEM DESIGN

The  functions  required  for  an  interactive  analysis 
with  P-1  on  the  hybrid  system  are  defined  in  a 
diagram  of  the  normal  workflow  (fig.  1).  The  three 
types  of  activities  shown  are  interactive  processing, 
using  the  IMAGE-100  system;  manual  (off-line) 
processing;  and  computations  performed  on  the  IBM 
360-75  ERIPS.  Independently,  segments  can  be 
reworked  in  the  same  way. 

Before an analysis, it is necessary to either build a
data base (in the case of startup) or update the data
base when new acquisitions are received. The software
modules required for these activities for the hybrid
system include imagery update, directory update, DO/DU
(designated other/designated unidentifiable) update,
DO/DU offload, dot data file generate, dot data file
update, dot data file offload, CAMS/Crop Assessment
Subsystem (CAS) statistics file build, and CAMS/CAS
data base update.


FIGURE 1. Normal workflow of CAMS IMAGE-100 Hybrid System.

Once  the  data  are  available  for  work,  software 
modules  are  required  in  the  following  sequence. 

1.  Initiate  segment  analysis 

2.  IMAGE-100  control 

3.  Image  display 

4.  Field  definition 

5.  Dot  crosshair 

6.  Dot  scatter  plot 

7.  Theme  logical 

8.  Window  erase 

9.  Dot  selection 

10.  Single-dot  labeling 

11.  Automatic  cluster  labeling 

12.  Classification  map  display 

13.  Recompute  proportions 

14.  Cluster  map  main 

15.  Unconditional  cluster  map  display 

16.  Conditional  cluster  map  display 

17.  Mixed  cluster  map  display 

18.  Display  report  generator 

19.  Reports 

These  software  modules  provide  for  the  type  of 
display  on  the  screen  of  the  IMAGE-100  that  is 
shown  in  figure  2.  It  should  be  added  that,  in  the 
spectral  plot  area,  the  axes  normally  were  Kauth 
green  number  versus  brightness,  but  greenness  ver- 
sus time  or  individual  bands  could  also  be  displayed. 

The system disk and data disk configurations are
defined in figure 3. It should be noted that the CAMS
IMAGE-100 Hybrid System software is a complete system
operating with its own “driver” or “control” program
and interfaces directly with the hardware operating
system software, the Resource Sharing Executive model
11D (RSX-11D). The control program can access the image
library or other system software modules outside the
CAMS software, but all communication or transfer of
data or information from within the CAMS software to
“outside” modules must go through the CAMS “driver” or
“control” program.

The  data  disk  shows  functionally  the  arrangement 
of  the  data  files  retained  in  the  operating  data  base.  A 
maximum  of  six  acquisitions  per  segment  is  retained. 
To  load  an  additional  acquisition,  one  of  the  existing 
acquisitions  must  be  deleted. 
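
The retention rule can be sketched as a small data structure. This is a hypothetical illustration of the six-acquisition limit only; the class and method names are invented here and do not correspond to any module of the hybrid system.

```python
MAX_ACQUISITIONS = 6  # limit stated in the text


class SegmentStore:
    """Holds at most six acquisitions per segment, keyed by date."""

    def __init__(self):
        self._held = {}  # segment number -> {acquisition date: image}

    def load(self, segment, date, image):
        acquisitions = self._held.setdefault(segment, {})
        if date not in acquisitions and len(acquisitions) >= MAX_ACQUISITIONS:
            # Mirrors the rule that an existing acquisition must go first.
            raise RuntimeError("delete an acquisition before loading another")
        acquisitions[date] = image

    def delete(self, segment, date):
        del self._held[segment][date]

    def dates(self, segment):
        return sorted(self._held.get(segment, {}))
```

Raising an error instead of silently evicting the oldest acquisition leaves the choice of which date to drop with the analyst, which is how the text describes the operation.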

The  hardware  configuration  for  which  the  soft- 
ware was  developed  is  shown  in  figure  4.  The  interac- 
tive software  modules  provide  the  analyst  with  all  of 
the  tools  necessary  to  select,  one  by  one,  all  of  the 
grid  overlay  dots  or  pixels;  to  alarm  the  pixel  in  the 
spectral  domain  and  the  CIR  image;  and  to  show  the 
trajectory  of  the  vegetation  through  time  represented 
spectrally.  Knowing  the  spectral  signature  of  wheat 
and  the  normal  crop  planting  and  growth  cycle  of 
wheat,  the  analyst  can  identify  those  pixels  or  dots 
that  spectrally  represent  wheat  and  label  them  ac- 
cordingly with  the  labeling  module. 

When  labeling  with  the  IMAGE-100  is  complete, 
the  output  in  a card-image  format  is  sent  to  the  IBM 
360-75  for  the  clustering  and  classification  process- 
ing. The  labeled  dots  allow  the  analyst  to  check  his 
identification  of  wheat  against  the  computer's  ability 
to  discriminate  between  spectral  classes  and  provide 
a ratio  of  performance.  He  may  elect  to  relabel  the 
dots  or  pixels  and  start  over;  or  he  may  relabel  whole 
clusters  of  pixels,  change  the  classification,  and 
change  the  statistics  which  reflect  the  percent  of 
wheat  in  the  segment. 


OPERATIONAL  IMPLEMENTATION 

The operational data flow in figure 1 and the process
prior to that operation in figure 5 identify the
functions performed for analysis of a LACIE segment.

FIGURE 3. System disk and data configuration.


FIGURE 4. CAMS/IMAGE-100 Hybrid System hardware configuration.


FIGURE 5. Data flow and support services. (PTL = Photographic
Technology Laboratory; LPDL = LACIE Physical Data Library.)


1. As tapes with segment imagery data are received
from the NASA Goddard Space Flight Center (GSFC), they
are copied. One tape goes to the production film
converter (PFC) and one goes to the computer for
loading into the data base.

2. The film from the PFC conversion is processed in
the photographic laboratory according to established
standards. It is then forwarded to the EOD for
combining with ancillary data and forwarding to the
analyst.

3.  The  second  tape  is  used  in  the  computer  to  up- 
date the  image  data  base.  A report  is  generated  from 
this  data  base  so  that  segments  to  be  processed  on  the 
IMAGE-100  can  be  flagged  and  unload  commands 
generated. 

4.  The  unload  commands  for  the  image  data  base 
provide  the  desired  image  tape,  which  is  transmitted 
to  the  CAMS  IMAGE-100  Hybrid  System,  loaded  in 
the  data  base,  and  made  available  by  segment  acquisi- 
tion. 

5. At this point, the analyst calls up the new
acquisition for the 5- by 6-nautical-mile segment,
defines the DO/DU fields, and performs dot labeling. He
has all of the analyst aids (such as spectral plots,
trajectory plots, and class and cluster maps from
previous acquisitions) at his disposal. These analyst
decision data on the segment are output from the
IMAGE-100 system in card format and submitted to the
IBM 360-75 system for a cluster and classification
process.

6.  The  results  of  the  cluster  and  classification 
processing  output  take  two  forms:  the  cluster  and 
classification  are  defined  in  image  format,  and  the 
classification  summary  information  and  bias  correc- 
tion data  are  output  in  report  form  on  tape. 

7.  The  analyst  may  then  evaluate  his  success  at 
labeling  by  overlaying  the  cluster  and  class  maps  on 
the  CIR  image,  evaluating  the  percent  correctly 
classified,  analyzing  his  cluster  labels  and  specific  dot 
labels,  and  using  the  spectral  plots  and  trajectory 
plots.  He  may  then  decide  to  rework  by  relabeling  if 
necessary,  or  he  can  place  acceptable  final  results  in 
the  data  base. 

The data flow is controlled and managed effectively by
the use of standard operational procedures and standard
job orders, as described in the paper on Landsat data
acquisition and storage.

Because  the  CAMS  IMAGE-100  Hybrid  System 
was  developed  for  use  by  analysts  with  a wide  variety 
of  backgrounds,  a thorough  training  program  in  basic 
analysis,  IMAGE-100  software,  and  operations  was 
required.  The  training  included  interpretation 
methods,  interpretation  of  photographs,  regional 
analysis,  multispectral  sensing,  and  signature 
analysis  of  agriculture  data. 

Generally, analysts who are trained in
photointerpretation found the use of a cathode-ray-tube
(CRT) display in false color somewhat less desirable
than photographic imagery; however, the added
availability of analysis aids, such as spectral plots,
trajectory plots, and color maps of clusters and
classes, made the data analysis task easier and,
hopefully, decreased labeling errors.

It was found, further, that the increase in automation
could speed up the analysis and provide much more
flexibility in performing one. The total speed of the
overall system was found to be dependent on logistics
control and data handling; and, although the analyst's
“hands-on” time decreased in this prototype system,
total elapsed time was governed more dramatically by
the system data flow.


USDA Analyst Review of the LACIE
IMAGE-100/Hybrid System Test

P. Ashburn, K. Bulow, H. L. Hansen, and G. A. May


INTRODUCTION 

Late  in  LACIE  Phase  II,  a proposal  was  submitted 
to  implement  and  test  an  interactive  imaging  system 
during  Phase  III  (1976-77).  This  proposal  was  ap- 
proved, and  the  initial  test  of  the  system  was  initi- 
ated in  February  1977.  The  major  purpose  of  the  test 
was  to  provide  and  evaluate  a pseudointeractive 
classification  capability  during  LACIE  that  would 
provide  the  project  some  experience  with  a type  of 
system  that  the  user  (U.S.  Department  of 
Agriculture  (USDA))  would  be  implementing.  The 
test  was  also  to  evaluate  a timeliness  factor  in  seg- 
ment processing. 

Initially, the test was to utilize the classification
capabilities of the General Electric (GE) Interactive
Multispectral Image Analysis System Model 100
(IMAGE-100 or I-100). This was later changed to
programing and implementing a procedure called
Procedure 1 (P-1) on the GE I-100 Hybrid System. This
procedure called for the labeling of single-pixel
training fields (dots) using the I-100 Hybrid System;
the clustering and classification was to be done on the
Earth Resources Interactive Processing System (ERIPS).
Evaluation and minor rework could then be done on the
I-100. This total system was called the I-100/Hybrid
System.

NASA, USDA, and the supporting contractor, Lockheed
Electronics Company (LEC), cooperated in the planning
of the test. NASA and LEC were the principals in
programing and implementing the test. USDA supported
the test by providing the primary operational analysts.
Analysts were also provided by LEC for instructing the
USDA analysts on how to use the system and for
identifying and correcting procedural and system
utilization problems that were encountered.

The  I-100  test  included  operational  segments  from 
the  U.S.S.R.  and  test  segments  from  Canada  and  the 
United  States.  These  segments  provided  the  USDA 
analyst  with  a wide  range  of  geographic  conditions 
from  which  a broadened  range  of  learning  ex- 
periences was  achieved. 

In  addition  to  training,  the  USDA  analysts  had  the 
opportunity  to  test  and  evaluate  additional 
capabilities  of  the  interactive  imaging  system  and  to 
suggest  how  the  system  and  procedures  could  be  im- 
proved. A number  of  problems  with  hardware,  soft- 
ware, and  procedures  were  identified  and  corrected 
during  this  process. 

Finally,  the  USDA  analysts  learned  that  pro- 
cedural options  were  needed  to  work  with  the  widely 
varying  climatic,  geographic,  and  cultural  conditions 
that  exist  in  the  major  countries  of  the  world. 


THE  TEST 

The  USDA  I-100  analysts  used  essentially  the 
same  procedure  (P-1)  as  the  Classification  and  Men- 
suration Subsystem  (CAMS)  analysts;  details  for 
these  procedures  are  documented  in  the  CAMS  Im- 
age 100/Hybrid  System  Procedures/Requirements 
(LACIE-C00202,  JSC-11669,  Jan.  1977).  The  major 
exception  was  that  USDA  analysts  had  at  their  dis- 
posal an  interactive  system  that  provided  a wide 
range  of  on-line  spectral  aids  that  could  be  used  in 
the  initial  dot-labeling  process.  They  also  had  an  on- 
line capability  for  relabeling  dots  and  clusters. 

The  LACIE  sample  segments  to  be  processed  by
the  USDA  I-100  team  were  selected  from  the  U.S.  In-
tensive Test  Sites  (ITS's)  (24),  Canada  ITS's  (10),
Canada  Blind  Sites  (30),  and  Kokchetav,  U.S.S.R.
(50).  Each  of  the  four  analysts  had  a list  of  segments
for  which  he  was  responsible.

The  segments  were  loaded  onto  the  I-100  disk  and
the  analyst  was  advised  which  segments  were  ready
for  processing.  A total  of  six  different  acquisition 
dates  for  the  same  segment  could  be  loaded  and  re- 
tained on  the  system  at  any  one  time.  The  analyst 
could  call  up  a segment  number  and  view  the  Julian 
date  for  each  acquisition,  then  select  the  date  to  be 
used  for  processing. 

A color-infrared  (CIR)  image  (both  a film  product 
and  a cathode-ray-tube  (CRT)  image)  provided  the 
analyst  with  a visual  product  of  the  segment.  Space 
was  provided  under  the  CRT  image  for  displaying  up 
to  six  different  scatter  plots.  When  appropriate,  the 
analyst  used  this  space  to  build  a scatter  plot  for  each 
acquisition  date. 

The  analyst  viewed  the  image  to  locate  and  outline 
areas  that  should  be  labeled  “DO”  (Designated  Other
than  wheat)  areas.  He  also  outlined  areas  that  could 
not  be  identified  because  of  clouds  or  cloud  shadows; 
these  areas  were  labeled  “DU”  (Designated  Uniden-
tifiable). Each  pixel  falling  within  these  outlined
areas,  including  any  of  209  single-pixel  fields,  would 
no  longer  be  counted  in  the  subsequent  processing  of 
the  segment.  The  DO  and/or  DU  areas  could  be 
viewed  as  a color  overlay  on  the  CRT  image.  After 
the  DO/DU  definition  was  completed,  the  analyst 
proceeded  to  a single-pixel  field  selection  procedure 
where  he  requested  a selection  of  30  or  more 
unlabeled  single-pixel  fields  (dots).  These  did  not  in-
clude  any  pixels  lying  within  areas  already  labeled
DO  or  DU.  This  selection
was  stored  in  one  of  the  eight  theme  tracks  and 
viewed  as  an  overlay  on  the  CRT  image.  These  30  or 
more  dots  were  cross-referenced  in  the  scatter  plots 
to  determine  if  the  entire  spectrum  of  the  scene  was 
represented.  The  analyst  then  began  labeling  the 
single-pixel  fields  to  establish  labels  to  be  used  as 
starting  vectors  for  the  clustering  and  classification 
algorithms. 
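
The dot-selection step described above can be sketched as follows. This is a minimal illustration and not the I-100 software: the function name `select_dots`, the layout of the 209-dot grid, the representation of DO/DU areas as a set of pixel coordinates, and the use of random sampling are all assumptions made for the example; the text does not say how the system chose the dots.

```python
import random


def select_dots(grid_dots, excluded_pixels, already_labeled, count=30, seed=None):
    """Pick `count` unlabeled grid dots that lie outside DO/DU areas.

    grid_dots       -- dict mapping dot number -> (row, col) pixel location
    excluded_pixels -- set of (row, col) pixels inside DO or DU outlines
    already_labeled -- set of dot numbers that already carry a label
    """
    candidates = [
        number
        for number, location in sorted(grid_dots.items())
        if location not in excluded_pixels and number not in already_labeled
    ]
    if len(candidates) < count:
        raise ValueError("not enough eligible dots for the requested selection")
    rng = random.Random(seed)  # seeded only so the example is repeatable
    return sorted(rng.sample(candidates, count))


# Hypothetical 209-dot grid: 11 rows by 19 columns, spaced 10 pixels apart.
grid = {n + 1: (10 * (n // 19), 10 * (n % 19)) for n in range(209)}
# Hypothetical DO area covering the upper-left corner of the segment.
do_area = {(r, c) for r in range(0, 31) for c in range(0, 31)}
type1 = select_dots(grid, do_area, already_labeled=set(), count=30, seed=1)
```

A second call with `already_labeled=set(type1)` and `count=40` would give a disjoint selection of the kind used for the Type 2 dots.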

The  analyst  used  two  methods  for  accessing 
single-pixel  fields  or  dots.  One  was  by  typing  in  the 
number  of  the  dot  and  the  other  was  a cursor  selec- 
tion that  required  putting  a cursor-formed  box 
around  the  dot  that  represented  the  field.  Either  of 
these  two  methods  prompted  a display  of  informa- 
tion about  the  dot,  including  the  green  number  for 
each  acquisition  date,  the  raw  digital  channel  values
for  each  date,  the  dot  number,  the  location  by  coordi- 
nates of  the  dot  in  the  scene,  and  any  previous 
analyst  or  classifier  label  for  the  dot.  The  dot  was 
alarmed  in  both  the  CRT  image  and  the  scatter  plots 
at  this  time.  In  addition,  all  other  dots  in  the  image 
with  the  same  radiance  values  were  also  alarmed.  If
the  analyst  was  unsure  of  a label  for  a dot,  he  could
enlarge  the  area  surrounding  the  dot  for  better  view- 
ing. He  then  labeled  the  dot  as  wheat,  small  grain,  or 
nonwheat.  If  the  dot  was  on  the  edge  of  an 
agricultural  field,  he  could  skip  labeling  that  dot  and 
proceed  to  the  next  dot.  This  process  was  repeated 
for  each  of  the  30  or  more  selected  dots  in  the  scene. 
These  dots  were  then  designated  “Type  1”  and  pro-
vided the  starting  vectors  for  the  clustering  and
classification  algorithms. 

After  the  first  selection  of  dots  was  labeled, 
another  selection  of  40  or  more  dots  was  made  from 
the  unlabeled  dots.  The  same  procedure  used  for 
labeling  the  Type  1 dots  was  used  for  labeling  these
dots,  which  were  designated  Type  2 dots,  with  one 
exception;  no  dots  could  be  skipped  when  labeling. 
The  labels  for  the  Type  2 dots  were  later  used  to  pro- 
vide a bias  correction  of  the  machine  classification. 

Computer  cards  for  all  labeled  dots  were  gener- 
ated and  sent  to  ERIPS  for  processing,  after  which 
the  analyst  was  notified  that  the  classification  results 
had  been  returned  and  loaded  for  the  continuation  of 
processing  on  the  I-100.

The  analyst  reinitiated  the  segment  processing 
and  requested  to  see  the  cluster  results  and/or 
classification  map.  He  also  checked  the  percent-
correct  classification  (PCC)  for  both  Type  1 and
Type  2 dots.  If  a PCC  of  80  percent  or  higher  was 
achieved  and  if  the  classification  map  was  satisfacto- 
ry, the  results  were  deemed  suitable  for  aggregation 
and  the  results  were  sent  to  the  Crop  Assessment 
Subsystem  (CAS).  If  not,  a rework  of  the  data  was 
required. 
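
The acceptance check in this step can be written out as a short sketch. The function names and the dictionary representation of labels are hypothetical, and the code assumes that the 80-percent criterion had to be met by the Type 1 and the Type 2 dots alike, which the text implies but does not state outright.

```python
def percent_correct(analyst_labels, classifier_labels):
    """Percentage of dots whose classifier label matches the analyst label.

    Both arguments map dot number -> label; only dots present in both
    dictionaries are compared.
    """
    common = [dot for dot in analyst_labels if dot in classifier_labels]
    if not common:
        raise ValueError("no dots in common between analyst and classifier")
    agree = sum(1 for dot in common if analyst_labels[dot] == classifier_labels[dot])
    return 100.0 * agree / len(common)


def ready_for_aggregation(type1_pcc, type2_pcc, map_satisfactory, threshold=80.0):
    """True when both PCC values reach the threshold and the map was accepted."""
    return bool(type1_pcc >= threshold and type2_pcc >= threshold and map_satisfactory)


# Illustrative labels for five dots (not LACIE data).
analyst = {1: "wheat", 2: "nonwheat", 3: "wheat", 4: "nonwheat", 5: "wheat"}
classifier = {1: "wheat", 2: "nonwheat", 3: "nonwheat", 4: "nonwheat", 5: "wheat"}
pcc = percent_correct(analyst, classifier)  # four of five dots agree
```

A segment failing `ready_for_aggregation` would go to rework rather than to CAS.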

The  rework  could  take  several  forms.  One  was  to 
check  the  analyst  label  and  change  it  to  agree  with 
the  classifier  label,  which  would  increase  the  PCC.
Another  form  was  to  change  cluster  labels  by  view- 
ing the  cluster  results  and  map  overlay  and  changing 
the  labels  of  mislabeled  or  misclassified  clusters. 
However,  this  only  improved  the  PCC  if  there  were 
analyst-labeled  dots  associated  with  the  cluster.  The 
last  resort  was  to  add  more  dots  and  send  the  seg- 
ment back  to  ERIPS  for  reclassification. 

This  same  procedure  was  used  for  all  subsequent 
acquisitions.  Generally,  the  same  labels  were  used  on 
the  new  acquisitions  prior  to  their  classification. 


I-100/HYBRID  SYSTEM  RESULTS

The  USDA  analyst  team  analyzed  segments  from 
three  countries.  Twenty-four  intensive  test  sites  were 
chosen  in  the  United  States.  These  sites  were  located
throughout  the  Great  Plains  but  the  majority  were  in 
the  spring  wheat  area.  Forty  intensive  test  and  blind 
sites  in  Canada  were  selected  for  processing.  These 
segments  fell  in  Alberta,  Saskatchewan,  and 
Manitoba.  The  team  also  analyzed  50  Russian  seg- 
ments located  in  Kokchetav. 

Since  the  USDA  analysts  were  working  with  an 
on-line  interactive  imaging  system,  they  had  at  their 
disposal  a number  of  capabilities  not  provided  to  the 
CAMS  analysts  as  an  on-line  system.  These  on-line 
capabilities — such  as  interactive  dot  labeling,  class  or 
cluster  map  overlay  flicker,  and  flashing  of  all  dots  of 
equal  spectral  value — were  very  useful. 

In  working  with  and  labeling  dots,  the  multidot 
alarm  capability  of  the  I-100  was  a very  useful  tool,
allowing  the  analyst  to  select  a single  pixel  on  the  im- 
agery and  alarm  within  the  image  all  pixels  with  the 
same  spectral  value.  This  tool  also  helped  the  analyst 
in  determining  the  spectral  confusion  that  may  exist 
within  the  scene.  For  example,  on  a given  acquisi- 
tion, a selected  wheat  pixel  would  alarm  a pixel  that 
occurred  in  a hay  field. 
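
The multidot alarm amounts to an exact-match search over the image. The sketch below is illustrative only: the function name and the list-of-tuples image layout are assumptions, and the channel values are invented.

```python
def alarm_matching_pixels(image, row, col):
    """Return (row, col) of every pixel whose channel values equal those at (row, col).

    image -- list of rows; each pixel is a tuple of raw channel values
    """
    target = image[row][col]
    return [
        (r, c)
        for r, line in enumerate(image)
        for c, pixel in enumerate(line)
        if pixel == target
    ]


# Tiny illustrative scene with four channel values per pixel.
image = [
    [(20, 15, 40, 30), (22, 16, 45, 33), (20, 15, 40, 30)],
    [(25, 18, 38, 28), (20, 15, 40, 30), (22, 16, 45, 33)],
]
alarmed = alarm_matching_pixels(image, 0, 0)
```

If the selected pixel is labeled wheat and one of the alarmed locations falls in a hay field, the two covers are spectrally confused on that acquisition.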

Another  useful  tool  of  the  I-100  was  the  eight
theme  tracks.  This  allowed  the  analyst  to  store,  at 
one  time,  different  cluster  and  classification  results 
that  could  then  be  used  to  overlay  the  Landsat  image. 
By  flashing  the  different  themes  on  and  off,  the
analyst  could  view  the  different  overlays  for  com- 
parison and  analysis.  However,  even  with  these 
capabilities,  it  was  sometimes  very  difficult  to  iden- 
tify the  wheat. 

It  was  very  difficult  to  interpret  and  analyze  the 
Russian  segments  because  of  a low  soil  moisture  con- 
dition that  existed  at  seeding.  The  native  vegetation 
was  showing  moisture  stress  problems  on  the  first 
acquisitions  obtained  in  early  spring;  these  condi-
tions resulted  in  poor  wheat  emergence  and  the
analyst  had  difficulty  identifying  fields  of  emerged 
wheat.  The  USDA  team  flagged  to  the  project  the 
poor  wheat  condition  that  existed  in  Kokchetav. 

Very  little  precipitation  occurred  in  Kokchetav 
during  wheat  tillering  and  jointing.  Therefore,  much 
of  the  wheat  that  emerged  produced  little  vegetative 
growth.  As  a result,  the  red  signature  from  healthy 
growing  wheat  did  not  appear  on  the  imagery.  This 
made  it  difficult  to  obtain  a good  area  estimate  of  the 
wheat;  in  some  cases,  the  estimates  for  these  seg- 
ments were  only  25  percent  of  the  previous  year's 
estimates. 

For  the  Canadian  segments,  the  USDA  team 
passed  direct  wheat  estimates  or  spring  wheat  esti-
mates only.  Significant  amounts  of  flax,  rapeseed,
and  barley  are  grown  in  the  spring  wheat  areas  of 
Canada.  The  analyst  found  the  green-number  scatter 
plots  of  the  209  dots  to  be  extremely  valuable  in  sepa- 
rating wheat  from  flax  and  rapeseed.  Those  dots 
representing  flax  and  rapeseed  group  together  and 
separate  from  the  small  grains  and  other  green 
vegetation.  The  flax  and  rapeseed  tend  to  appear 
pinkish  on  the  CIR  image,  which  is  distinctly 
different  from  the  appearance  of  wheat. 

No  repeatable  technique  was  established  for  sepa- 
rating wheat  from  barley,  however.  At  one  point  in 
the  processing,  it  appeared  that  these  two  crops  could
be  separated  by  a pattern  that  existed  in  a plot  of  the 
raw  Landsat  values  for  channels  5 and  6.  However, 
after  applying  this  phenomenon  against  ground 
truth,  it  was  determined  that  no  apparent  correlation 
existed  between  the  wheat  and  barley  in  these  two 
channels. 


ANALYST  PROBLEMS  AND 
RECOMMENDATIONS 

During  the  process  of  utilizing  the  interactive 
I-100/Hybrid  System,  the  USDA  analysts  identified
several  major  and  many  minor  improvements  that 
could  be  made  in  the  procedures  and  software  in  the 
system.  Since  LEC  was  the  primary  contractor  in  the 
development  of  the  software,  interaction  with  the 
LEC  staff  and  the  USDA  analysts  became  impera- 
tive. 

After  all  users  of  the  system  were  thoroughly 
familiar  with  the  P-1  software,  several  meetings  were 
held  to  collect  inputs  on  how  the  system  should  be 
upgraded.  Inputs  were  provided  by  LEC  system 
analysts  and  instructors  and  by  USDA  analysts. 
Often  the  same  problem  was  identified  by  more  than 
one  of  the  three  users. 

As  a result  of  the  meetings  on  how  to  improve  the 
CAMS  I-100/Hybrid  System,  a long  list  of  recom-
mendations for  improvements  was  made,  ranging
from  correcting  spelling  in  menus  and  prompts  to 
major  changes  in  the  different  processors.  All  prob- 
lems that  were  identified  could  be  divided  into  five 
categories:  (1)  wrong  capabilities  were  stressed;  (2) 
unnecessary  data  were  provided  to  the  analysts;  (3) 
methods  were  needed  to  ease  the  man-machine  inter-
face; (4)  additional  capabilities  were  needed;  and  (5)
designer  performance  needed  improvement.

There  were  many  areas  where  systems  analysts 
could  improve  the  crop  analysts'  activities.  However,
in  order  to  do  so,  it  was  necessary  for  the  systems
analysts  to  understand  the  background  of  the  crop 
analysts.  When  error  messages  occur,  they  can  be 
worded  for  the  system  analyst  or  the  crop  analyst.  It 
is,  however,  the  crop  analyst  who  will  use  and  re-
spond  to  them  most  often.  Therefore,  these
messages,  as  well  as  all  others,  should  be  oriented  to 
the  primary  user. 

There  was  a problem  in  looking  at  film  images  and 
CRT  images  because  of  an  inherent  incompatibility 
between  the  two  images.  This  problem  was  most  ap- 
parent when  the  analyst  was  trying  to  adjust  the 
CRT  colors  to  match  those  of  the  film  images.  More 
than  5 minutes  are  required  to  display  the  entire  im- 
age of  a segment  on  the  CRT;  this  time  has  to  be 
minimized  because  image  manipulation  can  occur 
frequently. 

During  processing,  numerous  messages,  menus, 
and  prompts  were  shown  to  the  analysts.  Although 
some  of  the  messages  were  critical,  they  were  dis- 
played the  same  as  all  other  messages.  Because  of 
this  great  similarity,  the  analyst  could  very  easily 
overlook  a message  that  would  render  invalid  every- 
thing he  had  done. 

Outlining  a field  and  designating  it  as  DO/DU  was 
considerably  more  complex  than  originally  antici-
pated. A prime  problem  was  the  limitation  in  the
number  of  vertices  which  a single  area  could  have. 
The  analysts  often  had  to  break  the  areas  into  arbi- 
trary subareas  which  contained  no  more  than  the 
limit  of  10  vertices.  With  a large  number  of  fields,  it 
became  difficult  to  keep  track  of  the  names  of  the 
fields;  it  would  have  been  simpler  to  have  been  able 
to  place  the  cursor  in  the  field  to  identify  it. 

Since  an  I-100/Hybrid  System  procedure  was 
specified,  the  software  could  have  been  designed  so 
that  one  automatically  proceeded  from  one  step  to 
another  in  the  correct  sequence.  This  step-by-step  se- 
quence could  have  been  accomplished  through  the 
use  of  global  (total)  defaults.  Similarly,  the  computer 
software  should  be  streamlined.  It  should  be 
developed  to  parrot  recognized  procedures.  It  would 
be  advantageous  to  make  the  nonstandard  processing 
easy  to  use. 

Because  of  the  nature  of  the  Hybrid  System,  it  was 
necessary  to  perform  part  of  the  work  on  the  I-100
and  the  remainder  on  the  LACIE  ERIPS.  The  inter- 
facing problems  could  have  been  streamlined  con- 
siderably. There  was  also  a problem  of  delay  between 
processing  and  a timely  receipt  of  results.  It  was  orig-
inally hoped  that  the  timelag  would  be  only  a day  or
two,  but  experience  indicated  lapses  of  more  than  a 
week.  Because  of  this  situation,  the  analysts  had  to 
work  with  eight  or  nine  segments  at  a time,  which 
was  inconvenient  and  complex. 

The  experience  gained  from  the  exercise  on  the 
I-100  proved  to  be  invaluable  to  USDA  analysis.
When  the  Applications  Test  System  (ATS)  was 
being  designed,  many  of  the  problem  areas  were 
avoided,  and  a more  efficient  and  user-oriented 
system  was  delivered. 


CONCLUSIONS  AND  RECOMMENDATIONS 

The  USDA  analysts  obtained  valuable  experience 
from  the  I-100  effort.  The  varied  wheat  conditions
throughout  the  three-country  study  area  enabled 
analysts  to  study  and  become  familiar  with  different 
cultural  practices,  weather  conditions,  and  farming 
methods  and  with  how  these  factors  affect  wheat- 
growing  conditions  and  analysis  approaches. 

It  soon  became  apparent  that  a single  analyst  proc- 
essing procedure  was  insufficient  in  analyzing  Land- 
sat  data  for  the  purpose  of  obtaining  a wheat  area 
estimate.  Procedure  1 worked  fairly  well  when  used 
in  areas  having  small  fields  and  heterogeneous  sig- 
natures from  the  land  cover  types  within  those 
fields.  However,  since  it  took  an  average  of  3.5  hours 
per  sample  segment,  it  was  extremely  cumbersome 
and  time-consuming  in  agriculture  areas  having 
relatively  large  fields  and/or  homogeneous  spectral 
signatures.  A processing  procedure  should  be 
developed  for  a type  or  specific  set  of  agricultural 
conditions;  for  example,  when  working  in  an  area
having  large  fields,  the  analyst  should  use  a pro- 
cedure that  takes  advantage  of  such  conditions. 

The  I-100  is  mainly  being  used  as  a display  device
for  images,  classification  maps,  spectral  aids,  etc.  Im-
plementation of  P-1  on  the  machine  allows  for  some
manipulation  of  the  display.  All  clustering  and 
classification  was  done  on  the  ERIPS  with  the  results 
being  evaluated  and  minor  rework  done  using  the 
I-100.  Utilizing  the  two  different  systems  to  process  a
segment  resulted  in  innumerable  problems,  as  men- 
tioned previously.  Many  of  these  problems  con- 
cerned logistics,  but  others  involved  such  factors  as 
an  analyst  needing  two  or  more  processing  sessions 
just  to  initiate  and  complete  the  analysis  of  a seg-
ment. In  analyzing  remotely  sensed  data,  a totally  in-
teractive  system  is  needed  to  accomplish  the  task
most  efficiently.  The  system  should  have  the 
capability  to  process  a segment  from  start  to  finish, 
which  includes  both  classification  and  evaluation.


The  ATS  is  built  around  such  an  interactive  system
concept,  and  the  experience  gained  on  the  I-100  has
impacted  the  design  of  the  system.


Operation  of  the  Yield  Estimation  Subsystem

D.  C.  McCrary,a  J.  L.  Rogers,b  and  J.  D.  Hilla


INTRODUCTION 

The  Yield  Estimation  Subsystem  (YES)  was,  as
its  name  implies,  the  operational  element  of  LACIE
responsible  for  developing  yield  estimation  tech-
niques  and  using  these  techniques  to  produce  the
yield  estimates  necessary  for  each  of  the  project's 
production  reports.  However,  its  work  was  con- 
siderably broader  since  YES  had  the  responsibility  to 
develop  all  the  project's  uses  of  meteorological  data. 
This  paper  not  only  describes  the  operations  neces-
sary to  produce  yield  estimates  but  also  covers  the
other  facets  of  the  subsystem's  work. 


THE  SUBSYSTEM 


Overview  of  Products 

The  yield  estimates  were  the  most  important 
products  of  YES  since  they  were  essential  for  ag- 
gregation with  acreage  estimates  to  prepare  the  proj- 
ect's production  estimates.  The  technical  approach  to 
yield  estimation  is  described  elsewhere  in  this  collec-
tion (see  the  paper  by  Strommen  et  al.  entitled
“Development  of  LACIE  CCEA-I  Weather/Wheat
Yield  Models”)  and  will  not  be  discussed  in  detail
here.  After  the  technical  approach  was  established,  it 
was  necessary  for  the  yield  subsystem  to  apply  it  to 
the  many  areas  of  the  world  where  LACIE  would  be 
performing  its  investigations.  Ultimately,  this  was  to
produce  a total  of  452  separate  yield  models  covering 
the  principal  wheat-growing  areas  of  the  U.S.  Great 
Plains,  Canada,  the  U.S.S.R.,  India,  Australia,  Brazil,
and  Argentina.  Routine  weather  data  were  collected
in  a timely  manner  to  produce  the  input  data  neces-
sary to  operate  on  a monthly  basis  the  208  models  for
the  United  States,  Canada,  and  the  U.S.S.R.  during 
the  three  phases  of  LACIE.  The  remaining  models 
were  evaluated  as  part  of  the  project's  exploratory 
work. 

The  second  application  YES  made  of  the 
meteorological  data  consisted  of  operating  the  crop 
calendar  models  which  produced  estimates  of  wheat 
development  stage  for  any  particular  day  of  the  year. 
These  models,  which  are  also  described  in  detail  in 
this  collection  (see  the  paper  by  Whitehead  and 
Phinney  entitled  "Growth  Stage  Estimation"),  re- 
quired daily  maximum  and  minimum  temperature 
reports  as  input  data  to  monitor  the  crop’s  develop- 
ment through  the  growing  season.  Estimates  derived 
from  observations  at  2500  separate  weather  reporting 
stations  were  provided  on  a biweekly  basis  to  the 
analysts  who  were  interpreting  the  Landsat  imagery. 
These  estimates  of  growth  stage  were  intended  to 
assist  the  analysts  in  separating  wheat  from  other 
crops  in  the  scene. 

In  addition  to  the  yield  and  crop  calendar  esti-
mates which  were  objectively  derived  from  the
models,  subjective  evaluations  of  crop  growing  con- 
ditions were  prepared  by  YES.  For  each  state  in  the 
U.S.  Great  Plains  and  each  major  wheat-growing  area
in  the  foreign  countries,  narrative  summaries  were 
written  to  describe  the  weather  in  particular  and  how 
the  growing  conditions  may  have  been  departing 
from  normal.  These  narrative  summaries  were 
published  weekly  to  aid  the  analysts  in  identifying
regions  of  abnormal  crop  development  (and  conse- 
quently abnormal  crop  appearance)  and  to  document 
the  conditions  under  which  the  project  was  gathering 
its  results. 


Organization  of  YES 

The  Yield  Estimation  Subsystem  blended  the 
abilities  of  specialists  who  were  skilled  in 


217 


a jmammam 


meteorology  with  those  who  were  skilled  in 
agronomy.  Personnel  were  assigned  from  all  three  of 
the  participating  agencies,  but,  because  of  the 
reliance  on  meteorological  data,  the  National 
Oceanic  and  Atmospheric  Administration  (NOAA) 
had  the  lead  responsibility  for  YES.  The  yield  sub- 
system manager  and  his  staff  were  located  at  the 
NASA  Johnson  Space  Center  (JSC)  in  Houston, 
Texas,  and  were  supported  by  personnel  at  other 
NOAA  working  locations  in  Columbia,  Missouri, 
and  Washington,  D.C. 

The  Houston  staff  was  primarily  responsible  for 
overall  management  by  defining  requirements  of  the 
yield  subsystem,  evaluating  its  products,  and  in-
tegrating these  products  into  the  project  output.  It
also  prepared  the  weekly  meteorological  summaries
sent  to  the  analysts  and  other  project  reports  detail- 
ing crop  growing  conditions. 

At  Columbia,  the  Modeling  Division  of  NOAA’s 
Center  for  Climatic  and  Environmental  Assessment 
(CCEA)1  was  responsible  for  extending  its  yield 
modeling  methodology  to  all  areas  of  interest  to 
LACIE  and  conducting  preliminary  evaluations  of 
model  accuracy.  Once  the  yield  models  were 
developed,  this  division  programed  them  for  routine 
operation  on  the  main  NOAA  computer  located  at 
the  National  Meteorological  Center  (NMC)  in  Suit- 
land,  Maryland.  A remote  terminal  at  Columbia  was 
connected  to  the  main  computer  by  a high-speed  data 
transmission  line.  The  crop  calendar  model, 
developed  by  contractor  personnel  supporting  YES, 
was  also  implemented  on  the  NMC  computer 
and  was  operated  by  commands  from  the  staff  at 
Columbia. 

The  NOAA  Center  in  Missouri  was  supplemented 
in  its  LACIE  work  by  personnel  from  other  project 
components.  The  U.S.  Department  of  Agriculture 
(USDA)  assigned  to  that  location  an  agricultural 
economist  and  an  agronomist  to  assist  in  the 
development  of  the  yield  models  needed  by  YES. 
Contractor  support  was  also  available  from 
Lockheed  Electronics  Company,  principally  in  the 
gathering  and  formatting  of  historical  weather  and 
yield  data  needed  by  the  scientists  who  were 
developing  the  models. 

A third  NOAA  group,  located  in  Washington,  pre-
pared the  real-time  data  needed  for  operation  of  the
models  and  wrote  the  general  weather/crop  assess-
ments for  each  country.  This  Assessment  Division
of  CCEA  complemented  the  Modeling  Division  by
providing  the  processed  meteorological  data  that
division  required.  The  USDA  supported  the  Assess-
ment Division  by  detailing  a foreign  commodity
analyst  to  work  1 day  each  week  helping  to  prepare
the  foreign  crop  assessments,  which  formed  the  basis
of  the  narrative  material  provided  to  the  analysts
reviewing  the  Landsat  data  in  Houston.

1In  a 1978  NOAA  reorganization,  CCEA  became  part  of  a

Note  should  be  taken  of  an  ad  hoc  team  estab- 
lished to  advise  the  YES  manager  on  key  technical 
matters,  particularly  in  the  area  of  evaluating  and 
developing  alternate  approaches  to  yield  modeling. 
This  Yield  Advisory  Group  was  composed  of  two 
representatives  from  each  of  the  three  participating 
agencies  and  was  chaired  by  a seventh  member,  Dr. 
E.  C.  A.  Runge,  Chairman  of  the  Agronomy  Depart- 
ment at  the  University  of  Missouri  at  Columbia. 
During  the  project,  this  group  was  very  effective  in 
the  preliminary  screening  of  candidate  procedures. 


OPERATIONS 


Meteorological  Data  Acquisition 

The  Yield  Estimation  Subsystem  was  dependent 
on  timely  and  comprehensive  meteorological  data 
from  all  areas,  both  domestic  and  foreign,  for  which 
the  project  was  preparing  wheat  production  esti- 
mates. To  support  this  project  requirement,  NOAA 
provided  a data  base  of  global  weather  observations 
which  is  described  in  a separate  paper  within  this  col- 
lection (see  the  plenary  paper  by  Strommen  et  al.  en- 
titled “The  Impact  of  LACIE  on  a National 
Meteorological  Capability”).  The  files  of  daily 
weather  observations  included  data  from  about  8000 
locations  globally  with  about  2500  of  these  located  in 
the  countries  of  LACIE  interest.  The  reports  were 
collected  at  NOAA’s  NMC  from  foreign  sources, 
which  consisted  predominantly  of  the  Global 
Telecommunications  System  of  the  World 
Meteorological  Organization  and  the  U.S.  Air  Force 
military  weather  collection  network.  Data  from  the
U.S.  Great  Plains  were  available  from  the  domestic 
weather  observation  collection  facilities  of  the  Na- 
tional Weather  Service.  The  data  elements  of  prin- 
cipal interest  were  daily  rainfall  totals  and  tem- 
perature extremes,  which  were  necessary  input  for 
operation  of  the  yield  and  crop  calendar  models.

The  data  available  at  NMC  were  placed  on  a file
which  could  be  accessed  by  the  remote  terminals  at 
CCEA's  Assessment  Division  in  Washington  and 
the  Modeling  Division  in  Columbia.  In  addition  to 
the  basic  observational  data  available  from  NMC,  a 
limited  amount  of  meteorological  data  preprocessed 
by  the  U.S.  Air  Force  was  received  by  the  CCEA 
Assessment  Division.  These  data  included  average 
temperatures  and  rainfall  and  estimates  of  soil 
moisture  conditions  for  the  principal  crop  regions  in 
the  U.S.S.R.  and  the  People's  Republic  of  China 
(P.R.C.)  at  10-day  intervals.  These  data  were  for- 
warded to  Washington  in  table  and  map  form  in  time 
to  be  used  routinely  by  LACIE. 


Yield  Estimation 

The  wheat  yield  models  operated  by  YES  made 
their  estimates  from  inputs  of  average  monthly
weather  elements,  such  as  departure  from  normal 
monthly  precipitation  or  departure  from  average 
monthly  temperatures.  This  input  required  that  the 
models  be  operated  at  the  end  of  each  month  as  soon 
as  the  data  could  be  assimilated  and  the  values 
derived.  For  the  U.S.  Great  Plains,  4 working  days 
were  set  aside  after  the  end  of  each  month  for  the 
CCEA  Assessment  Division  to  process  the  domestic 
weather  data  and  to  make  the  data  available  to  the 
Modeling  Division.  The  models  were  normally  oper- 
ated on  the  same  day  the  data  became  available,  thus 
providing  the  project  with  yield  estimates  for  the 
United  States  on  the  fifth  working  day  of  the  month. 

Longer  periods  were  needed  to  process  the  foreign 
data  that  had  been  collected  on  the  NMC  data  base. 
For  the  U.S.S.R.,  9 working  days  were  used  to  pre- 
pare the  input  data  needed  for  the  33  different 
regions  for  which  models  had  been  developed. 
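
The delivery schedule implied by these working-day allowances can be illustrated with a small date calculation. This is only a sketch: the function name is hypothetical, working days are assumed to be Monday through Friday, and no holiday calendar from the project is known, so `holidays` defaults to empty.

```python
from datetime import date, timedelta


def nth_working_day(year, month, n, holidays=()):
    """Return the date of the n-th working day counted from the 1st of the month.

    Working days are taken to be Monday-Friday, excluding any dates in `holidays`.
    The count runs on into the following month if n is large enough.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    day = date(year, month, 1)
    count = 0
    while True:
        if day.weekday() < 5 and day not in holidays:
            count += 1
            if count == n:
                return day
        day += timedelta(days=1)


# Models for the previous month's weather, run in June 1977 (illustrative month).
us_estimates = nth_working_day(1977, 6, 5)    # fifth working day
ussr_inputs = nth_working_day(1977, 6, 9)     # ninth working day
```

Under these assumptions the U.S. estimates would fall on June 7 and the U.S.S.R. input data would be ready on June 13.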

One  might  expect  that,  since  daily  weather  data 
had  been  routinely  collected  throughout  the  month, 
it  would  be  simple  to  operate  the  yield  models  on  the 
first  day  of  the  next  month.  This  was  not  the  case, 
however,  since  one  additional  step  was  necessary  to 
prepare  the  data  for  input.  The  reports  placed  on  the 
daily  data  base  were  individual  station  observations 
representative  of  point  weather.  The  yield  models 
used  average  weather  over  the  state  or  zone  for 
which  the  estimate  was  made  and  thus  required  that 
the  individual  station  reports  be  analyzed  to  obtain 
average  areal  values.  The  analysis  was  accomplished 
by  plotting  the  individual  station  data  on  a map  and 
drawing  the  appropriate  isolines.  The  approximate
density  of  these  stations  in  the  U.S.  Great  Plains  is
shown  in  figure  1.  The  analysis  was  augmented  by 
meteorological  satellite  imagery  of  cloud  patterns. 
Once  the  data  were  analyzed  as  shown  in  figure  2, 
representative  values  were  assigned  to  each  zone  and 
these  became  the  input  data  for  model  operations.
The  analysis  was  made  by  CCEA’s  Assessment  Divi- 
sion, and  the  model  input  data  were  provided  to  the 
Modeling  Division.  That  division  operated  the 
models,  reviewed  the  output,  and  placed  the  esti- 
mates on  a file  to  be  accessed  by  the  project  elements 
in  Houston.  The  yields  were  copied  on  a portable 
computer  terminal  at  JSC  and  placed  in  a secure  area 
from  which  they  were  available  for  use  in  the  pre- 
paration of  the  routine  LACIE  production  estimates. 
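
The conversion from point observations to model inputs can be sketched numerically. Note the substitution: the Assessment Division derived areal values by hand from plotted maps and isolines, whereas the sketch below uses a plain arithmetic mean of the stations in each zone as a stand-in. The function names, station identifiers, zone names, and all values are hypothetical.

```python
def zone_means(station_values, station_zone):
    """Average station values within each zone (simple stand-in for isoline analysis).

    station_values -- dict mapping station id -> monthly value (e.g. precipitation)
    station_zone   -- dict mapping station id -> zone name
    """
    totals, counts = {}, {}
    for station, value in station_values.items():
        zone = station_zone[station]
        totals[zone] = totals.get(zone, 0.0) + value
        counts[zone] = counts.get(zone, 0) + 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


def departures(zone_values, zone_normals):
    """Departure from normal for each zone: observed areal value minus normal."""
    return {zone: zone_values[zone] - zone_normals[zone] for zone in zone_values}


# Illustrative monthly precipitation totals in millimeters (not LACIE data).
precip = {"A": 40.0, "B": 50.0, "C": 60.0, "D": 30.0}
zones = {"A": "zone1", "B": "zone1", "C": "zone1", "D": "zone2"}
normals = {"zone1": 65.0, "zone2": 25.0}

means = zone_means(precip, zones)
model_inputs = departures(means, normals)
```

Here zone1 comes out 15 mm below normal and zone2 5 mm above; values of this kind are what the monthly yield models took as input.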

The  number  of  yield  models  operated  varied  from 
month  to  month,  depending  on  the  country  and  on 
whether  spring  wheat,  winter  wheat,  or  both  were 
being  estimated.  During  June  of  Phase  III,  when  both 
U.S.  and  U.S.S.R.  spring  and  winter  wheat  were  being 
estimated,  a maximum  of  56  models  was  operated. 


Crop  Calendars 

The  crop  calendars  used  by  LACIE  to  estimate 
wheat  development  stages  also  utilized  the  data  base 
of  daily  weather  observations  available  at  NMC.  A 
selected  subset  of  individual  stations  was  used  to  pro- 
vide the  daily  maximum  and  minimum  temperature 
values  necessary  for  operation  of  the  models.  For 
each  of  those  stations,  a representative  crop  develop-
ment stage  was  calculated  for  each  day  of  the  grow-
ing season.  The  density  of  these  crop  calendar  sta-
tions in  the  U.S.  Great  Plains  is  shown  in  figure  3.

The  models  were  operated  for  the  United  States, 
the  U.S.S.R.,  Canada,  and  the  People’s  Republic  of 
China  during  LACIE  Phase  II.  Operations  in  the 
P.R.C.  were  on  a limited  exploratory  basis.  During 
Phase  III,  the  spring  wheat  crop  calendar  was  also 
run  for  the  exploratory  investigation  in  Brazil, 
Argentina,  and  Australia.  The  estimates  were  up- 
dated every  2 weeks  during  the  growing  season,  with 
about  half  the  countries  being  updated  each  week. 

Since  the  crop  calendar  model  operated  directly 
on  the  observed  temperature  reports,  it  was  not 
necessary  for  the  CCEA  Assessment  Division  to 
preprocess  the  data.  The  Modeling  Division  directly 
accessed  the  observation  files  at  NMC,  updated  the 
crop  stage  estimates  for  each  location,  and  placed  the 
new  estimates  on  file.  This  file  was  then  accessed 
from JSC to acquire a printed listing of the new updates.
In addition, a magnetic tape was prepared by
the  computer  in  Washington  and  mailed  to  Houston. 

The  individual  station  crop  stage  estimates  at  the 
end  of  the  2-week  period  were  plotted,  analyzed,  and 
presented  to  the  analysts  in  the  form  shown  in  figure 
4.  During  Phase  III,  a computer  program  was 
developed  which  used  the  individual  station  esti- 
mates to  prepare  crop  stage  estimates  interpolated  in 
both  space  and  time.  A crop  stage  estimate  was  made 
for  each  specific  Landsat  segment  location  corre- 
sponding to  the  date  of  the  satellite  overpass.  This 
prevented  the  necessity  of  subjectively  interpolating 
between  locations  and  times  on  two  consecutive 
maps. 
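
The space-and-time interpolation described above can be sketched as follows. This is only an illustration: the source does not state which interpolation scheme the Phase III program used, so inverse-distance weighting in space and linear interpolation in time are assumptions, as are all function names. Distances are computed directly in degrees, which is a simplification.

```python
from datetime import date
from math import hypot

def interpolate_in_space(stations, lat, lon, power=2.0):
    """Inverse-distance-weighted crop stage at (lat, lon).

    stations: list of (lat, lon, stage) tuples for one analysis date.
    The weighting scheme is an assumption, not the documented LACIE method.
    """
    num = den = 0.0
    for s_lat, s_lon, stage in stations:
        dist = hypot(s_lat - lat, s_lon - lon)
        if dist == 0.0:
            return stage  # segment coincides with a station
        weight = dist ** -power
        num += weight * stage
        den += weight
    return num / den

def interpolate_in_time(d0, s0, d1, s1, target):
    """Linear interpolation of crop stage between two analysis dates."""
    span = (d1 - d0).days
    if span == 0:
        return s0
    return s0 + (target - d0).days / span * (s1 - s0)

def segment_stage(obs_a, obs_b, date_a, date_b, lat, lon, overpass):
    """Crop stage for a segment location on the satellite overpass date."""
    stage_a = interpolate_in_space(obs_a, lat, lon)
    stage_b = interpolate_in_space(obs_b, lat, lon)
    return interpolate_in_time(date_a, stage_a, date_b, stage_b, overpass)
```

Interpolating in space first and then in time keeps each step one-dimensional in the quantity being varied, which mirrors the manual procedure of reading between locations and dates on two consecutive maps.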

As  with  any  series  of  estimating  equations,  a crop 
development  stage  estimation  model  or  its 
subroutine  to  estimate  planting  date  may  occa- 
sionally provide  troublesome,  inaccurate  values.  A 
procedure  was  developed  in  Phase  II  to  allow  the 
analysts  who  were  reviewing  the  Landsat  imagery  to 
advise  YES  of  obvious  inaccuracies  in  the  crop  calen- 
dar. This  analyst  feedback  was  reviewed  by  a panel 
of  agronomists,  plant  scientists,  and  agrome- 
teorologists. The  panel  made  its  recommendation  to 
the  YES  manager  as  to  what  corrections  should  be 
made  to  the  crop  calendar  model  for  a station,  a 
group of stations, or a region. These changes in estimate
of growth stage for each required station for a
given day were made and documented, and the crop
calendar model was rerun from that date to predict
later growth stage estimates. This revised output was
again reviewed for accuracy and further corrections.

FIGURE 2. — Typical temperature analysis for input to yield
model.

A specific  example  of  such  modifications  oc- 
curred in  the  1977  crop  year  for  spring  wheat  in  the 
New  Lands  area  of  the  U.S.S.R.  The  spring  wheat 
starter  model  indicated  planting  in  this  area  from 
April  25  to  about  May  5.  Emergence  was  indicated  by 
the  analysts  to  have  occurred  from  May  3 to  May  15. 
Mid-June  Landsat  acquisitions  in  the  area  indicated 
very  little  visible  small  grains  at  that  time.  A growth 
stage  of  2.4  was  entered  as  the  datum  for  each  station 
where  the  analyst  was  able  to  detect  significant  areas 
of emerged small grains. These operational corrections
were made and the crop calendar estimates remained
reasonably accurate during the remainder of
the  crop  year. 


Weekly  Weather  Summaries 

Each  week,  the  YES  meteorologists  at  JSC  pre- 
pared narrative  weather  summaries  describing  grow- 
ing conditions  and  likely  crop  response.  These  were 
based  on  the  domestic  and  foreign  assessments  writ- 
ten in  Washington  by  CCEA's  Assessment  Division, 
but  they  provided  additional  detail  needed  to  aid  the 
analysts  in  their  interpretation  of  the  satellite  im- 
agery. These  summaries  also  helped  the  project  man- 
agement to  understand  the  results  being  obtained,  to 
evaluate  problem  areas,  and  to  develop  alternate 
techniques which would be more appropriate for application
to these problem situations.

The  summaries  consisted  of  maps  and  charts  il- 
lustrating current  crop  stages  and  depicting  the  dis- 
tribution of  precipitation,  temperature,  soil  moisture, 
or  other  pertinent  weather  factors.  The  narrative 
material  then  described  and  discussed  the  growing 
conditions  and  their  likely  effect  on  the  crop's 
development  and  appearance.  In  some  cases,  the  nar- 
rative material  included  ancillary  information  such 
as  agricultural  attache  reports  or  other  onsite  obser- 
vations; however,  such  data  were  not  widely  availa- 
ble in  a timely  manner  nor  were  these  data  suffi- 
ciently comprehensive,  as  they  usually  related  only 
to local problem areas. The narrative material was an
attempt  to  provide  timely  updates  of  crop  conditions 
rather  than  to  summarize  the  entire  season  for  each 
region.  A typical  weekly  update  would  appear  as 
follows.

For the week ending May 29, 1977

WEATHER: Daytime temperatures dropped
about 10° after midweek from around the mid
80's to the mid 70's.

Minimum  temperatures  remained  in  the  high 
50's  and  low  60's,  dropping  a few  degrees  at  the 
end  of  the  week. 

Average  temperatures  ran  4°  to  6°  above  nor- 
mal. 

Adequate  rainfall  occurred  during  midweek 
with  some  excessive  amounts  causing  flooding 
in  the  northeast  part  of  the  state. 

0 ///  1 7.  The  growing  conditions  in  the  state 
for  wheat  continued  to  be  very  good.  The 
above-normal  temperature  has  caused  the  crop 
to mature ahead of the normal, but the wheat is
expected  to  fill  well  with  the  abundant 
moisture.  The  wheat  is  all  in  the  heading  stage 
with  some  beginning  to  turn  in  the  southeast 
and  south  central. 

The  printed  summaries  were  distributed  regularly 
to  the  analysts.  Also,  oral  briefings  were  presented  to 
enable  specific  areas  and  potential  problems  to  be 
discussed  in  detail. 

Typical  charts  prepared  for  the  weather/crop 
assessments  are  shown  in  figures  5 and  6.  In  addition 
to  temperature  and  precipitation,  a soil  moisture 
budget  was  available  for  the  U.S.  Great  Plains  and 
the  U.S.S.R.  The  weekly  soil  moisture  status  for  the 
United  States  was  depicted  by  the  crop  moisture  in- 
dex shown  in  figure  7.  The  crop  moisture  index  re- 
lates the  available  water  to  the  normal  requirements 
of  the  crops  grown  in  each  region.  This  analysis  was 
extremely  useful  to  project  scientists  who  were 
monitoring  drought  and  its  effect  on  the  appearance 
of  wheat  in  the  Landsat  imagery. 

Other  specialized  analyses  were  used  to  describe 
local  problems  and  weather  episodes.  For  instance, 
climagraphs, such as the one in figure 8, were prepared
to determine the nature of weather trends over
time  at  a particular  location.  In  this  instance,  the 
onset  of  unusually  cold  temperatures  near  the  nor- 
mal wheat  planting  time  in  the  northeastern 
Caucasus  region  of  the  U.S.S.R.  limited  fall  establish- 
ment of  the  crop,  and  analysts  were  alerted  to  expect 
poor  wheat  signatures  in  that  area  at  that  time. 


FIGURE 5. — Percent of normal precipitation for May 1977 in
the  U.S.S.R. 


FIGURE 6. — Observed weekly average temperature in the Canadian
prairie provinces, August 23 to 29, 1977.

FIGURE  7. — U.S.  crop  moisture  index:  June  7,  1975. 




FIGURE 8. — Climagraph for Krasnodar Kray, U.S.S.R.;
1976-77.


FIGURE 9. — Winterkill analysis for U.S.S.R.; January 5, 1977.


Another  type  of  analysis  was  used  that  combined 
the  temperature  and  snow  cover  data  to  determine 
areas  where  critically  cold  temperatures  on  a single 
night,  without  the  benefit  of  an  insulating  blanket  of 
snow,  could  have  caused  injury  to  the  dormant  crop. 
In  figure  9,  the  analysis  for  the  U.S.S.R.  indicates  a 
region  in  the  southern  Ukraine  where  conditions 
may  have  led  to  cold  injury.  In  that  area,  analysts 
might expect a weak or mottled wheat signature in
the spring because the damaged fields would not respond
vigorously to warmer weather.
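
A minimal sketch of this kind of screening is given below. The source does not give the critical temperature or the snow depth considered protective, so both default thresholds are purely hypothetical placeholders, as are the function names.

```python
def winterkill_risk(min_temp_c, snow_depth_cm,
                    critical_temp_c=-18.0, protective_snow_cm=5.0):
    """True when a single night was critically cold without insulating snow.

    Both thresholds are hypothetical; the source does not state the values
    used in the LACIE winterkill analysis.
    """
    return min_temp_c <= critical_temp_c and snow_depth_cm < protective_snow_cm

def flag_stations(observations):
    """Return names of stations where cold injury may have occurred.

    observations: iterable of (station_name, min_temp_c, snow_depth_cm).
    """
    return [name for name, temp, snow in observations
            if winterkill_risk(temp, snow)]
```

Flagged stations would then be mapped, as in figure 9, so that analysts know where to expect weak or mottled signatures in the spring.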

The  weather  assessments  and  analyses  of  particu- 
lar episodes  not  only  supported  the  analysts  but  also 
aided  YES  in  assessing  its  own  yield  estimates.  In 
most  instances,  the  models  did  not  account  for 
reduced  yields  caused  by  events  such  as  winterkill 
which  occurred  on  a single  day.  The  models  were 
designed  to  be  responsive  to  moisture  or  temperature 
stresses  which  were  apparent  in  the  monthly  weather 
data. The episodal analysis made it possible to identify
areas where additional influences on yield may
have occurred; however, these factors could only be
described  qualitatively  since  there  were  no  methods 
to  adjust  the  yield  estimates  quantitatively  for  the 
shorter  episodes. 


Project  Reports 

In  addition  to  the  weekly  weather  assessments, 
other  routine  reports  were  prepared  by  YES  at  JSC. 
An  overview  of  the  growing-season  weather  was 
written and included in each of the Crop Assessment
Subsystem monthly reports which released the project's
estimates. A second section entitled "Yield
Tracking"  was  also  included  to  describe  the  response 
of  each  individual  yield  model  to  the  growing  condi- 
tions. The  reports  provided  an  opportunity  to  de- 
scribe the  various  weather  episodes  or  other  prob- 
lems which  could  affect  yield  but  which  might  not  be 
specifically  accounted  for  in  the  models. 


SUMMARY 

The  Yield  Estimation  Subsystem  demonstrated 
during the three phases of LACIE that it is possible
to use the flow of global meteorological data and provide
valuable information regarding global wheat
production. First, it was able to establish a capability
to collect, in a timely manner, detailed weather data
from all regions of the world. Second, it was able to
develop  methods  for  evaluating  the  data  and  con- 
verting it  into  information  appropriate  to  the  proj- 
ect's needs. 

Although  the  various  elements  involved  in 
generating  the  products  of  YES  were  widely  dis- 
persed geographically,  it  was  possible  to  coordinate 
their  efforts  and  to  provide  the  needed  information 
for  integration  with  other  project  data.  Most  notable 
was  the  utility  of  the  information,  particularly  the 
objectively  derived  yield  estimates,  which  were  dem- 
onstrated to  be  capable  of  isolating  problem  areas, 
such  as  the  shortfall  in  the  U.S.S.R.  spring  wheat  dur- 
ing Phase  III,  and  to  do  so  early  in  the  crop  season. 
This  information  alone  has  significance  to  foreign 
commodity analysts, but it takes on additional meaning
when combined with the LACIE estimates of
wheat  area  available  for  harvest. 

The  techniques  developed  and  demonstrated  by 
YES  to  monitor  and  qualitatively  assess  the  signifi- 
cant growing-season  weather  factors  have  added  a 
dimension  to  global  crop  assessment  capabilities. 
The  demonstrated  timeliness  and  available  detail  can 
provide  early  warning  of  significant  weather  condi- 
tions and  alert  analysts  to  likely  effects.  Even  though 
these  effects  are  not  quantified,  they  can  be  very 
useful  by  simply  pointing  the  direction  in  which  pro- 
duction may  depart  from  normal.  The  further 
development  of  these  analysis  techniques  and  the 
refinement  of  the  yield  models  will  be  major  ac- 
tivities for  the  group  which  succeeds  the  LACIE 
Yield Estimation Subsystem.


The Crop Assessment Subsystem: System
Implementation  and  Approaches  Used  for  the 
Generation  of  Crop  Production  Reports 

W. E. McAllum,a R. E. Hatch,b S. M. Boatright,c C. J. Liszcz,d and S. M. Evansb


INTRODUCTION 

The  primary  responsibility  of  the  Crop  Assess- 
ment Subsystem  (CAS)  during  the  three  phases  of 
the LACIE was to produce crop reports that included
estimates  of  wheat  area,  yield,  and  production,  as 
well  as  a specified  set  of  associated  statistical  descrip- 
tors. Report  preparation  and  transmission  were 
based  on  a documented  reporting  schedule  which 
was  reviewed  and  approved  by  project  management 
at  the  beginning  of  each  phase  of  the  experiment. 
Generally,  monthly  reports  were  submitted; 
however,  procedures  existed  to  allow  for  the 
transmittal  of  an  unscheduled  report  if  circum- 
stances warranted.  Annual  reports  were  used  not 
only  as  a means  of  documenting  the  end-of-season 
wheat  estimates  but  also  to  document  results  ob- 
tained by  re-creating  (simulating)  estimates  for  each 
of  the  scheduled  reports  using  the  end-of-phase 
capabilities  (i.e.,  latest  aggregation  software  and/or 
approved  procedural  changes). 

Successful  performance  of  assigned  CAS  func- 
tions was  heavily  dependent  on  input  data  provided 
by  other  elements  of  the  Applications  Evaluation 
System  (AES)  and  on  the  aggregation/report  genera- 
tion capabilities  (status  of  system  implementation  in 
terms  of  hardware  and  software)  available  for  CAS 
analyst  use. 

The  purpose  of  this  paper  is  to  provide  insight 
regarding  CAS  operations  during  the  three  phases  of 
LACIE  in  terms  of  sampling  strategy,  CAS  input/ 
output  data,  evolution  of  aggregation/reporting 
system  capabilities,  and  CAS  aggregation  procedures. 


aNASA Lyndon B. Johnson Space Center, Houston, Texas.
bU.S.  Department  of  Agriculture,  Houston,  Texas. 
cFord  Aerospace  & Communications  Corporation,  Houston, 
Texas. 

dLockheed Electronics Company, Inc., Systems and Services
Division, Houston, Texas.


SAMPLING  STRATEGY 

The term "sampling strategy" is used to encompass
the entire realm of methodologies involved in
the  definition  of  the  basic  sampling  unit,  the  alloca- 
tion of  sample  segments  to  specific  political  or  ad- 
ministrative units,  and  the  actual  geographic  location 
of the sample segments. A Sampling Strategy Team
(SST) was established as the control agent for
developing and/or modifying sampling allocation
and location procedures. In addition, the SST was
responsible for specifying the basic aggregation/
expansion framework and the appropriate formulations
for a specified set of statistical descriptors. This
total group of functions is extremely critical since (1)
the  details  inherent  in  the  selected  sampling  strategy 
are  primary  determinants  of  expansion  methods  to 
be  used;  (2)  a determination  is  made  regarding  the 
appropriate  level  at  which  essentially  independent 
estimates  of  area  and  yield  should  be  combined  to 
estimate  production;  and  (3)  the  techniques  or  for- 
mulations of  statistical  descriptors  to  be  used  in 
evaluating  the  accuracy  and  reliability  of  results  are 
specified.  Basically,  the  culmination  of  these  func- 
tions determines  many  of  the  primary  characteristics 
that  a software  package  must  possess  to  adequately 
support  an  aggregation/reporting  function. 

One  of  the  major  objectives  of  the  initial  sampling 
strategy  was  to  allocate  sample  segments  at  the 
lowest  (or  smallest)  political  subdivision  for  which 
historical  data  were  available.  For  example,  segments 
were  allocated  at  the  county  level  in  the  United 
States  and  at  the  oblast  level  in  the  U.S.S.R.  The  ini- 
tial sample  segment  allocations  were  based  on  the 
total  area  devoted  to  wheat  production  in  a selected 
year,  hereafter  referred  to  as  the  epoch  year.  In 
foreign  areas,  the  most  recent  available  data  were 
used  for  allocation  purposes;  the  1969  U.S.  Census  of 
Agriculture was used as the data source to support
the allocation process in the United States. The actual
location of sample segments was confined to
agricultural  areas  (i.e.,  areas  having  discernible  field 
patterns),  as  defined  by  interpretations  of  available 
Landsat  imagery.  In  addition,  segment  locations 
could  not  violate  a set  of  prespecified  constraints  es- 
tablished by  the  NASA  Goddard  Space  Flight  Center 
(GSFC).  These  constraints  included  (1)  a minimum 
distance  between  locations  and  (2)  a restriction  on 
the  number  of  segments  that  are  contained  in  a full 
frame  of  Landsat  data.  Supportive  data  were  pro- 
vided by  the  Data  Acquisition,  Preprocessing,  and 
Transmission  Subsystem  (DAPTS)  of  the  AES. 
More  intricate  details  of  the  initial  sampling  strategy 
and  modifications  during  the  three  phases  of  LACIE 
are  available  in  the  published  requirements  docu- 
ment (ref.  1)  and  are  also  addressed  by  other  papers 
prepared  for  this  symposium  (presentations  and  sup- 
porting papers  in  the  Experiment  Design  Session, 
which  addresses  sampling  and  aggregation). 
Country-specific  issues  are  referred  to  in  several 
papers  that  report  results  for  individual  countries. 
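
As a rough illustration of allocating segments on the basis of epoch-year wheat area, the sketch below distributes a fixed number of segments among political subdivisions in proportion to their historical wheat area. The strictly proportional rule and the largest-remainder rounding are assumptions made for the example; the actual LACIE allocation procedure is documented in ref. 1 and may differ.

```python
def allocate_segments(wheat_area, total_segments):
    """Allocate segments to units in proportion to epoch-year wheat area.

    wheat_area: dict mapping unit name -> historical wheat area.
    Proportional allocation with largest-remainder rounding is an assumed
    rule for illustration, not the documented LACIE procedure.
    """
    total_area = sum(wheat_area.values())
    quotas = {u: total_segments * a / total_area for u, a in wheat_area.items()}
    alloc = {u: int(q) for u, q in quotas.items()}
    leftover = total_segments - sum(alloc.values())
    # Hand the remaining segments to the units with the largest fractional part.
    by_remainder = sorted(quotas, key=lambda u: quotas[u] - alloc[u], reverse=True)
    for unit in by_remainder[:leftover]:
        alloc[unit] += 1
    return alloc
```

Under such a rule, units with little or no epoch-year wheat area can receive zero segments, which is the situation in which historical data must later stand in for missing Landsat-based estimates.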


BASIC  INPUT  DATA  TO  THE  CROP 
ASSESSMENT  SUBSYSTEM 

As  was  previously  mentioned,  CAS  utilized  data 
elements  provided  by  other  elements  of  the  AES  to 
generate  estimates  of  wheat  area,  yield,  and  produc- 
tion for  specified  geographic  areas.  The  general 
categories  of  data  required  were  (1)  proportion  esti- 
mates of  wheat  for  each  sample  segment  represented 
by  usable  Landsat  data;  (2)  yield  estimates  and 
associated  estimates  of  yield  variance  for  predefined 
geographic  areas;  and  (3)  historical  statistics,  includ- 
ing wheat  area,  yield  and  production,  as  well  as  area 
for  major  competing  crops  grown  during  the  wheat- 
producing  season.  More  detailed  discussions  regard- 
ing these  input  categories  follow. 


Landsat  Data 

The  CAS  aggregation  software  was  designed  to 
utilize an estimate of the percentage of wheat for all
sample  segments  allocated  within  a country  as  the 
basis  for  area  expansion.  Historical  wheat  area  data, 
specifically  that  designated  as  epoch  year  data,  were 
used  in  conjunction  with  available  Landsat-based 
estimates to produce a wheat area estimate for
geographic units that were not assigned sample segments
in the original allocation and/or that did not
have Landsat data acquired for allocated sample segments
(because of cloud cover, haze, correlation
problems, etc.). Classification results (proportion
estimates  of  wheat  or  small  grains  for  a sample  seg- 
ment) were  produced  by  the  Classification  and  Men- 
suration Subsystem  (CAMS)  and  transmitted  to 
CAS  for  use  as  input  data  in  the  aggregation  process. 
Each  segment-level  classification  result  was  iden- 
tified by  segment  number  and  associated  with  a 
Landsat  acquisition  date,  a date  transmitted  to  CAS, 
an  evaluation  code,  and  several  classification- 
oriented  factors  (e.g.,  unitemporal  or  multitemporal 
classification, bias corrections, analyst remarks, etc.).
Key  characteristics  were  identified  and  eventually 
implemented  as  control  parameters  during  the  evolu- 
tion of  the  basic  aggregation  system. 

Methods  of  transmitting  and  handling  segment- 
level  data  changed  significantly  during  the  course  of 
LACIE.  The  primary  forces  that  encouraged  changes 
were (1) the necessity for accommodating increased
data  loads;  (2)  a desire  to  minimize  the  chances  for 
transcription  errors;  and  (3)  the  development  of 
specific input formats to support aggregation software.
In Phase I, segment-level results were
transmitted  to  CAS  via  a worksheet  prepared  by  the 
CAMS  analyst.  It  was  then  necessary  for  CAS  per- 
sonnel to  code  needed  data  in  the  appropriate  format, 
to  have  the  data  set  keypunched,  and  to  perform 
sufficient  checks  to  ensure  that  an  accurate  data  set 
was  ready  for  entry  into  the  aggregation  data  base. 
Significantly increased data volumes were anticipated
for Phases II and III, and it became apparent
that  data-handling  tasks  were  likely  to  be  a bot- 
tleneck in  report  preparation.  As  input  formats  for 
the  CAS  system  stabilized,  a procedure  was  estab- 
lished for  CAMS  to  provide  segment-level  results  via 
punched  cards  utilizing  a prespecified  format.  This 
procedure  provided  the  CAS  analyst  the  capability  of 
handling  large  quantities  of  data  in  a relatively  short 
period  of  time.  However,  data  quality  checks  in 
terms  of  accuracy  and  completeness  of  data  transmit- 
ted were  still  necessary. 

Confusion  Crop  Ratios 

Increasingly  large  overestimates  of  wheat  area 
during  Phase  I alerted  LACIE  personnel  to  potential 
problems in segment-level analyses. Subsequent investigations
identified the major problem as the confusion
of wheat with other crops being grown in the
sample  segment.  The  principal  source  of  confusion 
was  other  small  grains  crops  that  had  a growing 
season  similar  to  that  of  wheat.  Since  this  crop  sepa- 
rability was  not  an  easily  resolved  issue,  project  man- 
agement required  CAMS  analysts  to  identify  small 
grains (spring, winter, or total) for each segment,
especially  in  spring  and  mixed  wheat  areas,  until 
reliable  techniques  and  procedures  could  be 
developed  to  identify  spring  and/or  winter  wheat. 
Since  sampling  and  aggregation  procedures  and  all 
supporting  data  bases  were  designed  to  estimate 
wheat  area,  the  implementation  of  small  grains 
estimation  at  the  segment  level  necessitated  the 
development  of  confusion  crop  ratios  that  could  be 
used  to  derive  the  required  spring  and/or  winter 
wheat  proportion  estimates  for  each  segment. 

Initially,  confusion  crop  ratios  were  applied  only 
in spring and mixed wheat areas (e.g., Minnesota,
Montana,  North  Dakota,  and  South  Dakota).  This 
procedure  was  based  on  the  assumption  that  winter 
wheat  classifications  and  winter  small  grains 
classifications  in  the  pure  winter  wheat  areas  of  the 
U.S. Great Plains (USGP) were essentially synonymous.
During Phase III, however, the use of confusion
crop ratios was extended into the pure winter
wheat areas. It is important to recognize that the relative
accuracy of this ratioing procedure is heavily influenced
by two factors: (1) the degree to which the
ratios  being  used  reflect  the  true  distribution  of  crops 
in  the  current  year,  and  (2)  the  accuracy  of  the  classi- 
fication process  in  terms  of  including  all  confusion 
crops  (e.g.,  small  grains)  in  the  segment-level  propor- 
tion estimate. 

At the end of Phase I, confusion crop ratios were
constructed using state-level historical data for the
previous crop year. These ratios were then applied to
segment-level small grains estimates prior to aggregation.

The basic approach used during Phase II for the
U.S. Great Plains, the Canadian prairie provinces,
and the U.S.S.R. was to derive needed ratios from the
most recent data available reported for the lowest political
subdivision identified in the allocation
hierarchy. The four necessary ratios were (1) winter
wheat to winter small grains, (2) winter wheat to
total small grains, (3) spring wheat to spring small
grains, and (4) spring wheat to total small grains.
Small grains crops considered in deriving ratios were
rye, barley, wheat, oats, and flax. During the
classification  process,  the  CAMS  analyst  identified 
the proportion estimate for a segment as winter
wheat, spring wheat, winter small grains, spring
small  grains,  or  total  small  grains.  In  the  CAS  data 
base  update  process,  the  proper  ratio  was  applied  to 
obtain  appropriate  spring  or  winter  wheat  propor- 
tions, thus  creating  two  additional  classes  (i.e., 
ratioed  winter  wheat  and  ratioed  spring  wheat). 
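
The ratioing step can be sketched as follows: the four ratios are derived from historical areas, and the appropriate one is applied to a segment-level small grains proportion. The data layout, the function names, and the sample areas in the usage note are hypothetical; only the four ratios and the list of small grains crops come from the text.

```python
def _safe_div(numerator, denominator):
    """Return 0.0 when no small grains of that type were reported."""
    return numerator / denominator if denominator else 0.0

def confusion_ratios(hist):
    """Derive the four ratios from historical areas.

    hist: dict mapping (season, crop) -> area, with season "winter" or
    "spring" and crop one of rye, barley, wheat, oats, flax.
    """
    winter_sg = sum(a for (season, _), a in hist.items() if season == "winter")
    spring_sg = sum(a for (season, _), a in hist.items() if season == "spring")
    total_sg = winter_sg + spring_sg
    winter_wheat = hist.get(("winter", "wheat"), 0.0)
    spring_wheat = hist.get(("spring", "wheat"), 0.0)
    return {
        ("winter wheat", "winter small grains"): _safe_div(winter_wheat, winter_sg),
        ("winter wheat", "total small grains"): _safe_div(winter_wheat, total_sg),
        ("spring wheat", "spring small grains"): _safe_div(spring_wheat, spring_sg),
        ("spring wheat", "total small grains"): _safe_div(spring_wheat, total_sg),
    }

def ratioed_proportion(segment_proportion, segment_class, wheat_class, ratios):
    """Convert a small grains proportion into a ratioed wheat proportion."""
    return segment_proportion * ratios[(wheat_class, segment_class)]
```

For example, with made-up historical areas in which spring wheat is 80 of 120 units of spring small grains, a segment classified as 30 percent spring small grains would be entered as about 20 percent ratioed spring wheat. This makes visible the dependence noted earlier: the result is only as good as the historical ratio's match to the current year's crop mix.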

A task  was  initiated  in  December  1976  to  use 
econometric  modeling  techniques  to  estimate  confu- 
sion crop  ratios  for  the  four  U.S.  Northern  Great 
Plains  states  and  for  Canada.  Ratios  estimated  by  the 
developed  models  were  used  in  Phase  III  analyses. 
(For  further  details,  see  “Econometric  Models  for 
Predicting  Confusion  Crop  Ratios"  by  D.  E.  Um- 
berger  et  al.,  which  is  included  in  the  symposium 
proceedings  as  a supporting  paper  in  the  Experiment 
Design  Session.)  Because  of  existing  time  con- 
straints, limited  resources,  and  an  anticipated  lack  of 
necessary  supportive  data,  similar  modeling  efforts 
were  not  attempted  for  countries  having  planned 
economies.  If  confusion  crop  ratios  are  deemed  a 
necessary  element  of  future  Landsat-based  crop 
estimation  systems,  further  testing  and  evaluation 
are  needed  to  ensure  that  optimal  techniques  are 
used  to  estimate  any  required  ratios. 


Data Editing Procedures

During Phases I and II, all data received from
CAMS were resident in the aggregation data base and
were used to support aggregations throughout the
season; i.e., a segment classification became inactive
only if replaced by a later classification. During
Phase III, it was suspected that early-season acquisitions
failed to detect all planted wheat because of
poor and/or later emergence. The following country-specific
methods were used to delete the "questionable"
acquisitions from the data base.

1.  U.S.S.R.  newspapers  were  used  to  determine 
wheat  tillering  dates  for  oblasts,  and  segment  data  ac- 
quired prior  to  the  established  date  were  eliminated 
from  the  aggregation  process. 

2.  In  the  U.S.  Great  Plains,  rates  of  change  in  seg- 
ment wheat  proportion  estimates  were  monitored  for 
segments  that  had  multiple  acquisitions.  At  the 
average  date  when  the  rates  of  change  became  small, 
the  crop-growth  stage  was  estimated,  and  all  segment 
estimates  based  on  Landsat  data  acquired  prior  to  the 
derived  growth-stage  date  were  excluded  from  the  ag- 
gregation process. 
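
The U.S. Great Plains procedure in item 2 can be approximated by the sketch below. It is a simplification under stated assumptions: the rate threshold is hypothetical, and the cutoff is expressed directly as a calendar date, whereas the source derives a crop-growth stage at the average date and then excludes acquisitions made before that stage was reached.

```python
from datetime import date

def stabilization_date(acquisitions, max_rate=0.002):
    """First acquisition date after which the wheat proportion changes slowly.

    acquisitions: list of (date, wheat_proportion), sorted by distinct dates.
    max_rate is a hypothetical threshold in proportion units per day.
    Returns None if the estimates never stabilize.
    """
    for (d0, p0), (d1, p1) in zip(acquisitions, acquisitions[1:]):
        if abs(p1 - p0) / (d1 - d0).days <= max_rate:
            return d0
    return None

def average_date(dates):
    """Mean of several dates, rounded to a whole day."""
    mean_ordinal = round(sum(d.toordinal() for d in dates) / len(dates))
    return date.fromordinal(mean_ordinal)

def drop_early(acquisitions, cutoff):
    """Exclude segment estimates acquired before the cutoff date."""
    return [(d, p) for d, p in acquisitions if d >= cutoff]
```

In use, `stabilization_date` would be evaluated for every segment with multiple acquisitions, the results averaged with `average_date`, and the resulting cutoff applied to all segments with `drop_early`.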

A screening procedure, which identified outlier
wheat proportion estimates via comparison of segments
with similar historical county statistics, was
used  in  the  U.S.  Great  Plains  as  a tool  for  excluding 
questionable  segment  estimates  from  the  aggregation 
procedure. 


Yield  Estimates 

Estimates  of  wheat  yield  and  the  associated  esti- 
mates  of  yield  variance  were  provided  by  the  Yield 
Estimation  Subsystem  (YES).  Phase  I was  a testing 
period for yield models; thus, a schedule for generating
and transmitting yield estimates and variances
was  not  established.  Beginning  in  Phase  II,  yield  esti- 
mates for  active  countries  were  generally  provided 
on  a monthly  basis  during  the  growing  season.  Yield 
variances  were  not  available  for  the  U.S.S.R.  and 
Canada  until  Phase  III. 

The  CAS  aggregation  software  was  designed  to 
utilize yield estimates at the stratum level; e.g., the
Crop  Reporting  District  (CRD)  level  in  the  United 
States.  In  Phase  II,  the  boundaries  of  the  yield  strata 
(area  represented  by  a specific  model)  were  defined 
as  CRD's  in  the  United  States,  as  Crop  Districts 
(CD's)  in  Canada,  and  as  crop  regions  in  the  U.S.S.R. 
However,  the  historical  data  used  to  develop  U.S.  and 
Canadian  models  represented  an  area  larger  than  a 
CRD/CD (e.g., several CRD's/CD's or an entire
state/province).  Therefore,  it  was  necessary  to  adjust 
the  computation  of  yield  and  production  statistical 
descriptors  to  account  for  correlations  resulting  from 
the  definition  of  model  development  boundaries 
which  did  not  match  the  boundaries  used  to  deline- 
ate the  area  stratum.  Input  of  yield  estimates  and 
variances  was  retained  at  the  stratum  level  in  the 
CAS  software  to  avoid  extensive  (and  expensive) 
software  modifications. 
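
The effect of such correlations on the production variance can be illustrated with a standard result for products of independent estimators. The notation is introduced here and is not taken from the LACIE documentation: A_i and Y_i are the area and yield estimates for stratum i, with means \mu and variances \sigma^2, and the area estimates are assumed independent of the yield estimates and of each other. The formulations actually used by CAS may differ.

```latex
% Production as the sum over strata of area times yield
P = \sum_i A_i Y_i

% Variance of one stratum's production (A_i independent of Y_i)
\operatorname{Var}(A_i Y_i)
  = \mu_{A_i}^2 \sigma_{Y_i}^2 + \mu_{Y_i}^2 \sigma_{A_i}^2
  + \sigma_{A_i}^2 \sigma_{Y_i}^2

% Strata that share a yield model have correlated yield estimates,
% which adds a covariance term to the aggregated variance
\operatorname{Var}(P)
  = \sum_i \operatorname{Var}(A_i Y_i)
  + 2 \sum_{i<j} \mu_{A_i} \mu_{A_j} \operatorname{Cov}(Y_i, Y_j)
```

When every stratum has its own independent yield model, the covariance term vanishes; when several strata draw on one state-level or province-level model, ignoring it would understate the variance of the production estimate.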

Basic  variables  in  the  yield  models  are  monthly 
averages  of  specified  weather  parameters  such  as 
temperature  and  precipitation.  Thus,  yield  estimates 
for  a particular  month  would  include  weather  data 
through  the  end  of  the  previous  month.  The 
schedule  for  delivery  of  yield  estimates  and 
variances was established to allow time (1) to include
the  previous  month's  weather  in  yield  model  up- 
dates; (2)  to  operate  the  models;  and  (3)  to  support 
an  established  reporting  schedule  for  wheat  area, 
yield,  and  production.  The  yield  delivery  schedule 
adhered to during Phases II and III was (1) estimates
for  the  United  States  delivered  to  CAS  on  the  fourth 
working  day  of  each  month  and  (2)  estimates  for 
foreign areas delivered to CAS on the ninth working
day of each month. Yield data were transmitted via
telephone,  telefax,  and  magnetic  tape. 


CROP  ASSESSMENT  SUBSYSTEM  OUTPUTS 

The  primary  product  of  the  CAS  has  been  crop  re- 
ports containing  estimates  of  wheat  area,  yield,  and 
production  for  each  country  that  was  actively  being 
worked  in  a particular  phase  of  the  LACIE.  The 
general  format  and  content  of  the  reports  were  con- 
trolled by  an  Interface  Control  Document  (ICD)  be- 
tween CAS  and  the  Information  Evaluation  (IE) 
group (USDA/LACIE, Washington, D.C.) (ref. 2).
Reports  included  formatted  computer  outputs  of 
area,  yield,  and  production  estimates  and  a set  of 
statistical  descriptors  (standard  errors,  coefficients  of 
variation,  probability  of  less  than  10  percent  error, 
and  90-percent  confidence  intervals— upper  and 
lower)  associated  with  each  of  the  included  esti- 
mates; output  tables  were  provided  at  the  country, 
region,  zone,  and  stratum  levels.  In  addition,  sum- 
mary tables  of  supportive  data  (segment-level  data 
and  stratum-level  yield  estimates)  were  provided.  A 
narrative section of the report was used to summarize
results and to present pertinent analysis of input
data with special emphasis on major factors
responsible  for  changes  between  report  dates. 
Modifications  were  made  in  the  narrative  section  to 
accommodate  special  needs,  such  as  an  assessment 
of  drought  conditions  or  an  unusual  country-specific 
situation.  During  Phase  III,  a crop  condition  assess- 
ment section  was  added  to  all  reports  in  order  to 
more  adequately  highlight  unusual  circumstances 
that  could  impact  the  potential  production  level  of  a 
country  (e.g.,  drought  and  winterkill). 
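
The text names the statistical descriptors but does not give their formulas. As a sketch only, assuming the estimation error is approximately normal with the reported variance and that "10 percent error" is measured relative to the estimate itself, the descriptors could be derived as follows. The function name and dictionary keys are hypothetical.

```python
import math


def report_statistics(estimate, variance):
    """Derive report descriptors from an estimate and its variance.

    Assumptions (not stated in the source): normal errors, and the
    10-percent error band taken relative to the estimate.
    """
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    se = math.sqrt(variance)
    cv = se / estimate
    # P(|error| < 0.10 * estimate) for a normal error with std. dev. se.
    if se == 0:
        prob = 1.0
    else:
        prob = math.erf(0.10 / (cv * math.sqrt(2.0)))
    z90 = 1.645  # two-sided 90-percent normal quantile
    return {
        "standard_error": se,
        "coefficient_of_variation": cv,
        "prob_error_under_10pct": prob,
        "ci90_lower": estimate - z90 * se,
        "ci90_upper": estimate + z90 * se,
    }


stats = report_statistics(100.0, 25.0)
```

Under these assumptions a coefficient of variation of 5 percent corresponds to roughly a 95-percent probability of being within 10 percent, which shows how the descriptors relate to one another.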


Report  Schedules 

As  mentioned  previously,  only  estimates  of  wheat 
area  were  produced  in  Phase  I.  Thus,  the  established 
report  schedule  was  based  primarily  on  projected 
LACIE  processing  capabilities  and  a stated  require- 
ment for  monthly  estimates.  Phase  I reports  were 
generally  prepared  on  the  last  working  day  of  the 
month.  Only  an  end-of-season  report  was  transmit- 
ted to  IE  for  evaluation  purposes. 

In addition to the requirement to generate
monthly estimates, reports in Phases II and III were
to include estimates of yield and production, were to
be mailed prior to comparable official releases by
USDA, and were to demonstrate an operational
capability  to  prepare  and  release  crop  reports  in  a 
timely  manner.  This  total  set  of  requirements,  in 
combination  with  established  input  data  availability 
constraints,  resulted  in  the  following  reporting 
scenarios. 

1.  U.S.  Great  Plains 

a.  Reports  should  be  mailed  no  later  than  the 
day  preceding  an  official  release  of  domestic  esti- 
mates by  USDA. 

b.  Yield  estimates  should  be  received  on  the 
fourth  working  day  of  the  month. 

c. The flow of segment classification results
would continue up to and including the scheduled
aggregation date.

2. U.S.S.R.

a.  Yield  estimates  should  be  received  on  the 
ninth  working  day  of  the  month. 

b.  Phase  II  reports  were  scheduled  for  5 to  7 
working  days  following  receipt  of  yield  estimates. 

c. Phase III report dates were adjusted to support
scheduled meetings of the FAS/USDA/U.S.S.R.
Grain Estimation Task Force.

In  summary,  tight  scheduling  in  terms  of  report 
release  dates  and  availability  of  required  input  data 
necessitated  efficient,  accurate  handling  of  large  data 
volumes  as  well  as  a quick  analysis  and  documenta- 
tion of  results. 

The following summarizes the countries covered
by  crop  reports. 

1.  Phase  I — U.S.  Great  Plains:  spring  and  winter 
wheat,  area  only 

2. Phase II — U.S. Great Plains: area, yield, and
production for spring and winter wheat; U.S.S.R.:
area, yield, and production for a winter wheat indicator
region and for a spring wheat indicator region;
Canada: area, yield, and production of spring wheat
for the three prairie provinces

3. Phase III — U.S. Great Plains: area, yield, and
production for spring and winter wheat; U.S.S.R.:
full country estimates of area, yield, and production
for spring and winter wheat


Security for Commodity Estimates

Although  LACIE  was  formally  designated  an  ex- 
periment, it  was  recognized  that  the  ultimate  product 
(i.e., crop estimates) should not be widely distributed
until  appropriate  evaluations  were  completed.  In  ad- 
dition, it  was  particularly  important  to  avoid  having 


experimental  results  confused  with  or  mistakenly  in- 
terpreted as  official  releases  by  the  USDA. 
Therefore,  a commodity  data  control  plan  (ref.  3) 
was  implemented  to  provide  necessary  control 
guidelines.  Some  of  the  salient  features  of  the 
security program are highlighted here; for more detail,
interested parties are referred to the
referenced document.

A controlled  access  area  was  established  to  house 
required  hardware  (remote  terminals,  hardcopier, 
and printer), to provide workspace in which reports
could  be  assembled  and  prepared  for  mailing,  and  to 
provide  a storage  area  from  which  material  of  a sensi- 
tive nature  could  be  distributed  in  accordance  with 
the  established  commodity  security  guidelines. 

Two  basic  protection  periods  were  established  to 
cover  materials  (all  reports  and  briefings)  which  con- 
tained aggregated  area,  yield,  or  production  esti- 
mates. A maximum  protection  period  was  defined  as 
extending  from  the  date  of  issuance  until  the  next 
working  day  following  the  official  release  of  an  esti- 
mate by  USDA.  For  all  practical  purposes,  access  to 
data during this period was limited to CAS, YES, Accuracy
Assessment, and project management personnel.
During the restricted access period, all controlled
data  were  available  to  the  LACIE  staff  in  order  to 
support  evaluation  efforts,  the  preparation  of  techni- 
cal reports,  program  modifications,  and  other 
assigned  project-related  tasks.  The  duration  of  the 
restricted  access  period  was  4 months  from  the  end 
of  the  maximum  protection  period.  The  cover  and 
each  page  of  all  controlled  documents  were  to  carry 
ending  dates  for  both  protection  periods. 
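
The two ending dates carried on each controlled document follow mechanically from the official USDA release date. The sketch below is illustrative only: the function names are hypothetical, working days are assumed to be Monday through Friday with no holiday calendar, and "4 months" is interpreted as four calendar months, clamped to the last day of a shorter month.

```python
import calendar
from datetime import date, timedelta


def next_working_day(d):
    """First Monday-Friday date strictly after d (holidays ignored)."""
    d += timedelta(days=1)
    while d.weekday() >= 5:
        d += timedelta(days=1)
    return d


def add_months(d, months):
    """Add calendar months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def protection_period_ends(usda_release):
    """Return (end of maximum protection, end of restricted access)."""
    max_end = next_working_day(usda_release)
    restricted_end = add_months(max_end, 4)
    return max_end, restricted_end


# Hypothetical release on Friday, June 10, 1977.
max_end, restricted_end = protection_period_ends(date(1977, 6, 10))
```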


EVOLUTION OF AGGREGATION/REPORTING
SYSTEM  CAPABILITIES 

Currently  existing  capabilities  in  terms  of  hard- 
ware and  software  available  to  support  aggregation 
and  reporting  functions  were  not  available  at  the 
beginning  of  LACIE.  The  CAS  grew  from  a 
relatively simple, individual-dependent system to a
fairly complex system that is usable by several
trained analysts who have diverse academic and job
experience  backgrounds.  This  evolutionary  process 
was  strongly  influenced  by  the  rate  at  which  require- 
ments were  defined  and  documented,  by  the  time  re- 
quired to  develop  and  receive  concurrence  on  statisti- 
cal formulations,  and  by  the  degree  to  which  project 
personnel understood the characteristics and potential
uses of available input data. The following sections
trace the development of the CAS from Phase I
through  Phase  III.  It  is  important  to  recognize  that 
existing  system  capabilities  at  any  particular  point 
during  the  LACIE  experience  are  necessarily 
reflected  in  the  quality  and  the  quantity  of  results 
obtained. 


Phase  I 

The  CAS  development  system  was  designed, 
developed,  implemented,  and  operated  on  a 
UNIVAC 1110 computer located in Building 12 at
the  NASA  Johnson  Space  Center  (JSC).  All  interfac- 
ing was  through  the  demand  terminals  located  on  the 
second  floor  of  JSC  Building  17. 

The  aggregation  equations  implemented  were 
those  specified  in  the  requirements  for  the  LACIE 
Phase I CAS (ref. 4). These equations were basically
unchanged  through  LACIE  Phase  III,  except  for  a 
logic  change  requiring  a minimum  number  of  ag- 
gregatable  segments  before  a direct  area  estimate 
would  be  made  at  the  stratum  level.  Otherwise,  the 
stratum  would  be  estimated  by  applying  a zone-level 
ratio  of  the  current  estimate  to  historical  data. 

The  variance  estimation  equations,  the  so-called 
standard  statistics,  were  implemented  in  a like  man- 
ner. However,  as  data  were  collected  and  the 
algorithm  was  run,  the  resulting  estimates  indicated 
some  erroneous  assumptions  concerning  the  manner 
in  which  wheat  was  distributed  at  the  substratum 
level. Further studies were conducted (ref. 5), and
modifications  were  made  to  the  variance  estimation 
equations.  These  modifications  were  based  on  real- 
time data  and  reflected  new  assumptions  concerning 
the  within-substratum  wheat  distribution. 

These  new  standard  statistics  became  available 
late  in  LACIE  Phase  I and  were  incorporated  into 
the  development  system  software. 

Near  the  end  of  LACIE  Phase  I,  further  studies  of 
the  wheat  distribution  were  conducted  to  evaluate 
the  standard  statistics  model  with  respect  to  the 
latest  findings.  The  assumption  that  the  wheat  was 
relatively  homogeneously  distributed  throughout  the 
sampled  area  of  a zone  was  found  to  be  invalid. 
However,  usually  within  a zone,  two  or  more 
substratum  groups  could  be  found  that  were  fairly 
homogeneous.  Based  on  these  findings,  a new  model 
was developed, tested, and implemented and then
was  used  to  recalculate  the  LACIE  Phase  I standard 


statistics  (ref.  5).  This  model  was  then  used  through- 
out LACIE  Phases  II  and  III. 

The  development  software  and  data  bases  were 
converted  from  the  UNIVAC  1110  to  the  Program- 
med Data  Processor,  model  11-45  (PDP  11-45)  com- 
puter in  JSC  Building  17  late  in  Phase  I in  prepara- 
tion for  controlled  access  operations  during  Phase  II. 
The  system  was  further  modified  in  Phase  II  to  sup- 
port the  operational  requirements  of  CAS. 


Phase  II  Batch  Aggregation  Software 

The  batch  aggregation  software  was  developed  to 
bridge  a gap  between  the  UNIVAC  1110  system  and 
the  interactive  system.  The  CAS  analysts  defined 
data-handling  and  report  requirements  for  Phase  II 
which  could  not  be  met  by  the  Phase  I software,  even 
though it had been converted to run on the PDP
11-45. These requirements were included in the interactive
software; but, because of the complex
nature  of  an  interactive  system,  they  required  a 9- 
month  lead  time  from  definition  to  implementation. 
The  decision  was  made  to  design  and  implement  an 
interim  system  to  be  executed  in  a batch  mode  that 
would  support  generation  of  CAS  reports  beginning 
in  March  1976.  This  system  also  would  have  the 
capability  of  providing  Phase  II  report  formats  and 
data  handling. 

Design. — The  batch  system  was  designed  to  pro- 
vide the  capability  of  aggregation  qualified  by  date  of 
acquisition,  date  passed  to  CAS,  biostage,  evaluation 
code,  and  level  of  allocation  hierarchy.  Require- 
ments also  indicated  a need  to  construct  a yield  esti- 
mate data  base  with  predefined  edit  specifications 
and  to  manipulate  that  data  by  making  additions, 
changes,  and  deletions.  A similar  requirement  for 
maintaining  a CAMS  data  base  (containing  the  date 
of  acquisition,  date  passed  to  CAS,  biostage,  evalua- 
tion code,  and  proportion  of  wheat  in  the  segment) 
existed  with  the  addition  of  requiring  a limited  query 
capability.  The  Phase  II  CAS  requirements  called  for 
10 basic report formats with variations of each according
to the desired level of hierarchy. Of these 10,
4 were levied against the interim batch system. These
were  the  area  estimate,  yield  estimate,  area-yield- 
production  summary,  and  production  estimate 
reports. 

Summary. — The  interim  batch  system  was  a useful 
means of meeting early Phase II CAS software requirements.
One of the major factors in the completion
of the design was the cooperation of the CAS
analysts  in  freezing  new  requirements  until  after  the 
system  had  been  declared  operational  and  clarifying 
quickly  any  vague  or  inconsistent  areas  in  the  estab- 
lished  requirements.  This  afforded  the  designers  and 
programmers an opportunity to complete their work
without  major  revisions. 


Phase II Consolidated System

The  interactive  system  software  was  not  delivered 
until  May  1976  and  then  only  with  partial  capability. 
The  system  was  more  advanced  in  aggregation  and 
statistical  estimates,  but  the  data  bases  and  output  re- 
ports were  behind  the  batch  system  because  of  the 
longer  lead  time  required  to  incorporate  design 
modifications.  To  obtain  the  best  of  both  systems, 
the  batch  and  interactive  systems  were  combined 
into  a consolidated  CAS  system.  The  batch  data 
bases  were  made  compatible  with  the  interactive  ag- 
gregation software  through  an  extract  program.  The 
results  of  the  aggregation  were  extracted  from  the  in- 
teractive system  and  run  through  the  batch  report 
generator  to  obtain  the  IE  reports.  This  type  of 
operation  was  continued  through  Phase  II. 


CAS Interactive Software— Phases II and III

The  software  discussed  in  this  section  provided 
CAS  analysts  an  interactive  estimation  and  reporting 
capability  in  a single  operational  system.  The  first 
system  delivery  occurred  in  May  1976.  Several 
modifications  have  been  made  since.  The  final  ver- 
sion (delivered  in  July  1977)  is  described  below. 

Basic design decision. — The estimation and reporting
software was implemented on the PDP 11-45
computer  in  an  interactive,  multiuser  environment. 
This  necessitated  the  choice  of  the  Resource  Sharing 
Executive, model 11D (RSX-11D) operating system.
The Fortran language was chosen for software implementation
because it was the only language available
on the PDP 11-45 with enough versatility for a complex
system.

Requirements  were  for  a software  system  which 
would  be  operated  by  the  CAS  country  analysts. 
Since  the  analysts'  training  and  experience  varied 
widely,  the  interactive  interface  between  the  CAS 
analyst  and  the  CAS  software  included  a relatively 
complete  set  of  tutorial  prompts  to  guide  the  analyst 


through  various  phases  of  system  operations. 

The  CAS  software  performs  the  following  func- 
tions. 

1.  Processing  analyst  inputs  in  response  to 
tutorial  prompts. 

2.  Processing  and  storage  of  data  from  external 
sources;  i.e.,  CAMS,  yield  estimates,  historical  data, 
and  confusion  crop  ratios. 

3. Crop area, yield, and production estimation.
Area estimates are computed from CAMS and
historical data; production is computed from area
estimates and input yield estimates; and average
yield is computed at the higher hierarchical levels by
dividing production by area.

4.  Computation  of  standard  statistics  associated 
with  the  crop  estimates  to  assess  the  reliability  of  the 
estimates. 

5.  Aggregation  of  crop  estimates  to  the  various 
hierarchical  elements  of  the  active  LACIE  countries. 

6.  Generation  of  aggregation  reports. 
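
The relationship among area, yield, and production described in the estimation function of the list above can be illustrated with a short sketch. The function name and the tuple layout are hypothetical; only the arithmetic (production as area times yield, and higher-level average yield as total production divided by total area) comes from the text.

```python
def aggregate_production(strata):
    """Aggregate stratum-level estimates to a higher hierarchical level.

    `strata` is a list of (area, yield_per_unit_area) pairs; this layout
    is an assumption made for illustration.
    """
    total_area = sum(area for area, _ in strata)
    # Production in each stratum is its area estimate times its yield estimate.
    total_production = sum(area * yld for area, yld in strata)
    # Average yield at the higher level is production divided by area.
    average_yield = total_production / total_area if total_area else 0.0
    return total_area, total_production, average_yield


area, production, avg_yield = aggregate_production([(100.0, 2.0), (300.0, 1.0)])
```

Note that the higher-level yield is an area-weighted average, not a simple mean of the stratum yields: here it is 1.25, whereas the unweighted mean of 2.0 and 1.0 would be 1.5.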

The  software  is  subdivided  into  functional  cate- 
gories which  include  batch  programs  to  initialize  the 
data  base;  data  base  management  software;  data  base 
change software; area, yield, and production estimation;
and report generation programs.

Data  base  design. — The  CAS  data  base  must  in- 
clude data  such  as  classification  results  for  each  sam- 
ple segment  (CAMS  data);  yield  estimate  data; 
historical  area,  yield,  and  production  values;  data 
describing  the  hierarchical  structure  of  the  country; 
and  small  grains  historical  statistics  used  to  define 
the  confusion  crop  ratios.  In  addition  to  the  data  files 
and  allocation  files,  index  files  are  used  to  access  the 
data  files. 

Dual software systems, both batch and interactive,
are used to maintain the data base. The batch
system  initializes  or  makes  large-scale  updates  to  the 
data  base.  The  interactive  system  is  used  for  small 
updates  or  diagnostic  changes  to  the  data  base.  Both 
systems  produce  a line  printer  record  of  the  data  base 
transaction  in  the  form  of  a file  dump  or  a data  base 
change  report. 

Data  base  security. — The  operating  system  secures 
the  data  base  through  User  Identification  Codes 
(UIC's) and passwords. The analyst does not work
directly  with  the  master  data  base,  but  instead  works 
with  his  own  copy.  A copy  can  be  made  only  by  in- 
dividuals with  access  to  the  system  password  or  by 
analysts  with  passwords  to  their  own  production 
UIC.  The  analyst's  copy  of  the  master  data  base  is 
secured by the analyst's password. Thus, several
analysts can be working simultaneously, each with a
secured copy of the data base for the LACIE country
with  which  he  is  working. 

Analysts  can  add  files  to  the  master  data  base,  but 
only  individuals  with  access  to  the  system  passwords 
can  delete  files  or  change  files  in  the  master  data 
base. 

Application  program  design. — Three  types  of  ap- 
plication programs  are  in  the  CAS  software  system. 
One  type  includes  all  programs  to  perform  aggrega- 
tions; i.e.,  the  area,  production,  and  yield  estima- 
tions. Another  type  consists  of  the  report  generators. 
The  last  type  is  the  data  base  change  program. 

The  aggregation  programs  are  split  into  three 
tasks — area,  production,  and  yield.  Yield  estimates 
are  input  for  each  stratum,  and  CAMS  data  are  input 
for  each  sample  segment,  along  with  historical  area 
data  and  a set  of  analyst-specified  parameters.  Out- 
puts are  area,  production,  and/or  yield  estimates  and 
supporting  statistics  for  the  requested  hierarchical 
element  and  all  of  its  hierarchical  subelements. 
These  outputs  are  linked  to  the  report  generators 
through  the  application  report  files.  All  of  the  ag- 
gregation tasks  have  separate  report  files  for  spring, 
winter,  and  total  wheat. 

Two  report  generators — interactive  terminal  and 
line  printer — use  the  application  report  files  to  pro- 
duce reports  in  two  different  formats.  The  interac- 
tive reports  are  displayed  on  the  cathode-ray  tube 
(CRT)  with  optional  line  printer  output;  information 
is  obtained  directly  from  the  report  files  and  dis- 
played with  a minimum  of  reformatting.  The  second 
report  generator  produces  line  printer  output  in  a 
fixed  format  that  is  not  compatible  with  the  CRT 
display;  these  are  intended  for  use  by  the  IE  in 
Washington,  D.C.  Since  this  generator  gathers  infor- 
mation from  many  different  files  for  each  single  re- 
port, it  requires  significantly  more  time  to  complete 
than  the  interactive  report  generator. 

A third  report  generator  obtains  needed  informa- 
tion from  the  data  base  rather  than  from  the  applica- 
tion report  files.  It  outputs  yield  estimate  data  using 
analyst-specified  parameters.  This  generator  enables 
the  analyst  to  see  only  the  data  which  satisfy  the 
specified  input  criteria  and  which  are  qualified  for 
use  by  the  aggregation  programs. 

The  data  base  change  program  is  the  interactive 
counterpart  of  the  batch  programs  for  updating  the 
data  base.  This  data  base  change  program,  together 
with  the  interactive  report  generator,  can  be  used  as  a 
diagnostic  tool.  Temporary  data  base  changes  can  be 
entered  and  aggregation  results  viewed  immediately 


on  the  CRT  to  assist  the  CAS  analyst  in  evaluating 
test  aggregations  and  in  complying  with  requests 
from  Accuracy  Assessment. 


OVERVIEW  OF  CAS  OPERATIONS 

This  section  provides  an  overview  of  the  opera- 
tions within  CAS  that  were  necessary  to  support  es- 
tablished reporting  schedules.  CAS  analysts  used 
documented  procedures  that  covered  the  CAS  from 
computer  terminal  operation  to  report  distribution. 
In  general,  the  procedures  guided  the  analyst  through 
data  base  initialization  and  maintenance,  selection  of 
aggregation  parameters,  computer  report  generation, 
and  preparation  of  the  total  CAS  report.  A new  learn- 
ing situation  was  encountered  whenever  technical 
modifications  or  procedural  changes  resulted  in 
changes  to  the  existing  software. 

The  CAS  operations  were  initiated  in  December 
1974  with  a test  aggregation  of  28  segments  collected 
in  Kansas  for  crop  year  1973-74.  This  manual  ag- 
gregation was  used  to  verify  expansion  algorithms 
that  were  implemented  to  support  Phase  I area 
estimation tasks. As might have been expected, considerable
time was devoted to development of procedures
and documentation of requirements early in
the  project.  Because  of  the  experimental  nature  of 
the  LACIE,  continual  changes  in  procedures  were 
necessary,  both  to  improve  the  quality  of  the  output 
and to solve unexpected problems. Figures 1, 2, and 3
illustrate  the  technical  modifications  that  were  made 
during  the  three  phases  of  the  LACIE  and  correlate 
them  in  terms  of  time  with  CAS  system  deliveries 
and  reports  that  were  issued. 

The  following  discussion  focuses  on  data  bases 
used,  aggregation  methods,  time  lines  required  for  re- 
port generation,  and  analyst  interaction  with  other 
project  elements. 


Data  Bases 

Separate  data  bases  were  defined  for  each  country. 
These data bases were updated with segment proportion
estimates and yield estimates as required to obtain
timely area, yield, and production estimates for
each  level  in  the  aggregation  hierarchy  (i.e.,  the 
stratum  to  the  country). 

The  primary  data  base  required  by  the  CAS 
system  is  the  allocation  data  base,  which  defines  the 
country's area aggregation hierarchy, the yield strata,
and the location of the sample segments. The allocation
data base is initiated when a country is activated
on  the  CAS  system  and  requires  no  updates  unless 
the  aggregation  parameters  are  modified  (e.g.,  the 
number  of  segments,  agricultural  area,  total  political 
subdivision  or  hierarchy).  The  data  base  contains  the 
same  data  that  were  used  to  define  the  sample  for  a 
country.  The  system  will  not  accept  other  types  of  in- 
put data  and  will  not  aggregate  unless  the  allocation 
data  base  is  populated  and  the  hierarchy  is  correctly 
defined. 

The  historical  data  base  contains  space  for  15 
years  of  historical  area,  yield,  and  production  data. 
Three  of  these  years  are  dedicated  to  primary,  sec- 
ondary, and  allocation  epoch  years,  which  leaves  12 
years  for  other  historical  data.  The  primary  epoch 


year  is  used  for  ratio  estimating  and  area  statistical 
calculations.  These  data  are  used  for  determining  the 
maximum,  minimum,  average,  and  variance  of  the 
crop  of  interest  in  the  hierarchy  for  reporting  and 
analysis  purposes.  The  data  base  can  be  updated  as 
desired.  The  primary  epoch  year  is  required  for  the 
system  to  aggregate. 

A confusion  crop  ratio  for  each  segment  is  main- 
tained in  a data  base  to  ratio  winter  and/or  spring 
wheat  from  the  CAMS  small  grains  estimates.  These 
ratios  are  used  to  obtain  the  ratioed  wheat  stored  in 
the  CAMS  data  base,  as  previously  defined  in  the 
section entitled “Basic Input Data to the Crop
Assessment Subsystem,” subsection “Landsat
Data.” Ratios can be calculated external to the software
and input for each segment, or the small grains
historical data may be input and the software will
calculate  the  ratios.  The  system  will  operate  without 
confusion  crop  ratios  in  the  data  base,  but  the  ratioed 
wheat  estimates  will  be  identical  to  nonratioed  wheat 
estimates  because  the  blanks  are  interpreted  as  a 
ratio  of  one. 
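
The ratioing step can be sketched as follows. The text does not give the formula used to derive a ratio from the small grains historical statistics, so the sketch assumes the ratio is historical wheat area divided by historical small grains area; the function names are hypothetical. The treatment of a blank ratio as one follows the text.

```python
def confusion_crop_ratio(hist_wheat_area, hist_small_grains_area):
    """Ratio of wheat to small grains from historical areas.

    Assumption: the ratio is a simple quotient of historical areas.
    """
    if hist_small_grains_area <= 0:
        raise ValueError("historical small grains area must be positive")
    return hist_wheat_area / hist_small_grains_area


def ratioed_wheat(small_grains_proportion, ratio=None):
    """Apply a confusion crop ratio to a CAMS small grains estimate.

    A missing (blank) ratio is interpreted as 1, so the ratioed estimate
    equals the nonratioed one.
    """
    effective_ratio = 1.0 if ratio is None else ratio
    return small_grains_proportion * effective_ratio


ratio = confusion_crop_ratio(80.0, 100.0)   # hypothetical historical areas
wheat = ratioed_wheat(0.40, ratio)          # segment is 40 percent small grains
unratioed = ratioed_wheat(0.40)             # blank ratio in the data base
```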

The  CAMS  data  base  will  accommodate  all  the 
sample  segment  classification  results  received  with  a 
classification  code  of  10  or  greater  for  all  segments  in 
a country.  (Table  I lists  the  CAS  classification 
codes.)  Five  classes  are  stored — winter  wheat,  spring 
wheat,  winter  small  grains,  spring  small  grains,  and 
total  small  grains.  From  these  classes,  two  additional 
classes  (ratioed  winter  wheat  and  ratioed  spring 
wheat)  are  obtained  by  applying  the  confusion  crop 
ratios  to  the  small  grains  estimates.  In  addition,  the 
following  parameters  also  are  stored  for  each 


classified segment; i.e., date passed to CAS, date of
Landsat  acquisition,  and  biostage  of  wheat  develop- 
ment. The  system  will  not  aggregate  a zone  unless  at 
least  three  aggregatable  segments  are  available  within 
a lower  level  of  the  hierarchy. 

The  allocation  and,  consequently,  the  aggregation 
hierarchy  were  based  on  political  subdivisions  to  uti- 
lize the  historical  and  census  data  that  existed  for 
these  areas.  The  base  level  in  a country  was  the 
lowest  political  subdivision  for  which  detailed 
historical  crop  statistics  were  available.  The 
hierarchy for each country became (1) United
States — U.S. Great Plains — state, CRD, county; (2)
U.S.S.R. — crop region — economic region, oblast; and
(3)  Canada — prairie  provinces — province,  CRD, 
county.  This  resulted  in  the  United  States  and 
Canada being substratum-level countries (because of
the availability of historical data) and the U.S.S.R.
being  a stratum-level  country  (because  of  the  non- 
availability of  historical  data). 

The  area  was  estimated  for  the  base-level  area  by 
direct  expansion  from  the  segments  and  was 
summed  to  higher  levels  in  the  hierarchy.  For  base- 
level  areas  with  no  segments,  the  area  was  ratio- 
estimated  using  areas  with  segments  available  for  ag- 
gregation and  historical  data. 
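
A simplified sketch of the two estimation paths may help. It is not the LACIE aggregation equations (those are given in ref. 4): direct expansion is assumed here to be the mean segment wheat proportion times the agricultural area of the base-level unit, the ratio is assumed to be the sum of direct estimates over the sum of historical areas for the units that have direct estimates, and the `min_segments` default is a placeholder, not a documented value. All names are hypothetical.

```python
def estimate_areas(units, min_segments=3):
    """Estimate wheat area for each base-level unit in one zone.

    `units` maps a unit name to a dict with keys 'agricultural_area',
    'historical_wheat_area', and 'segment_proportions' (wheat proportions
    of the aggregatable segments). Layout and formulas are assumptions.
    """
    direct = {}
    for name, u in units.items():
        props = u["segment_proportions"]
        if len(props) >= min_segments:
            # Direct expansion: mean proportion scaled to the unit's area.
            direct[name] = u["agricultural_area"] * sum(props) / len(props)

    hist_sampled = sum(units[n]["historical_wheat_area"] for n in direct)
    if not direct or hist_sampled <= 0:
        raise ValueError("no unit qualifies for direct estimation")

    # Zone-level ratio of current estimate to historical data.
    ratio = sum(direct.values()) / hist_sampled

    estimates = {}
    for name, u in units.items():
        if name in direct:
            estimates[name] = direct[name]
        else:
            estimates[name] = ratio * u["historical_wheat_area"]
    return estimates


zone = {
    "A": {"agricultural_area": 1000.0, "historical_wheat_area": 180.0,
          "segment_proportions": [0.2, 0.2, 0.2]},
    "B": {"agricultural_area": 500.0, "historical_wheat_area": 90.0,
          "segment_proportions": [0.1]},
}
areas = estimate_areas(zone)
zone_total = sum(areas.values())  # summed to the next level of the hierarchy
```

In this example unit B has too few segments, so its area is obtained by scaling its historical wheat area by the ratio observed in unit A.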

The  selection  criteria  for  the  CAMS  data  to  be  ag- 
gregated assumed  that  the  latest  data  collected  would 
result  in  the  most  accurate  classification;  therefore, 
the latest acquisition date and biophase were the primary
selection parameters. The classifications were
rated  satisfactory,  marginal,  and  unsatisfactory  by 
CAMS  evaluation  procedures,  and  the  satisfactory 
and  marginal  ratings  were  considered  acceptable  for 
aggregation.  Although  the  unsatisfactory  segments 
were  not  aggregated,  they  were  carried  in  the  CAMS 


data  base  for  information.  The  segment  classifica- 
tions were  added  to  the  data  base  as  they  became 
available  to  CAS  or  as  required  to  produce  scheduled 
reports. 

As the experiment progressed in Phases II and III,
it became necessary to add the CAMS classification
date to properly handle segments that were reworked
in CAMS. The four biowindows were expanded to
seven crop development biostages (table II) to conform
to LACIE crop calendar outputs. This
parameter  was  added  to  the  CAS  criteria  for  selection 
of  segments  as  a single  biostage  or  as  a set  in  order  of 
predetermined  priority.  The  selection  criteria  of 
CAMS  segment  data  were  classification  date,  acquisi- 
tion date,  biostage,  and  classification  code. 
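
The selection logic can be sketched as below. The text lists the criteria but not their order of precedence, so the ordering used here (biostage priority first, then latest acquisition date, then latest classification date) is an assumption, as are the record layout and all names. The acceptance test relies on the Table I code ranges, in which codes in the 20's are marginal and codes in the 30's are satisfactory.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Classification:
    segment_id: str
    classification_date: date
    acquisition_date: date
    biostage: int
    code: int  # CAMS evaluation code (Table I)


def is_aggregatable(code):
    # Marginal (20's) and satisfactory (30's) results are acceptable;
    # unsatisfactory (10's) and unprocessed (below 10) are not.
    return 20 <= code < 40


def select_for_aggregation(records, biostage_priority):
    """Pick one classification per segment (assumed precedence order)."""
    rank = {stage: i for i, stage in enumerate(biostage_priority)}
    best = {}
    for r in records:
        if not is_aggregatable(r.code) or r.biostage not in rank:
            continue
        # Higher key wins: preferred biostage, then later dates.
        key = (-rank[r.biostage], r.acquisition_date, r.classification_date)
        current = best.get(r.segment_id)
        if current is None or key > current[0]:
            best[r.segment_id] = (key, r)
    return {seg: rec for seg, (key, rec) in best.items()}


records = [
    Classification("1001", date(1977, 5, 1), date(1977, 4, 20), 3, 30),
    Classification("1001", date(1977, 6, 1), date(1977, 5, 25), 4, 10),
    Classification("1001", date(1977, 5, 20), date(1977, 5, 7), 3, 22),
]
chosen = select_for_aggregation(records, biostage_priority=[4, 3])
```

Here the biostage 4 result is unsatisfactory and is skipped, so the later of the two acceptable biostage 3 acquisitions is selected even though its rating is only marginal.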

The  early-season  winter  wheat  area  estimates  in 
Phases  II  and  III  were  low  because  the  LACIE 
system  identified  detectable  wheat;  i.e.,  wheat  area 
with sufficient ground cover to be detected by

Table I.— CAMS Evaluation Codes a


Code  Description


01  Not  machine  processed— clouds,  haze,  snow,  etc. 

This  means  that  the  segment  cannot  be  processed 
through  the  system  because  clouds,  haze,  etc.,  make  in- 
terpretation and  analysis  impossible. 

02  Not  machine  processed — confusion  crops  or  other  in- 
terpretation problems 

This  code  should  be  used  when  a segment  cannot  be 
processed  because  of  interpretation  difficulties, 
especially when confusion between wheat/small grains
and  other  crops  is  such  that  a wheat/small  grains  esti- 
mate cannot  be  determined.  The  rules  for  such  a deci- 
sion are  to  be  negotiated  between  CAMS  and  CAS  and 
included  in  the  CAMS  Detailed  Procedures. 

03  Bad  data— due  to  technical  problems,  not  reordered 

In  this  case,  the  segment  cannot  be  processed  due  to 
technical  problems  arising  from  an  unsatisfactory 
histogram, excessive scan-line drop, etc. If the segment
is  not  reordered,  this  code  is  used. 

NOTE:  If  the  segment  is  reordered,  no  code  should  be 
listed  and  no  data  sheet  should  be  passed  to  CAS  since 
the  segment  will  again  be  sent  through  the  CAMS  for 
evaluation.  Any  segment  that  cannot  be  processed  due 
to  clouds,  etc.,  or  technical  problems  should  be  passed 
as  01  or  03. 

05  Not  machine  processed — dormancy 

In  cases  where  recognition  is  a problem  because  the 
crop  is  in  a state  of  dormancy,  this  code  is  used. 

07  Not  machine  processed — preemergence 

In  cases  where  the  acquisition  is  prior  to  the  criteria  es- 
tablished for  fall  wheat  recognition  (to  be  determined), 
this  code  is  used. 

09  Not  machine  processed — multiple  acquisitions 

Code  09  was  developed  to  take  care  of  the  problem  of 
processing  multiple  acquisitions  of  a segment  at  the 
same time. When multiple acquisitions are available at
one time, only one segment — which is determined to be
the  better  acquisition  for  interpretation  and  analysis  by 
the  CAMS  analyst— is  selected  for  complete  process- 
ing. Other  acquisitions  are  listed  as  Code  09  to  indicate 
that  they  have  been  reviewed  but  were  not  processed 
through  the  system.  This  code  enables  CAS  to  review 
the segment results and account for each acquisition
and  how  it  was  evaluated  by  CAMS. 


10  Unsatisfactory — unsatisfactory  results  for  segment 

This  code  is  to  be  used  for  any  acquisition  that  has  been 
processed through the system and, based on CAMS
evaluation  procedures,  designated  unsatisfactory. 

12  Unsatisfactory — no  significant  change 

This code is used when the new acquisition is evaluated
to  have  no  significant  change  from  the  previous  un- 
satisfactory evaluation  for  the  segment. 

14  Unsatisfactory — rework,  reevaluated  segment 

This code is used when a segment that was previously
passed  to  CAS  is  reworked. 

18 Unsatisfactory — machine-processed multitemporal analysis

This code is used when more than one acquisition date
is used to produce an unsatisfactory proportion estimate.

NOTE: All acquisition dates used in processing should
be listed on the CAMS Evaluation Form.

20  Marginal — marginal  results  for  segment 

This code is to be used for any acquisition that has been
processed through the system and, based on CAMS
evaluation procedures, designated marginal.

22  Marginal — no  significant  change 

This  code  is  to  be  used  when  the  new  acquisition  is 
evaluated  to  have  no  significant  change  from  the  pre- 
vious marginal  evaluation  for  the  segment. 

24  Marginal — rework,  reevaluated  segment 

This  code  is  used  when  a segment  that  was  previously 
passed  to  CAS  is  reworked. 

28  Marginal — machine-processed  multitemporal  analysis 

This  code  is  used  when  more  than  one  acquisition  date 
is  used  to  produce  a marginal  proportion  estimate. 

NOTE: All acquisition dates used in processing should
be listed on the CAMS Evaluation Form.
TABLE II.— LACIE Biowindows and
Corresponding Crop Development Stages

Biowindow    Biostage


TABLE I.— Concluded

Code    Description


30  Satisfactory — satisfactory  results  for  segment 

This code is to be used for any acquisition that has been
processed through the system and, based on CAMS
evaluation procedures, designated satisfactory.

32  Satisfactory— no  significant  change 

This  code  is  to  be  used  when  the  new  acquisition  is 
evaluated  to  have  no  significant  change  from  the  pre- 
vious satisfactory  evaluation  for  the  segment. 

34  Satisfactory— rework,  reevaluated  segment 

This  code  is  used  when  a segment  that  was  previously 
passed  to  CAS  is  reworked. 


36 Satisfactory — less than 5 percent manually (hand) counted

This  code  is  to  be  used  for  any  segment  in  which  the 
proportion  estimate  is  manually  counted  rather  than 
machine  processed. 

NOTE:  This  category  would  not  be  used  if  the  segment 
was  a rework  segment.  Code  34  should  be  used. 

38 Satisfactory — machine-processed multitemporal analysis

This  code  is  used  when  more  than  one  acquisition  date 
is  used  to  produce  a satisfactory  proportion  estimate. 

40  Segment  is  totally  nonagricultural 

This code is used when the segment is evaluated as
having no agriculture at all; i.e., no discernible field
patterns.

NOTE:  Segments  in  an  agricultural  area  that  have  a 0- 
percent  proportion  estimate  are  to  be  designated  Code 
30. 
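
The codes listed above appear to follow a pattern in which the first digit gives the overall evaluation (0 for not machine processed, 1 for unsatisfactory, 2 for marginal, 3 for satisfactory, 4 for nonagricultural). The following Python sketch illustrates a lookup built on that pattern. The grouping is inferred from the table and is not a documented LACIE rule; the function name and category strings are hypothetical, and codes 01 and 03 are assumed to belong to the first group.

```python
# Illustrative lookup for two-digit CAMS evaluation codes.
# The first-digit grouping is an inference from table I, not a stated rule.
CATEGORY_BY_FIRST_DIGIT = {
    0: "not machine processed",   # e.g., 05, 07, 09 (01 and 03 assumed similar)
    1: "unsatisfactory",          # 10, 12, 14, 18
    2: "marginal",                # 20, 22, 24, 28
    3: "satisfactory",            # 30, 32, 34, 36, 38
    4: "nonagricultural",         # 40
}


def evaluation_category(code: str) -> str:
    """Return the evaluation category for a two-digit code such as '32'."""
    if len(code) != 2 or not code.isdigit():
        raise ValueError(f"expected a two-digit code, got {code!r}")
    first_digit = int(code[0])
    if first_digit not in CATEGORY_BY_FIRST_DIGIT:
        raise ValueError(f"unknown code group: {code!r}")
    return CATEGORY_BY_FIRST_DIGIT[first_digit]
```

A caller could use such a lookup to count, for example, how many acquisitions passed to CAS in a reporting period were satisfactory versus marginal.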


LACIE  classification  methods.  If  all  wheat  is  not 
detected  early  in  the  season,  then  the  estimates  will 
be  biased  low  and  will  continue  to  be  low  until  the 
segment  data  are  replaced  by  later  classifications. 
Thresholding procedures to eliminate the early-season
biased data from the aggregation were implemented
in Phase III in the United States and in the
USSR.


Biowindow

1. Crop establishment
2. Greening
3. Heading
4. Maturity

Biostage

1. Planting
2. Emergence
3. Jointing
4. Heading
5. Soft dough
6. Ripening
7. Harvest



The CAMS data base is updated as required to
maintain the most current data available for aggregation.
Data are selected from this data base for area
estimates depending upon the type of aggregation
desired and the priority of the data selection
parameters. A data base used for aggregation may be
saved for future evaluation and/or reaggregation if
required by procedure modifications.

The  yield  data  base  contains  a yield  estimate  and 
associated  yield  variance  for  each  area  stratum.  The 
yield  data  base  is  updated  as  the  yields  are  received 
from  YES,  generally  on  a monthly  basis  during  the 
growing  season.  The  data  base  can  be  updated  in- 
teractively or  in  the  batch  mode  using  cards  or  tape. 
The  allocation  data  base  defines  the  aggregation 
hierarchy  for  a country.  The  area  strata  and  yield 
strata  that  are  combined  to  obtain  production  are 
identified  in  the  allocation  data  base. 


Aggregation 

The LACIE used the same sampling and aggregation
techniques through the three phases of operation
with some modifications in the allocation
parameters. The statistics calculations were modified
to accommodate yield models covering areas larger
than strata and to adequately consider mixed wheat
(spring and winter) areas. The historical ratios for
estimating areas with no segments available (Group
III) were changed to use the zone-level history rather
than the stratum if fewer than three segments were
available for aggregation. The CAS procedures were
in a development mode throughout the three phases
and were modified to accommodate new techniques
in sampling and classification as the project
progressed. Figures 1, 2, and 3 show the progression
of LACIE through the three phases in relation to the
reports generated.

During  the  course  of  CAS  operations,  statistical 
calculations  were  modified  to  accommodate  changes 
in  the  system  such  as  yield  model  boundaries, 
variances  for  substrata  that  are  estimated  with  prob- 
ability proportional  to  size  (Group  II),  and  mixed 
wheat  areas.  (See  the  presentations  and  supporting 
papers  in  the  Experiment  Design  Session  which  ad- 
dress sampling  and  aggregation  issues.)  These 
algorithms were implemented in an off-line development
system as a test prior to implementation on the
CAS system. During the implementation period,
aggregations were performed on the interactive system,
and statistics were produced on the development
system.

During  Phase  II,  algorithms  and  procedure 
modifications  occurred  faster  than  they  could  be  im- 
plemented into  the  CAS  operational  system.  For  this 
reason,  operations  were  carried  out  on  two  aggrega- 
tion systems  with  the  extract  software  to  manipulate 
data  between  them.  This  situation  actually  existed  at 
the  end  of  Phase  III  for  thresholding  and  screening 
of  the  CAMS  data  base  prior  to  aggregation,  because 
these  techniques  were  not  incorporated  into  the  in- 
teractive system. 

In  addition  to  thresholding  CAMS  data,  a pro- 
cedure called  screening  also  was  used  in  Phase  III  to 
identify  segment  wheat  proportions  that  were 
statistical  outliers.  The  test  statistic  was  the  ratio  of 
the  CAMS  estimated  wheat  proportion  to  the  histori- 
cal proportion  of  wheat  in  that  county.  If  this  ratio 
fell  outside  the  3-standard-deviation  limit  calculated 
for  its  group,  the  segment  was  an  outlier  and  was 
eliminated from the aggregation. This screening
procedure was used in the United States in Phase III but
was not applied to the U.S.S.R. because the lowest
political subdivision for which historical data are
available is the oblast (stratum), and the procedure
was not applicable to the larger geographic area.
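
The screening test described above can be sketched in Python as follows. This is a minimal illustration, not the CAS implementation: the text does not say how the 3-standard-deviation limit was calculated for a group, so the sketch assumes the limit is centered on the group mean of the ratios and uses the group sample standard deviation. The function name and the input layout are hypothetical.

```python
from statistics import mean, stdev


def screen_segments(segments, n_sigma=3.0):
    """Split a group of segments into (kept, outliers) using the ratio test.

    segments: list of (segment_id, cams_proportion, historical_proportion),
    where historical_proportion is the nonzero county value.
    Assumption: limits are group mean +/- n_sigma * group sample std. dev.
    """
    if len(segments) < 2:
        # A spread cannot be estimated from fewer than two ratios.
        return [seg_id for seg_id, _, _ in segments], []

    # Test statistic: CAMS estimate divided by historical proportion.
    ratios = [(seg_id, est / hist) for seg_id, est, hist in segments]
    values = [ratio for _, ratio in ratios]
    center = mean(values)
    spread = stdev(values)

    kept, outliers = [], []
    for seg_id, ratio in ratios:
        if abs(ratio - center) > n_sigma * spread:
            outliers.append(seg_id)   # eliminated from the aggregation
        else:
            kept.append(seg_id)
    return kept, outliers
```

Under the assumption made here, a group of ten or fewer segments can never produce an outlier, because a single value cannot lie more than (n - 1)/sqrt(n) sample standard deviations from the mean of a group that includes it. An operational procedure may therefore have derived its limits differently.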

The yield was input at the stratum level and combined
with the stratum area to obtain production.
The production was then summed to obtain totals for
higher levels in the hierarchy. The derived yields
were obtained by dividing the production by the area
at any desired level in the hierarchy.
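
The arithmetic of this aggregation can be illustrated with a short Python sketch. It assumes one area and one yield per stratum and ignores the variance calculations and the allocation hierarchy; the function name, field names, and units are hypothetical.

```python
def aggregate(strata):
    """Combine stratum areas and yields into totals for a higher level.

    strata: list of dicts with 'area' and 'yield' (production per unit area).
    Returns (total_area, total_production, derived_yield).
    """
    total_area = sum(s["area"] for s in strata)
    # Production is computed per stratum, then summed up the hierarchy.
    total_production = sum(s["area"] * s["yield"] for s in strata)
    # The derived yield is production divided by area at the chosen level.
    derived_yield = total_production / total_area if total_area else 0.0
    return total_area, total_production, derived_yield
```

Note that the derived yield is an area-weighted average of the stratum yields, not a simple mean: two strata of 100 and 300 area units yielding 2.0 and 1.0 give a derived yield of 1.25.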

The CAS analyst reviewed the CAMS segment
data inputs and updated the CAMS data base as
required to keep the data base current. A preliminary
area estimate was generated and evaluated about 7
working days prior to a scheduled report. The yield
estimates were reviewed by the YES and the CAS
prior to updating the yield data base. The area, yield,
and production were estimated and reviewed for
accuracy and reasonableness. If data base errors
existed, they were corrected and a final set of estimates
was generated.


Time Line for Report Preparation

The  CAS  operations  time  line  required  7 working 
days  before  a report  date  to  update  the  CAMS  data 
base  and  to  prepare  an  area  aggregation.  A backup  ag- 
gregation was  prepared  to  submit  as  a monthly  report 
to  cover  any  computer  failures  that  might  occur  dur- 
ing the  critical  period.  One  backup  aggregation  was 
submitted  as  a CAS  report  during  Phase  II  because  of 
computer  failure.  Under  special  circumstances  dur- 
ing Phases  II  and  III,  the  lead  time  was  shortened  to 
as  little  as  3 days  prior  to  a report  to  accommodate 
segment  processing  and  to  include  the  latest  data  in 
an  aggregation. 


Analyst  Interaction  With  Other  Project 
Elements 

The  CAS  analysts  provided  feedback  to  other  ele- 
ments of  the  AES  concerning  operational  require- 
ments for  segment  processing  or  problem  areas  iden- 
tified during  the  report  preparation.  Sample  segment 
results  which  produced  aggregate  area  estimates  that 
deviated  from  the  expected  values  (based  on  histori- 
cal data  combined  with  current  weather  information) 
were  referred  to  the  CAMS  for  review. 

Yields  that  did  not  follow  expected  trends  were 
referred  to  the  YES  for  verification.  If  an  evaluation 
of  the  data  resulted  in  a modification  or  deletion,  the 
appropriate  CAS  data  bases  were  updated. 

The CAS analysts also met with the Crop Condition
Assessment Team and the YES personnel to
review their inputs to the CAS report. Information
required to complete a report, such as operations
processing and evaluation of segment data, was
obtained from other AES elements. The completed
report was delivered to the Commodity Control Office
for reproduction and distribution.
LACIE Status and Tracking

V. M. Dauphin, C. H. Jeffres, and J. M. Everette


INTRODUCTION 

The LACIE production processing system at the
NASA Johnson Space Center (JSC) called for the
flow of electronic and physical data products in a
timely and efficient manner. The LACIE data bases
at JSC included electronic imagery on the IBM
360-75 computer in the Data Systems and Analysis
Directorate (DSAD, Building 30), LACIE crop
assessment data on the Earth Observations Division
(EOD) Programmed Data Processor (PDP) 11-45
support processor (Building 17), and physical
products in the LACIE Physical Data Library (LPDL,
Building 17). In addition, imagery tapes were
obtained from the NASA Goddard Space Flight Center
(GSFC); film products were provided by the JSC
Production Film Converter (PFC); and crop
calendars were acquired from the U.S. Department of
Agriculture (USDA). The need for data and data
products to be available at given stations simultaneously
dictated that accurate data status be available.
Further, there was a requirement to measure
throughput rates and perform efficiency analysis.
The data management method employed to meet
these requirements is known as the Automated
Status and Tracking System (ASATS).

The ASATS went through a number of evolutions
as the LACIE program matured over four phases in
the last 3.5 years. Data flows changed as a result of
modifications to basic data analysis; man/machine
interfaces became more complex with the advent of
more machine terminal processing; and the ASATS
data base grew in size from 4000 blocks (256 16-bit
words per block) to more than 20 000 blocks of data.

This paper will discuss the operational
requirements, the evolution of the status and tracking
system in meeting these requirements, and a
definition of the final ASATS developed to meet the
LACIE needs. Additionally, the lessons learned
during this evolutionary process will be discussed.


LACIE STATUS AND TRACKING OPERATIONAL
REQUIREMENTS

The  LACIE  operational  elements  were  charged 
with  the  measurement  and  analysis  of  system 
throughput  in  both  numbers  of  sample  segment  ac- 
quisitions and  times  required  to  process  these  ac- 
quisitions. Further,  there  were  baseline  goals  against 
which  progress  was  required  to  be  measured  at  or 
between  predetermined  stations  in  the  LACIE  pro- 
cessing system.  Operations  also  had  to  identify  those 
conditions  that  deviated  from  specifications  for  both 
the  numbers  of  throughput  acquisitions  and  the  pro- 
cessing times  at  predetermined  status  points  and 
then  flag  problem  areas  and  assist  in  implementing 
solutions. 
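
The kind of check implied by these requirements, measuring elapsed time between status stations and flagging steps that exceed a baseline goal, can be sketched in Python. The station names and goal values below are hypothetical; the text does not list the actual stations or baselines.

```python
from datetime import date


def flag_slow_steps(station_dates, goals_days):
    """Return the steps whose elapsed time exceeds the baseline goal.

    station_dates: ordered list of (station_name, date_reached).
    goals_days: {(from_station, to_station): maximum allowed days}.
    Each result is (from_station, to_station, elapsed_days, goal_days).
    """
    flagged = []
    # Walk consecutive pairs of stations in processing order.
    for (a, date_a), (b, date_b) in zip(station_dates, station_dates[1:]):
        elapsed = (date_b - date_a).days
        goal = goals_days.get((a, b))
        if goal is not None and elapsed > goal:
            flagged.append((a, b, elapsed, goal))
    return flagged


# Hypothetical acquisition history and baseline goals.
history = [
    ("received_at_jsc", date(1976, 5, 3)),
    ("analysis_started", date(1976, 5, 5)),
    ("sent_to_cas", date(1976, 5, 14)),
]
goals = {
    ("received_at_jsc", "analysis_started"): 3,
    ("analysis_started", "sent_to_cas"): 7,
}
slow = flag_slow_steps(history, goals)
```

In this example only the second step is flagged, since 9 days elapsed against a goal of 7.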

In  order  to  accomplish  these  objectives,  a status 
and  tracking  system  was  required  with  inputs  pro- 
vided by  various  system  elements.  Evaluation  of  the 
level  at  which  tracking  should  be  accomplished  and 
the  information  to  be  maintained  dictated  that  a 
system  be  developed  so  that  a large  part  of  the  test- 
ing, correlating,  and  reporting  could  be  done  in  an 
automated  manner. 

The  data  bases  were  planned  to  contain  several 
types  of  information.  There  was  a basic  accounting 
set  that  was  unique  to  a sample  segment  or  set  of  seg- 
ments and  subject  to  very  limited  change  or  none. 
The  second  data  type  contained  production  status 
parameters,  which  were  subject  to  change  as  a func- 
tion of  data  being  moved  in  the  production  system.  It 
was  possible  that  up  to  16  sets  of  production  status 
parameters  could  exist  for  each  accounting  set.  Con- 
sidering the  probability  of  both  standard  and  varia- 
tion reporting  and  changes  in  the  stations  being 
tracked  due  to  changes  in  operating  procedures,  it 
was  necessary  to  allow  for  dynamic  manipulation  of 
data  base  definitions. 
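
The two data types described above can be pictured as a parent record with a bounded list of child records. The Python sketch below is only an illustration of that relationship; the class and field names are hypothetical, and only the limit of 16 status sets per accounting set comes from the text.

```python
from dataclasses import dataclass, field

MAX_STATUS_SETS = 16  # upper bound stated in the text


@dataclass
class StatusSet:
    """Production status parameters; these change as data move through the system."""
    acquisition_date: str
    station_dates: dict = field(default_factory=dict)  # station name -> date reached


@dataclass
class AccountingSet:
    """Basic accounting data for a sample segment; rarely or never changed."""
    segment_number: int
    status_sets: list = field(default_factory=list)

    def add_status_set(self, status: StatusSet) -> None:
        if len(self.status_sets) >= MAX_STATUS_SETS:
            raise ValueError("an accounting set holds at most 16 status sets")
        self.status_sets.append(status)
```

Keeping the station dates in a dictionary rather than in fixed fields is one way to reflect the stated need for stations to be added or dropped as operating procedures changed.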
Additionally, capabilities were required, such as
the comparison of elements and the use of the
arithmetic tools of addition, subtraction,
multiplication, division, and statistical calculation.

There  was  a need  to  allow  for  batch  update  and  re- 
porting, as  well  as  the  capability  to  perform  inter- 
active queries.  Outputs  were  required  in  report 
formats  as  defined  by  the  user  at  run  time  or  as  pre- 
viously stored  for  batch  operation.  There  was  also 
the  need  to  allow  for  preprogrammed  queries  that 
could  be  called  by  the  user. 


EVOLUTION OF THE LACIE DATA
MANAGEMENT/STATUS AND TRACKING SYSTEM


October to December 1974

Discussions were initiated regarding the need for a
status and tracking system dedicated to determining
the progress of LACIE data as they flowed through
the analysis process within the EOD. At this time, it
was generally believed that the machine processing
status to be provided by the 360-75 computing
system might prove adequate for the purpose.


January and February 1975

Further investigations showed that, because of
data preparation, data quantity, and the analysis
steps involved, the status and tracking of LACIE
sample segments from the machine processing of
GSFC imagery data tapes to the storage of data on
the Earth Resources Interactive Processing System
(ERIPS) imagery data bases and the PFC-generated
products was not sufficient to determine status
during the analysis process.

Based on the roughly 600 LACIE sample
segments ordered for Phase I and the plan to analyze
one acquisition per biowindow, a 2400-file data base
with nine status stations (preselected points in the
data flow where product availability was the key to
the process continuing), some averaging 40 to 60
transactions a day, was too large to maintain by
manual processes.
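
The 2400-file figure is consistent with roughly 600 segments and one acquisition in each of four biowindows. The short Python calculation below makes that arithmetic explicit; the count of four biowindows is taken from table II, and the total number of station entries is an illustrative upper bound that does not appear in the text.

```python
segments = 600          # approximate number of Phase I sample segments
biowindows = 4          # assumed from the four biowindows in table II
status_stations = 9     # status stations tracked per file

files = segments * biowindows               # one acquisition per biowindow
station_entries = files * status_stations   # if every station is recorded once

print(files, station_entries)
```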

March  1975 

A decision was made to implement an automated
status and tracking system to support the LACIE
program by June 1, 1975. It was obvious that a fully
operational status and tracking system to support the
requirements that were at the time beginning to be
identified was difficult, if not impossible, to
develop, debug, and acceptance test in slightly less
than the 4 months remaining prior to Phase I
production. An alternative, proposed by the EOD support
contractor, was to utilize the TRAC-8 series of status
and tracking systems developed for the Data
Reduction Complex (DRC) in the Institutional Data
Systems Division (IDSD), on an interim basis, until
a system that adequately met all the requirements of
LACIE could be found and/or developed. It was at
this point, March 14, 1975, that the LACIE Interim
Status and Tracking System (ISATS) was born.

April to June 1975

Because of the limitations on flexibility within the
TRAC-8 software, ISATS was implemented with
three directories. One directory cross-referenced the
Data Product Requests (DPR's) and the LACIE
sample segment numbers and tracked the processing
of a sample segment from receipt of data at JSC until
the Crop Assessment Subsystem (CAS) procedure
was completed.

The second directory tracked each data product
request, whether batch, interactive, or update, until
all electronic data processing products were received
by EOD.

A third  directory,  which  was  later  dropped  from 
the  requirements,  tracked  Discrepancy  Reports  until 
the  discrepancy  was  cleared.  (This  function  was 
assumed  by  the  Facilities  Configuration  Control 
Office.) 

Initially, inputs to the ISATS were planned to be
made on four terminals. However, the only terminal
put into use was located in the LPDL, the control
point for all LACIE data products. ISATS went
"on-line" for LACIE use on June 1, 1975.


June  to  October  1975 

The inadequacies of ISATS became apparent very
early in the operation. Part of the problem was the
inflexibility of the TRAC-8 software with regard to
minor changes in the LACIE data flow and to any
new reports required. This was compounded by the
fact that there was no input verification and no audit
capability for the relatively new users of the system.
The rest of the problem was physical access to the
computer via the one demand terminal within EOD.
At times during this period, the ISATS reports were
running 3 to 4 days after the fact; and, while ISATS
was building up and saving a valuable data base, it
was not helpful on a real-time basis in accounting for
LACIE sample segment data products as they flowed
through the EOD analysis process. Before the end of
July, it became apparent that ISATS was not a
satisfactory interim system and another search had
begun to find a system more compatible and
responsive to the LACIE needs.

The support contractor proposed, as a result of a
capability study, that the LACIE Phase II status and
tracking be accomplished on a commercially leased
data base management system known as
"COMSHARE/COMPOSIT 77." The proposal, as finally
accepted, called for the status and tracking of all
acquisitions from roughly 1800 sample segments (the
LACIE Phase II scope) from the time they were
ordered from GSFC, through JSC processing, and until
sample segment summaries were provided to CAS.
This activity used card input for overnight batch to
update the data base and to generate reports by the
next morning containing data less than 12 hours old.
The requirements called for nine daily reports (seven
statistical summaries and two tabular activity
listings) plus a weekly and a monthly summary report.
Software optimization was required to provide 21
status stations for each sample segment acquisition
identified in the LACIE Phase II requirements.

Cost of operating COMSHARE was originally
estimated at $4000 per month. This estimate was
based on projected data base size and terminal
activity (connect, line, and central processing unit
(CPU) costs).

Even  though  cost  control  was  recognized  as  a 
problem  (special  query  reports  for  operations  were 
estimated  at  $50  per  report),  the  software  flexibility 
and  overnight  update  of  each  day's  activity  were 
desirable  in  meeting  the  LACIE  operational  require- 
ments, and  the  use  of  the  system  was  approved  by 
management. 

October  1975  to  March  1976 

During the first 2 weeks of October, the ISATS
data base was verified and transferred to the
COMSHARE/ASATS. Concurrently, optimization of the
software to meet the requirements of LACIE Phase
II began. ISATS and ASATS ran simultaneously for
the last 2 weeks of October to ensure a correct data
base prior to full-scale operation on ASATS alone.

Additional requirements for LACIE Phase II were
generated during the period: a cross reference
of Phase I and Phase II data bases, extended
reporting capabilities, and additional query capabilities to
support LACIE performance analysis. These
modifications were completed, documented, and
acceptance tested by the middle of March, and the
system was turned over to LACIE operations as of
April 1, 1976.

April to September 1976

All of LACIE Phase II was supported on the
COMSHARE/ASATS, but the software required
considerable modification to meet the changes in
requirements. The largest modification to the system
was to support a change in the Classification and
Mensuration Subsystem (CAMS) analysis process.
This required the addition of one new card format,
autopunch of five cards, two new reports (daily
packet order lists and operations throughput
summary), the automatic update of historical data bases,
and a more comprehensive input verification
subroutine.

None of the above changes came cheaply. In fact,
the increased operational activity in addition to the
software modification costs slowly escalated ASATS
costs far beyond what had been anticipated from the
information originally available. In July, upon
receipt of the LACIE Phase II/III status and tracking
requirements, it became obvious that a cheaper
method had to be found to support this function.

Prior to the COMSHARE/ASATS original
implementation, an EOD machine-loading study was done
in the hope that the status and tracking function
could be done in-house on its PDP 11-45 computer.
Unfortunately, the resources were not available, but
late June brought EOD the possibility of obtaining a
second PDP 11-45, which would solve the resource
problem. By August, EOD management had
approved a feasibility study on the duplication of the
COMSHARE/ASATS function on the second PDP
11-45.

October  1976  to  January  1977 

The LACIE Phase II/III requirements were
acceptance tested on the COMSHARE/ASATS system
and turned over to LACIE operations by October 15,
1976. During the same month, a proposal was made
to duplicate the function on the PDP 11-45 and move
the entire operation in-house upon acceptance
testing. The plan, as presented, would provide all the
required features at a significant cost reduction via
utilization of the newly acquired EOD computer (the
second PDP 11-45). The plan was approved and
work began immediately to modify the
PDP-compatible data management software, the Regional
Information Management System (RIMS), to
provide an in-house RIMS/ASATS as a replacement for
COMSHARE/ASATS.

The  RIMS/ASATS  replacement  ran  simulta- 
neously with  COMSHARE/ASATS  for  the  last  half 
of  December  to  verify  the  new  data  management 
software  and  was  acceptance  tested  in  January  1977. 
At  that  time,  formal  release  of  the  use  of  COM- 
SHARE  was  made.  Average  operating  costs  dropped 
from  $20000  per  month  to  $2150  per  month  and 
have  not  exceeded  $4000  per  month  since  the  con- 
version was  made. 


February  1977  to  February  1978 

The  RIMS/ASATS  has  remained  the  basic  status 
and  tracking  function  during  this  period.  Only  three 
modifications  have  been  made  as  a result  of  changed 
requirements.  The  first  change  was  made  in  June 
1977  to  accommodate  a new  data  flow  that  involved 
the  Image-100  (“Procedure  I”)  method  of  CAMS 
data  analysis.  The  second  change  was  to  accommo- 
date Transition  Year  requirements  for  a different 
batch  input  stream.  The  last  modification  was  made 
in  June  1977  to  the  in-house  RIMS  to  add  data  base 
protection  and  to  provide  a better  arithmetic  opera- 
tor in  the  comparison  of  fields. 

As  of  this  writing,  no  further  modifications  to 
what  has  become  a very  satisfactory  Data  Manage- 
ment/Status and  Tracking  System  are  anticipated. 

DESIGN  CONSIDERATIONS 

From the initial analysis, it was determined that
most of the data desired in the accounting data set
was already available in card format from the Data
Acquisition, Preprocessing, and Transmission
Subsystem (DAPTS) Landsat data orders. This set of
data became the DAPTS data base. There were
additional LACIE operational requirements for this data
base and an additional card input was devised to add
to the DAPTS set. A record in this data base was
generated each time the DAPTS ordered data from
GSFC by utilizing the order cards. Also, records were
updated or deleted by utilizing the cards produced by
DAPTS operations whenever a segment was changed
or deleted.

Additional  information  pertaining  to  the  segment 
was  input  manually.  This  information  pertained  to 
product  availability  in  the  analysis  packet.  The  pro- 
ducts tracked  in  this  manner  were  considered  critical 
to  the  analysis,  and  their  absence  caused  data  to  be 
held  or  backlogged  until  these  support  products 
became  available. 

The data base that contained segment acquisition
status and tracking information was generated by
electronic imagery from an acquisition being
received at JSC and entered into the ERIPS imagery
data bases. Originally, an update card was prepared
manually by entering the segment number and
acquisition date on the card. This data base was
called the FLOCON data base after a CAMS
subelement responsible for CAMS internal operational
flow control.

Once  segment  acquisitions  were  generated  in  the 
FLOCON  data  base  by  inputting  the  segment  num- 
ber, acquisition  date,  and  date  received  at  JSC,  there 
were  several  operational  requirements  imposed  on 
ASATS. 

First,  the  segment  was  checked  against  DAPTS  to 
assure  that  the  data  received  were  valid  and  that  they 
were  within  the  biological  window  dates  established 
for  analysis.  In  the  initial  ISATS,  these  requirements 
were  met  by  listing  the  two  data  sets  and  comparing 
them  visually.  In  later  systems,  this  became  an  auto- 
mated operation.  Also  in  the  later  systems,  a special 
report  was  formatted  for  printing  information  on 
gummed  labels.  These  labels  were  used  by  LPDL  for 
film  and  analysis  packet  identification. 

It  was  previously  stated  that  inputs  were  made 
into  the  DAPTS  data  base  regarding  the  availability 
of  analysis  products.  Some  of  these  analysis  products 
were  tracked  on  an  acquisition  basis.  These  were  the 
initial  film  products  and  selected  computer  products. 
Once  the  film  products  were  received  in  EOD,  an  up- 
date was  made  to  the  FLOCON  data  base  containing 
the  date  of  receipt  and  a comparison  was  made  to  in- 
formation in  the  DAPTS  data  base  to  determine  if 
the  key  ancillary  products  were  available.  If  all  the 
products  were  available,  the  segment  acquisition  was 
reported to CAMS as available for analysis. Once
more, in ISATS this was done manually; it was
automated in later systems.
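
The checks described in the preceding paragraphs, validation of an acquisition against the DAPTS record and its biowindow dates followed by a test that all key products are on hand, can be sketched in Python. This is an illustration under stated assumptions, not the ASATS logic itself: the record layouts, field names, and function names are hypothetical.

```python
from datetime import date


def within_biowindow(acquisition_date, windows):
    """True if the date falls inside any (open_date, close_date) window."""
    return any(start <= acquisition_date <= end for start, end in windows)


def ready_for_analysis(acquisition, dapts_record):
    """Decide whether a segment acquisition can be reported to CAMS.

    acquisition: FLOCON-style dict with 'acquisition_date' and 'film_received'.
    dapts_record: DAPTS-style dict with 'biowindows' and 'ancillary_products'
    (product name -> available flag), or None if the segment was not ordered.
    """
    if dapts_record is None:
        return False  # data received for a segment with no valid order
    if not within_biowindow(acquisition["acquisition_date"],
                            dapts_record["biowindows"]):
        return False  # outside the biological window dates for analysis
    if acquisition.get("film_received") is None:
        return False  # initial film products not yet received in EOD
    # Every key ancillary product must be available in the analysis packet.
    return all(dapts_record["ancillary_products"].values())


# Hypothetical records for one segment.
dapts = {
    "biowindows": [(date(1976, 4, 1), date(1976, 5, 15))],
    "ancillary_products": {"crop_calendar": True, "history_packet": True},
}
acq = {"acquisition_date": date(1976, 4, 20), "film_received": date(1976, 4, 27)}
available = ready_for_analysis(acq, dapts)
```

Acquisitions for which the function returns True would appear on the list of segments available for processing; the others would be held until the missing product arrived.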

The report listing segments available for
processing became known as the CAMS Order Form and
was used by CAMS to request data packets from
LPDL.

Once  the  packet  was  delivered  from  LPDL  to 
CAMS,  a card  was  input  to  the  system  denoting  the 
date  analysis  began.  At  various  stages  within  the 
analysis  cycle,  date  cards  were  input  to  ASATS  from 
different  stations  to  update  the  status  of  the  segment 
acquisition.  When  CAMS  completed  processing  of 
an  acquisition,  input  consisted  of  a card  containing 
the  date  the  results  were  sent  to  the  CAS,  a relative 
biostage  indicator,  a code  relating  to  the  process  used, 
and  an  estimate  of  data  quality.  This  essentially 
closed  the  tracking  record  for  a specific  acquisition. 

Various  reports  were  required  by  different  LACIE 
elements.  Because  of  changing  procedures,  these  re- 
port requirements  were  periodically  evaluated  for 
content  and  frequency  of  need.  After  each  evalua- 
tion, reports  were  often  dropped,  the  reporting  fre- 
quency changed,  or  two  or  more  reports  were  inte- 
grated to  make  a single  report. 

The data bases were updated each night in the
batch mode and the standard reports were run at this
time. Run streams were set up for batch operations
for daily, weekly, and monthly reports. The proper
run stream was selected by LACIE operations at the
close of business each day.

Procedures  were  developed  so  that  LACIE  opera- 
tional elements  provided  operations  with  input  cards 
each  afternoon  and  received  their  output  reports  at 
the  next  morning's  operations  coordination  meeting. 

Special  queries  were  performed  by  the  Operations 
Section  on  request  from  LACIE  management.  Some 
queries were "canned" (i.e., preprogrammed and
maintained on the disk file), with arguments for
initialization and execution being entered interactively
just  prior  to  batch  run  time. 

RIMS/ASATS IMPLEMENTATION

The ASATS design is based on achieving maximum
use of RIMS to perform ASATS functions.
The design provides a standard batch update
capability, standard reports (both periodic and
aperiodic), an ad hoc report capability, and an ad hoc
update capability. All data base transactions reflecting
LACIE activity are entered as standard batch
updates. Ad hoc updates are used normally for correcting
such problems as when cards are erroneously
entered into the system. About 20 standard reports
currently exist for the system (e.g., packet order report,
biowindow open/close report). Ad hoc reports
are requested frequently to meet special needs; they
usually originate as one-time reports but sometimes
become standard system reports.

The ASATS software is composed of special processors
built for ASATS to facilitate the auditing of
ASATS data base updates, RIMS commands augmented
to support specific ASATS requirements,
command  files  (sequences  of  RIMS  commands)  that 
will  cause  the  generation  of  specified  reports,  data 
base  definitions  describing  the  ASATS  data  base  to 
RIMS,  and  format  descriptions  describing  input  files 
and  report  formats. 

All reports and ad hoc updates are made using the
Data Manipulation Language (DML) of the RIMS
data base management system. Standard batch updates
were implemented using a data preprocessor
program and a special data base update program (via
FORTRAN interface with RIMS).

The  RIMS,  which  may  be  used  either  interactively 
or  in  batch,  provides  both  a Data  Definition 
Language  (DDL)  and  DML.  DDL  provides  for 
defining  the  data  base  structure,  input  formats,  and 
output  formats.  DML  provides  commands  which 
support  data  base  update  and  data  base  queries. 

Data base record formats are defined in terms of
field  names,  field  lengths,  data  type,  and  whether  or 
not  the  field  is  a key.  The  input  format  is  defined  in 
terms  of  field  name,  field  start  location  on  input 
record,  field  length,  and  an  input  verification  type  for 
a special  update  processor.  The  output  format  is 
defined  in  terms  of  field  name,  field  start  location  on 
output  record,  and  field  length. 
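
A minimal sketch of the input-format idea, written in Python rather than in RIMS DDL (whose syntax is not reproduced in this paper). The field names, column positions, and verification types below are hypothetical; they only illustrate how a format expressed as field name, start location, field length, and verification type could drive the reading of one card image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputField:
    name: str    # field name
    start: int   # 1-based start column on the input record
    length: int  # field length in columns
    verify: str  # hypothetical verification type: "int" or "text"


# Hypothetical card layout; the actual ASATS layouts are not given in the text.
CARD_FORMAT = [
    InputField("segment", 1, 4, "int"),
    InputField("acq_date", 5, 5, "int"),
    InputField("card_type", 10, 2, "text"),
]


def parse_card(card, fmt=CARD_FORMAT):
    """Slice one fixed-width card image into named, verified fields."""
    record = {}
    for f in fmt:
        raw = card[f.start - 1 : f.start - 1 + f.length].strip()
        if f.verify == "int":
            if not raw.isdigit():
                raise ValueError(f"field {f.name!r}: expected digits, got {raw!r}")
            record[f.name] = int(raw)
        else:
            record[f.name] = raw
    return record
```

Keeping the layout in a table, separate from the parsing loop, mirrors the separation the text describes between format definitions and the update processor that uses them.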

The data base update capabilities include adding a
new record or modifying existing records from
records on an external file or changing the contents
of a field or fields of a record or records within the
data base.

Reporting  capabilities  include  output  of  field 
values,  specified  text,  and  arithmetic  computations 
for  records  in  the  data  base.  Formats  of  output  may 
be  determined  by  either  a predefined  format  or  by 
the  syntax  of  the  command.  The  ability  to  specify 
text  allows  for  the  annotation  of  reports.  Arithmetic 
and statistical computations include arithmetic expressions
involving field values and/or numerical
literal values, standard deviations, mean values,
maximum, minimum, summation, and count of occurrences
for a group of records.

Groups of records for update or reporting may be
selected by explicit identification of records, key
field value, range of key field values, arithmetic relationships
between fields of a record or hierarchically
related records, logical relationships between
records, and/or hierarchical relationship between
records.
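
The reporting computations and record selection described above can be sketched as follows. This is an illustration in Python, not RIMS DML; the record layout and the function name are assumptions, and only one of the selection modes (range of key field values) is shown.

```python
import statistics


def summarize(records, field, key, low, high):
    """Summarize `field` over records whose `key` value lies in [low, high]."""
    values = [r[field] for r in records if low <= r[key] <= high]
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "sum": sum(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
        # Population form is assumed; the text does not say which form RIMS used.
        "stdev": statistics.pstdev(values),
    }
```

For example, with hypothetical records carrying a `segment` key and a `days` field, `summarize(records, "days", "segment", 1, 3)` would report on segments 1 through 3 only.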

The RIMS provides for device independence by a
command that allows for reassigning system files, including
the command file, the data input file, the
message file, and the report file. This feature allows
flexibility  in  how  the  system  is  used. 

Standard  reports  are  implemented  as  a file  of 
RIMS  commands.  These  files,  when  assigned  as  the 
RIMS  command  file,  produce  the  designed  report. 
Execution  of  individual  reports  during  the  nightly 
batch  run  is  caused  by  entering  the  file  name  to  be 
executed  in  a particular  file. 

Ad hoc reports are either generated from command
files or produced interactively from the terminal.
Ad hoc updates are generally performed interactively
from the terminal.

The  standard  batch  update  program  provides 
audit  reports,  tape  labels,  and  punch  cards  that  are 
reentered  into  the  system  in  addition  to  updating  the 
data  base.  The  preprocessor  program  sorts  cards  in  a 
particular  order  for  the  update  processor,  rejects  all 
cards  that  are  exact  duplicates,  rejects  invalid  card 
types, and separates the sorted card images into separate
files by LACIE phase. It also provides audit reports
for invalid card types, cards as input, and cards
as sorted.
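
The preprocessor steps just described (reject invalid card types, reject exact duplicates, sort, and split by phase) can be sketched in Python. The card fields, the set of valid card types, and the sort order are all hypothetical, since the paper does not specify them.

```python
from collections import defaultdict

VALID_CARD_TYPES = {"A1", "B2", "C3"}  # hypothetical card-type codes


def preprocess(cards):
    """Return (cards grouped by phase, rejected cards with reasons).

    Each card is a dict with the hypothetical keys 'phase', 'type',
    'segment', and 'date'.
    """
    seen = set()
    rejected = []
    by_phase = defaultdict(list)
    for card in cards:
        key = tuple(sorted(card.items()))  # identity of an exact duplicate
        if card["type"] not in VALID_CARD_TYPES:
            rejected.append(("invalid type", card))
        elif key in seen:
            rejected.append(("duplicate", card))
        else:
            seen.add(key)
            by_phase[card["phase"]].append(card)
    for phase_cards in by_phase.values():
        # The real sort order is not given; segment then date is assumed.
        phase_cards.sort(key=lambda c: (c["segment"], c["date"]))
    return dict(by_phase), rejected
```

The `rejected` list plays the role of the audit report: every card that does not reach the update processor is kept together with the reason it was dropped.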


LESSONS LEARNED

Each  LACIE  subsystem  had  the  responsibility  for 
definition  of  its  own  input,  data  flow,  software,  and 
output  requirements.  Only  after  these  requirements 
were documented and approved was the need for extensive
status and tracking realized. In future
systems,  more  emphasis  should  be  placed  on  early 
analysis  of  preliminary  subsystem  requirements  for 
the  definition  of  data  flow  and  status  points.  Further, 
a data  management  system  should  be  selected  very 
early in the effort in order to help ensure that necessary
modifications can be accomplished before the
targeted  start-up  time. 

By  the  time  the  LACIE  status  and  tracking 
requirements  were  identified,  the  acquisition  and 
throughput  of  data  were  imminent.  This  led  to  the 
selection  of  an  interim  system  with  practically  no 
data management capability. The next step taken
was to move to a commercial system which gave limited,
but increased, flexibility at great expense. Only after a
long  experience  and  development  effort  was  a 
system  realized  that  provided  the  required  flexibility 
as  well  as  a reasonable  operational  cost. 

LACIE Quality Assurance

G. L. Gutschewski


INTRODUCTION 

This  paper  describes  and  explains  the  LACIE 
Quality  Assurance  (QA)  Program.  It  addresses  the 
beginnings  of  QA,  the  objective  and  concept  of 
LACIE QA, its responsibilities, and its accomplishments.

What  did  QA  do  for  LACIE?  What  were  the 
methods  and  rationale  of  the  QA  group?  This  paper 
will  provide  answers  to  these  questions  but  will  not 
delve into detail on how the QA tasks were performed.
The reader is referred to the LACIE Quality
Assurance Program Plan (ref. 1, section 5.0) and the
LACIE  Quality  Assurance  Procedures  for  more 
detail  on  specific  tasks. 

The  LACIE  Quality  Assurance  Program  Plan 
delineates  the  QA  system  to  include  all  of  the 
organizational  elements  within  LACIE  (listed  in  ref. 
1,  section  2.0)  and  has  the  goal  of  assisting  all  of 
these  organizations  in  attaining  the  highest  level  of 
performance  possible.  This  paper  will  describe  the 
extent  to  which  this  was  done. 


ORIGIN  OF  LACIE  QUALITY  ASSURANCE 

At  the  beginning  of  LACIE,  there  were  references 
in  such  documents  as  the  LACIE  Project  Plan  (ref. 
2)  on  the  need  for  quality  control,  but  no  definitive 
statements  were  made  on  the  direction  or 
methodology  of  this  function.  This  situation 
prompted  two  Review  Item  Dispositions  (RID's)  in 
December  of  1974.  One  of  the  RID’s  defined  the 
need  for  a QA  plan,  and  the  other  defined  the  need 
for  a data  quality  plan.  These  two  RID’s  were  major 
factors in the decision to establish a LACIE QA program.

When  the  decision  was  made  to  have  a formal 
LACIE  QA  program,  it  became  necessary  to  develop 
a concept  as  to  what  type  of  QA  program  would  work 
best in LACIE. A LACIE QA group was established
to  develop  a LACIE  Quality  Assurance  Program 
Plan  encompassing  all  of  the  LACIE  organizations 
and  to  assure  that  this  plan  was  implemented.  The 
size  of  this  QA  group  (3.5  man-year  equivalents)  and 
the  number  and  diversification  of  the  LACIE 
organizations  necessitated  that  each  organization 
perform its own QA tasks with the QA group checking
periodically to assure that the tasks were being
performed  satisfactorily.  Keeping  these  factors  in 
mind,  the  next  step  was  the  actual  development  of 
the  LACIE  QA  program. 


DEVELOPMENT  OF  THE  LACIE  QUALITY 
ASSURANCE PROGRAM

In  the  development  of  the  LACIE  QA  program, 
the  first  tasks  undertaken  by  the  LACIE  QA  group 
were  (1)  the  preparation  of  a QA  program  plan,  (2) 
the definition of QA checkpoints which each
organization should include in its operations, (3) the
formulation of operational procedures by each of the
LACIE organizations, and (4) the compilation of a
document entitled "LACIE Quality Assurance Procedures,"
which included all the QA checkpoints for
each of the LACIE organizations. All these tasks
were interrelated and were necessary for the development
of a useful QA function which could help the
LACIE organizations attain the highest level of performance
possible. Following are comments on the
four  tasks. 


Preparation  of  the  Plan 

The LACIE Quality Assurance Program Plan (ref.
1) was required to define the QA functions.
Whenever a function was defined and agreed upon
informally by management, the implementation of
that function was begun immediately. For example,
very early in LACIE, it was known that procedures
and products would be audited regularly; so a set of
procedures for auditing was prepared as quickly as
practical. Therefore, when a given set of operational
procedures was available, QA immediately began
auditing those procedures. Thus, by the time a formal
plan was approved, a large portion of the QA
effort was already implemented, and the effects of
good quality control had been present for a major
portion of the project.


Definition of Checkpoints

When  the  QA  effort  started,  quality  control  in  the 
project  was  almost  nonexistent;  the  quality  control 
that  did  exist  was  not  formalized  or  consistent.  The 
definition  of  the  LACIE  QA  checkpoints  within 
each organization's operations was a necessary first
step in establishing formal quality control in the project.
This was a cooperative effort between the QA
organization  and  the  other  LACIE  elements. 


Formulation  of  Operational  Procedures 

While  the  QA  plan  was  being  developed  and  the 
QA checkpoints were being established, the concurrent
effort of writing formal procedures for each of
the LACIE organizations was begun by the organizations.
There was reluctance by some organizations to
write  procedures  because  they  did  not  understand 
the  value  of  procedures  in  their  operations  and 
because  this  task  impacted  their  resources.  This 
delayed  some  of  the  procedures  as  long  as  a year  into 
the  project.  However,  most  of  the  organizations  were 
cooperative  and  proceeded  to  write  their  procedures 
as quickly as they could, considering their operational
constraints. This task did not end with the first
writing  but  continued  throughout  LACIE  as  new 
techniques,  hardware,  and  software  were  developed 
and  incorporated  into  the  experiment. 


Compilation  of  Procedures  Document 

As  the  QA  checkpoints  for  each  organization  were 
established,  the  QA  organization  compiled  them  into 
a general document which included the QA procedures
for every LACIE organization functioning at
the  time  of  publication.  These  procedures  were 
audited,  and  the  results  of  that  audit  were  included  in 
the  document,  making  it  a representative  document 
at  the  time  it  was  compiled. 


The compilation of a QA procedures document
was a necessary first step in establishing a system of
quality control for LACIE. This document established
guidelines for writing the operational procedures
and for checking the output of the various
organizations and provided the mental discipline
whereby the LACIE organization could achieve a
high level of performance. The document was eventually
absorbed into the operational procedures as
they were written.

While this document was being compiled, the
LACIE Quality Assurance Program was being implemented.
Therefore, when the LACIE Quality
Assurance Procedures document was completed in
July of 1975, the major portions of the QA program
were  close  to  full  implementation.  Some  areas  of 
responsibility  were  not  fully  implemented  as  quickly 
as  others;  but,  basically,  from  August  of  1975  until 
December  of  1977,  most  of  the  QA  activities  were 
being  implemented. 


LACIE QUALITY ASSURANCE
RESPONSIBILITIES 

This  section  will  cover  the  QA  responsibilities  of 
all of the organizational elements of LACIE, including
the LACIE QA group.


Audits 

The QA group conducted audits of both procedures
and products. The audits on procedures were
conducted to ensure that the procedures were being
followed or that, if necessary, the procedure was updated
in a timely manner. The audits on products
were conducted to ensure adherence to specifications.
More simply, QA checked to see if the user
was satisfied with the product. If the user was not
satisfied, either a work-around technique was
devised, the product was upgraded, or both. Also, the
question, "Is this product really necessary?" was
asked.

Following  are  several  examples  of  problem  areas 
encountered  during  QA  auditing. 

1. Frequently, there was a slow turnaround on updating
procedures. This was caused by a lack of
resources and the lack of a formal procedure for updating
procedures. Repeated attempts to get a formal procedure
for updating procedures approved failed.

2.  At  the  beginning  of  the  experiment  and  at 
various other times during the experiment, a lack of
procedures caused unnecessary operational problems.
These unnecessary problems were often
caused  by  the  personnel  not  having  a clear  definition 
of  their  duties,  and  the  solution  to  the  problems  was 
a good  set  of  procedures. 

3. In some situations, there was an absence of a required
product or the product did not meet specifications.
This lack of products or inferiority of a product
caused  some  groups  to  find  alternate  solutions,  thus 
wasting  precious  resources. 

4.  In  the  early  days  of  LACIE,  at  one  time  or 
another,  some  of  the  organizational  elements  were  in 
doubt  as  to  the  products  they  should  be  providing  or 
receiving  or  where  to  deliver  or  receive  their 
products. 

As  a solution  to  problems  (3)  and  (4),  formal 
product lists and their regular verification were established.

The  answer  to  these  and  similar  problems  is  a 
good  quality  control  system  that  includes  definitive 
procedures, QA checkpoints in the procedures, product
checklists, and audits of the procedures and products.
The quality control of the procedures and products
was considered the primary duty of QA, and this
is  the  area  where  the  most  benefit  was  provided  to 
LACIE.  QA  accomplished  its  objectives  in  this  area. 


Discrepancy  Reporting 

The  QA  group  was  responsible  for  maintaining 
and  coordinating  a LACIE  Discrepancy  Reporting 
System.  This  system  was  a means  of  monitoring  the 
problems  in  LACIE;  of  determining  problem  areas 
that needed extra attention; and of statusing, reporting,
categorizing, and documenting the various problems
in LACIE. The LACIE Discrepancy Reporting
System was not all-inclusive of LACIE problems,
since some problems were never written on Discrepancy
Reports (DR's) but were handled via
memorandum or personal contact. However, a high
percentage of operational problems have been documented
by the Discrepancy Reporting System.

Examples  of  major  problem  areas  in  discrepancy 
reporting  are  as  follows. 

1. Division of authority. The many political boundaries
in the LACIE system proved a tremendous
obstacle to the smooth and efficient operation of the
LACIE Discrepancy Reporting System. Constant interaction
by QA with the various groups finally
resulted in a relatively efficient system which provided
the representative status of LACIE.

2.  Untimely  response  to  DR's.  Some  of  the 
organizations  in  LACIE  did  not  respond  in  a timely 
manner  to  DR's,  even  though  they  might  have 
solved  the  problem  in  question.  The  degree  of 
seriousness of this problem varied between organizations.
The QA group at times had to check the internal
records of some organizations and urge them to
respond  to  the  DR's.  This  slow  response  to  DR's  did 
not  help  to  impress  on  personnel  the  usefulness  of 
the  Discrepancy  Reporting  System. 

3.  Reluctance  to  write  DR's.  At  the  beginning  and 
at the end of LACIE, some of the organizations were
reluctant  to  write  DR's,  and  QA  constantly  had  to 
urge  personnel  to  perform  this  task. 


Test  Certification 

The LACIE Quality Assurance Program Plan
defines the acceptance testing function as being applicable
to any of the LACIE organizations.
However, the test certification effort of the LACIE
QA group was concentrated in the Data Techniques
Laboratory (DTL) of the Earth Observations Division
(EOD) in Building 17 of the NASA Johnson
Space Center (JSC). The LACIE QA group
was nevertheless available to investigate other support
areas if requested by management.

There were two major reasons why the LACIE
QA group participated primarily in the EOD
acceptance testing:

1.  Resources.  The  available  manpower  in  the 
LACIE  QA  effort  and  the  time  necessary  to  run  an 
acceptance  test  prohibited  the  QA  group  from 
monitoring the acceptance tests of organizations outside
EOD. For example, hundreds of acceptance
tests  were  run  in  EOD;  some  took  a few  hours,  and 
others  took  several  days  or  more.  Generally,  support 
organizations external to EOD would test their software
and hardware systems prior to releasing them
to  EOD. 

2.  External  organizations.  Many  organizations 
outside  EOD  were  performing  acceptance  testing  but 
had  their  own  QA  monitors.  These  included  the  JSC 
Ground  Data  Systems  Division  (GDSD)  and  the 
Goddard  Space  Flight  Center  (GSFC). 

Three problems related to test certification existed
at the beginning of LACIE QA: (1) inadequate documentation
of test plans and procedures, (2) lack of
internal testing of software before test scheduling,
and (3) lack of adherence to test plans and procedures.
These problems were quickly solved.

Acceptance  tests  were  required  to  assure  that  a 
new  hardware,  software,  or  procedural  system  met 
specifications  and/or  performed  as  represented  by  its 
manufacturer  or  originator.  To  install  a new  system 
or  add  to  an  existing  working  system  without  testing 
the  new  component  is  to  invite  total  system  failure 
or  an  immense  load  of  useless  data.  Thus,  to  verify 
that  the  new  system  or  component  was  ready  to  be 
part  of  the  operation,  the  LACIE  organizations  were 
required  to  write  a test  plan  and  a test  procedure,  to 
review  and  approve  them  formally,  and  then  to 
adhere to their own test specifications. This procedure
assured that a reliable test was performed and
verified that the new component was ready to be part
of the operational system. Unfortunately, this procedure
was usually opposed; however, to avoid needless
operational problems and sometimes total shutdown
of an entire operational system, the QA group
had  to  rigidly  enforce  test  plans  and  procedures. 


Procedures  Reviews 

Whenever  a new  or  revised  procedure  was  issued, 
LACIE QA would review it for its adequacy as a procedure.
In addition, the adequacy of the QA checkpoints
contained in the procedures would be
reviewed.  Initially,  many  of  the  groups  writing  their 
procedures  did  not  understand  that  they  should  write 
how  to  perform  each  task.  As  a result,  some  of  the 
procedures  were  inadequate.  As  these  groups  came  to 
understand the purpose of the procedures, this problem
disappeared.


Other  Tasks 

In  addition  to  the  responsibilities  just  described, 
the LACIE QA function included defining QA requirements,
giving status reports of QA activities,
providing audit reports, assisting the LACIE
organizational elements on QA policies and procedures,
identifying necessary configuration
changes, and making recommendations to management.

The QA responsibilities of the LACIE organizational
elements exclusive of the LACIE QA group
can  be  stated  briefly  as  follows  and  are  included  in 
the  LACIE  Quality  Assurance  Program  Plan. 

1. Procedures preparation. Each organizational
element was responsible for writing and updating its
operational procedures, including the QA in-process
checkpoints.

2. Discrepancy reporting. Each organizational element
was responsible for originating and replying to
DR's relative to its area of interest.

3. Test controls. Each organizational element was
responsible for testing software, hardware, data flow,
and techniques within its area of
responsibility. Test plans and procedures were subject
to review. The actual acceptance test was subject
to  monitoring  by  QA. 


INTERNAL  QUALITY  ASSURANCE  SUPPORT 

Four of the organizational elements within
LACIE retained their own formal QA group: the DTL,
the GDSD, the Mission Planning and Analysis
Division (MPAD), and GSFC.


Data  Techniques  Laboratory 

The  DTL  of  the  EOD  had  a one-man  QA  effort 
which consisted of monitoring the internal DTL Discrepancy
Reporting System, approving the closure of
a DR,  and  approving  acceptance  tests  before  they 
could  be  closed. 


Ground Data Systems Division

The  GDSD  supported  LACIE  through  the 
LACIE/Earth  Resources  Interactive  Processing 
System  (LACIE/ERIPS).  This  group  had  several 
aspects  of  QA  performed  internally. 

1.  The  IBM  Corporation  provided  some  system 
design,  wrote  the  software,  tested  the  software  before 
it  was  made  operational,  and  responded  to  DR's  on 
the  software. 

2. The GDSD QA personnel monitored the acceptance
tests and the internal Discrepancy Reporting
System in JSC Building 30.

3.  The  MPAD  provided  internal  QA  support 
(more  fully  described  in  the  next  section). 


Mission Planning and Analysis Division

The  MPAD  provided  QA  support  to  LACIE  in 
two  major  modes. 

1. Primary mode. In support of GDSD, the
MPAD provided LACIE with independent testing of
the LACIE/ERIPS software/hardware system. The
MPAD goal was to assure that the system met the requirements,
that the quality of the system output
products was consistent with the quality of the input
data, and that the system performance remained stable.

2.  Secondary  mode.  In  support  of  EOD,  the 
MPAD  provided  independent  technical  evaluations 
in  problematic  technical  areas  and  in  the  technical 
performance  of  operational  output. 

Detailed  information  on  the  MPAD  activities  is 
available (ref. 3), but the following is a brief delineation
of the tasks performed by MPAD.

Input  imagery  evaluation 

1.  Imagery  screening  with  film.  The  film  coming 
out  of  the  LACIE/ERIPS  system  was  screened  daily 
for both LACIE/ERIPS problems and GSFC problems.

2.  Imagery  screening  on  the  LACIE/ERIPS.  The 
imagery was screened on the LACIE/ERIPS primarily
for GSFC problems but also to compare
LACIE/ERIPS  film  with  the  original  imagery.  This 
task  gradually  phased  out  but  was  replaced  by  such 
tasks  as  the  development  of  an  automated  cloud 
screening  capability. 

3. Imagery registration evaluation with film. The
object of this task was to assess visually, using film,
the accuracy of registration between sample segments
for a given site. This task continued
throughout  LACIE. 

4. Imagery registration validation with the Sequential
Similarity Detection Algorithm (SSDA). A
computer algorithm called SSDA was used to evaluate
selected segments to determine the magnitude of
misregistration and to detect subtle registration errors.
Written reports were provided on these segments.
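
The paper does not describe how SSDA was implemented for LACIE. As a hedged illustration only, the Python sketch below shows the general idea as the algorithm is commonly described: for each candidate offset, absolute differences between a small template and the reference image are accumulated pixel by pixel, and the candidate is abandoned as soon as the running error exceeds a threshold. The offset that survives the most pixels (lowest error on ties) is taken as the match. The raster scan order, the fixed threshold, and all names are simplifying assumptions.

```python
def ssda_offset(reference, template, threshold):
    """Estimate the (row, col) offset of `template` inside `reference`.

    Both images are lists of lists of numbers. The scan of a candidate
    offset stops early once its accumulated absolute error exceeds
    `threshold`, which is what makes the search cheap.
    """
    t_rows, t_cols = len(template), len(template[0])
    r_rows, r_cols = len(reference), len(reference[0])
    best = None  # (pixels tested, negated error, offset)
    for dr in range(r_rows - t_rows + 1):
        for dc in range(r_cols - t_cols + 1):
            error, tested = 0, 0
            for i in range(t_rows):
                for j in range(t_cols):
                    error += abs(reference[dr + i][dc + j] - template[i][j])
                    tested += 1
                    if error > threshold:
                        break
                if error > threshold:
                    break
            score = (tested, -error, (dr, dc))
            if best is None or score[:2] > best[:2]:
                best = score
    return best[2]
```

Comparing the offset returned for two acquisitions of the same segment with the expected offset gives a simple measure of the magnitude of misregistration.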

Fields  definition  evaluation 

1. Field definition screening with film. In this
task, the production film converter (PFC) product
12 (field boundary overlay) was used to check the
field definitions for such errors as overlapping fields
and misplaced vertices. PFC product 12 was a computer
plot on film of the analyst's fields in a
classification. This task was phased out when
LACIE Procedure 1 became operational.

2. Field definition screening on the LACIE/
ERIPS. A more detailed analysis of field definitions
was performed on those selected segments being processed
under the Classification and Mensuration
Subsystem (CAMS) product evaluation procedures
where anomalies were observed. Using the
LACIE/ERIPS for this task provided statistical and
measurement capabilities not available when PFC
product 12 was used alone.

Software confidence testing.— This testing was conducted
on a regular basis and consisted of a representative
sequence of production-oriented operations
using known input data. This was done to determine
that the same input data processed in the same manner
will produce the same results on each of the
LACIE/ERIPS software systems (versions). It established
the consistency, reliability, and accuracy of the
relative systems.
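
The principle of confidence testing, that known input processed the same way must give the same result on every software version, can be sketched as a small comparison harness. This is a modern illustration in Python, not the MPAD procedure; the callables stand in for software releases and every name is hypothetical.

```python
def confidence_test(versions, known_input):
    """Run `known_input` through every version and report mismatches.

    `versions` maps a version label to a callable standing in for one
    software release; the first entry is treated as the baseline.
    """
    labels = list(versions)
    baseline = versions[labels[0]](known_input)
    mismatches = {}
    for label in labels[1:]:
        result = versions[label](known_input)
        if result != baseline:
            mismatches[label] = result
    return baseline, mismatches
```

An empty `mismatches` result corresponds to the outcome the text describes: consistent results across versions.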

CAMS  product  evaluation.— As  an  independent 
check  of  the  CAMS  analysts,  selected  segments  were 
processed  on  the  LACIE/ERIPS  using  the  field 
definitions  of  the  CAMS  analysts.  Any  errors,  such 
as overlapping fields, were delineated in a detailed report
provided to the EOD. This was an independent
check  on  the  analysts  and  the  system.  This  task  was 
gradually  phased  out. 

Problem  isolation  and  error  analysis. — This  task 
consists  of  the  attention  to  special  problems  and  the 
attendant  efforts  at  possible  solutions.  Special  studies 
may  be  placed  in  this  category,  as  they  usually  were 
performed with a specific problem in mind, on request
and sometimes at the initiative of MPAD.

Quality  assurance  data  base. — The  MPAD  kept  a 
computer record of its findings on the data studied
and  could  recall  portions  of  this  information  from  its 
computer. 


Goddard Space Flight Center

To complete the overall view of the QA being performed
in LACIE, the QA tasks being performed
at GSFC are listed under their general headings:
inspection of LACIE sample segments from imagery
generated on the color film recorder or black and
white film recorder; inspection and analysis of output
from the General Purpose Image Preprocessor
(GPIP) line printer and teletype; data retrieval from
the  GPIP  line  printer;  and  reporting  the  LACIE  QA 
assessment  of  these  to  the  Special  Projects  Group  in 
the  production  control  section  for  the  Landsat 
project. 

The  actual  number  of  QA  data  inspections  is  so 
great  that  it  is  impractical  to  include  them  in  this 
document. However, the following are given as examples
to show the thoroughness of QA at GSFC:
Landsat identification, correlation checks, film flag
checks, cloud pixel checks, edge threshold and edge
density checks, alinement checks, Sun elevation and
Sun azimuth checks, correlation parameters report,
image data geometry, and pixel dropouts. The reader
is referred to the GSFC detailed quality assurance
procedures  (ref.  4)  for  more  detail. 


ACCOMPLISHMENTS 

The  foregoing  sections  demonstrate  the  breadth 
and  thoroughness  of  LACIE  QA.  Some  of  the  results 
or  benefits  of  the  LACIE  QA  program  are  discussed 
in the following paragraphs.


Quality  Control 

As the QA procedures came into effect, the number
of DR's diminished; the reprocessing of computer
tasks diminished; the technical errors
decreased;  and,  generally,  the  overall  efficiency  of 
the  organizational  elements  increased.  The  reliability 
of  the  LACIE  product  increased,  and  the  products 
became  more  measurable  and  more  consistent  as  a 
result  of  quality  control. 


Procedures

Probably the single most important result that the
LACIE QA group accomplished was to pressure all
the organizational elements to write procedures and
keep those procedures updated. In addition to establishing
and maintaining quality control, the documented
procedures became a tremendous source of
information about LACIE. The documentation of
technical and operational changes, technical or
operational mistakes, and historical information is
an invaluable aid in writing LACIE symposium or
follow-on papers and as a reference for future planners.

The procedures also saved time and resources. For
example, in the beginning stages of LACIE, some individuals
were spending much time and resources
trying to determine how to perform their tasks or
even what their task was. Evidence of this situation
was the fact that, at the beginning of LACIE, the
DR's pertaining to procedural errors comprised more
than 50 percent of the total number of DR's. In contrast,
when all the organizations had written procedures
at the end of Phase III, the number of procedural
DR's usually was less than 3 percent (weekly
basis). As a matter of fact, the number of procedural
errors in a given organizational area would drop
drastically when good operational procedures were
provided to those performing the tasks.


Product  Definition 

For each of the three phases of LACIE and during
the Transition Year, each functional element was required
to review its requirements and submit to
LACIE QA a list of all output products and the products
required to complete the assigned tasks. QA
would then compile a complete list for LACIE and
verify the list through an audit. If a deficient product
was found, a solution was agreed upon by the receiving
and the providing organizations. If an excess
product was found, it was eliminated. Compiling this
product list was a very difficult task in Phase I, but in
the subsequent phases it became much easier.

What was the value of this exercise? First, the
compilation of these product lists on a regular basis
forced the LACIE organizations to review their requirements
on a regular basis; in fact, this was the
only review of requirements performed in a formal,
systematic method. Secondly, the elimination of excess
products and the attention to problem areas
resulted in much more efficient LACIE operations.
And, lastly, the actual operational requirements of
LACIE are documented as products in the product
lists.


Discrepancy  Reporting  System 

Any system, program, or project needs a method
of documenting, statusing and tracking, and monitoring
its problems and directing the solutions to these
problems. The method used in LACIE is called the
Discrepancy Reporting System.

The LACIE Discrepancy Reporting System documented
the reported problems, providing a summary
reference. Documenting the problems helped to
avoid repeating the same mistakes. The statusing and
tracking of discrepancies helped to avoid system
shutdown by pointing out problem areas that needed
immediate attention. Monitoring and directing the
solutions to the various problems assured LACIE of
adequate solutions to the problems, thereby improving
the system's performance.


Acceptance Testing

An acceptance test of a new system or component
prior to operation is necessary to prevent the absolute
operational shutdown caused by the new
system or to prevent the operational disturbances
caused by a defective component. This new system
or component could be software, hardware, or even
procedures in certain cases. With a reasonable
amount of testing, most of the major impacts of new
systems or components can be avoided. This is what
the LACIE QA acceptance testing accomplished.


CONCLUSION

The foregoing has stated simply what the LACIE
QA program did and why, so that both the critics and
the defenders of QA can appreciate the magnitude of
the task. This paper does not delve deeply into the
details of QA. Rather, it was designed to give the
reader a better understanding of the whys and
wherefores of QA and the contributions of the
LACIE QA effort.


REFERENCES

1. LACIE Quality Assurance Program Plan. LACIE-C00626,
NASA Johnson Space Center, Aug. 1977.

2. LACIE Project Plan. LACIE-C00605, NASA Johnson Space
Center, Aug. 1975.

3. MPAD LACIE Product Quality Assessment Software and
Procedures - Program Description and Computer Operations.
Rept. T-00722, NASA Johnson Space Center, Nov.
1977.

4. DAPTS-GSFC Detailed Quality Assurance Inspection Procedures.
Part I. LACIE-00721, NASA Johnson Space Center,
Nov. 1977.


Operations Reporting

R. G. Musgrove and Dale R. Marquis


The need for operations reporting had become obvious
to the personnel involved in establishing, coordinating,
and monitoring data flow in the infancy of
the LACIE operations in the fall of 1974. Even
though all subsystem elements were sincerely interested
in and concerned with ensuring the success
of LACIE operations, an effective coordination and
integration function was required to provide the
cohesion necessary for a smooth operational system.
In fact, the operations coordination, integration, and
management function required effective operations
reporting for it to succeed.

Anytime a complex data flow is constructed,
individual components will break down, bottlenecks
will occur, and backlogs will build. It quickly became
apparent during LACIE Phase I that the very simple
accounting originally envisioned was not providing
sufficient information regarding the status of the
data system. While accountability was kept in terms
of raw numbers for data received from the NASA
Goddard Space Flight Center (GSFC) or for data in
work, nothing was really known about the constituency
of these numbers. For example, if equipment
broke down and a bottleneck occurred, it was
of course known. What was not known, however,
was whether the data involved were U.S. segments,
U.S.S.R. segments, spring or winter wheat data, etc.
The inability to provide appropriate information or
relevant statistics, let alone a coherent status,
prompted the need for a comprehensive status,
tracking, and reporting system. Thus, operations reporting
began with an effort to satisfy a need for improved
communication among personnel not only at
the working level but at the project management
level as well.

One of the first goals of operations reporting was
to ensure that management understood the basic
operational data flow and its attendant data handling
operations and, just as importantly, the constraints
on the operational system. LACIE management also
had to be apprised of accomplishments, problem
areas, etc. This requirement influenced the evolution
of operations reporting as strongly as the need for
basic operations information at the working level.

As these various requirements became understood,
specific data and information requirements
were in turn placed on the appropriate parts of the
operational system. These requirements were the
driving force for the development of the Automated
Status and Tracking System, which is the subject of a
separate paper.

The development of the information requirements
began by identifying the critical stages in the
operational data flow. Then the parameters for reporting
the progress of LACIE operations and the
operations status were identified. Specific information
requirements for input into the operations reporting
system were levied on the operational elements.
The operational elements responded to these
requirements by supplying the inputs derived from
manual systems or, in some instances, from automatic
systems devised by the operational groups
themselves.

The preceding papers show how involved the flow
of data through the LACIE system is. The process
starts at the NASA Johnson Space Center (JSC) with
the ordering of data to be collected by the satellite.
Once the data is received at GSFC from Landsat, it
must undergo a number of screenings before it is
shipped to JSC.

One of the first reporting and statusing systems
instituted in LACIE was at GSFC. Originally, the
hope had been to have a "full-up" status and tracking
system for data in the GSFC system much like the
one established at JSC. For a number of reasons (primarily
resource), an all-encompassing system that
could be interrogated was never implemented. Fortunately,
however, it was possible to prioritize the reporting
needs from GSFC so that information about
the key areas was available. For example, statistics
have been meticulously kept since the first data
order in Phase I on the number of acquisitions
(spacecraft hits), rejections for excessive cloud
cover, correlation failures (as discussed previously),
and quality rejects. By the end of Phase I, a definite
pattern had emerged indicating that about 50 percent
of the acquisitions would be rejected because of
clouds, 10 percent because of correlation failures, and
about 5 percent because of miscellaneous quality
reasons. Naturally, these statistics, particularly
those for cloud cover, would vary by season and by
country; on the whole, they provided a reasonable
yardstick by which to gauge performance. If significant
short-term deviations were experienced, queries
would be made to GSFC asking that they assess their
operations to determine whether problem areas
existed.
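
The yardstick comparison described above can be sketched in a few lines of Python. This is an illustration only, not the LACIE software: the baseline rates are the Phase I figures quoted above (read here as 50, 10, and 5 percent), while the function name, the 10-point tolerance, and the weekly counts are assumptions invented for the example.

```python
# Phase I baseline rejection rates, in percent of acquisitions (from the text).
BASELINE = {"cloud": 50.0, "correlation": 10.0, "quality": 5.0}


def flag_deviations(acquisitions, rejects, tolerance=10.0):
    """Return the rejection categories whose observed rate differs from the
    baseline by more than `tolerance` percentage points.

    `tolerance` is a hypothetical threshold; the source gives no number.
    """
    if acquisitions <= 0:
        raise ValueError("acquisitions must be positive")
    flagged = {}
    for category, baseline in BASELINE.items():
        rate = 100.0 * rejects.get(category, 0) / acquisitions
        if abs(rate - baseline) > tolerance:
            flagged[category] = round(rate, 1)
    return flagged


# Invented weekly counts in which cloud rejections run unusually high.
week = flag_deviations(400, {"cloud": 280, "correlation": 36, "quality": 22})
```

Any category returned by such a check would correspond to the kind of short-term deviation that prompted a query to GSFC.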

In a preceding paper, an account has been given of
both the evolution and the design of the status and
tracking system. The evolutionary process was arduous,
but by the time spring windows had opened
for Phase II, a viable system for data statusing and
tracking had been developed. In what follows, the
system will be addressed with more or less its current
configuration and capabilities and its contribution to
the management of daily operational activities.

To those charged with managing the operations
data flow, there were several basic pieces of information
desired from the system.

1. What is at JSC (by country and crop type)?

2. Is it in work? If so, where?

3. How long is each phase of processing?

4. What segments have been in work excessively
long?

5. How old is the data supporting a production
estimate?

These questions comprised the essence of the
requirements for the status and tracking system.
They comprised the basic information required to
manage the system as well as that needed to status
higher management levels concerning the health of
the system.

Potentially one of the most valuable reports generated
was one that showed the length of time each segment
was in work at each processing station. For example,
how long was it from the date of acquisition
until GSFC shipped the data to JSC? How long did it
take to make the film products and prepare the
analyst packet ready for work? How long did the
analyst have the segment in work? These were compared
against a set of predetermined nominal processing
times. Segments that had been in work at a
given station longer than the reference time allocated
were automatically printed out.
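
The comparison against nominal processing times can be sketched as follows. This is a hedged illustration rather than the actual status and tracking software: the station names, the nominal times in days, the record layout, and the sample dates are all assumptions made for the example.

```python
from datetime import date

# Nominal processing time allowed at each station, in days (invented values).
NOMINAL_DAYS = {"GSFC": 10, "FILM": 5, "ANALYSIS": 7}


def delinquency_report(in_work, today):
    """Return (station, segment, days_in_work) for every segment that has
    been at its station longer than the station's nominal time."""
    late = []
    for segment, station, arrived in in_work:
        days = (today - arrived).days
        if days > NOMINAL_DAYS[station]:
            late.append((station, segment, days))
    # Sorting groups the flagged segments by station, then segment number.
    return sorted(late)


# Hypothetical records: (segment number, current station, arrival date).
in_work = [
    (1021, "ANALYSIS", date(1977, 6, 1)),
    (1187, "ANALYSIS", date(1977, 6, 10)),
    (1530, "FILM", date(1977, 6, 2)),
]
report = delinquency_report(in_work, today=date(1977, 6, 14))
```

Here segments 1021 and 1530 exceed their stations' nominal times and are listed, while segment 1187 is still within its allowance.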

From its inception, the "Delinquency Report," as
it was called, proved to be a valuable management
tool. In Phase III, it was common to have 500 segments
in analysis alone plus another 500 to 600 coming
in weekly. In all, 2000 to 3000 acquisitions might
be on the move through the data flow at any given
time. With this much active data in the system, it
was easy for some of it to get sidetracked. The Delinquency
Report caught these and flagged them by station
and segment number. After a few iterations of
these reports where the individual responsible for a
given station had to prepare a response to the Delinquency
Report, there was a noticeable tightening of
the data flow and a significant reduction in the size of
the report.

As a footnote to this activity, however, it was
found that the Delinquency Report was a useful tool
in managing the data flow only as long as the system
resources were sufficient to meet the processing requirements.
When the system became overloaded,
backlogs began building, the system became saturated
with data, and the report was of little value as
data was often set aside deliberately to allow for
processing of higher priority segments. Thus, it was
often possible for large blocks of data to be "delinquent."

The ability to sort data by country, crop type, and
acquisition date proved the greatest asset of the
status and tracking system. Inevitably, there would
be breakdowns in the systems that processed and/or
manipulated the data. This was compounded by the
fact that, during Phase III, the incoming flow rate of
data exceeded the analysis capability. The data was
prioritized by country and fed into the status and
tracking system. In areas such as the U.S.S.R., where
the flow rate exceeded analysis resources, determinations
could be made on how to prioritize the data to
provide the best possible data set for analysis and aggregation.
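
A sort of this kind is easy to sketch. The example below is an assumption-laden illustration, not the LACIE implementation: the priority ranking of countries, the record fields, and the choice to put the newest acquisition first within a crop type are all invented for the example.

```python
from datetime import date

# Hypothetical country ranking; lower numbers are worked first.
COUNTRY_PRIORITY = {"U.S.S.R.": 0, "U.S.": 1}


def prioritize(records):
    """Order records by country priority, then crop type, then newest
    acquisition date first (an assumed tie-breaking rule)."""
    return sorted(
        records,
        key=lambda r: (
            COUNTRY_PRIORITY.get(r["country"], 99),
            r["crop"],
            -r["acquired"].toordinal(),
        ),
    )


records = [
    {"segment": 1, "country": "U.S.", "crop": "winter wheat",
     "acquired": date(1977, 5, 1)},
    {"segment": 2, "country": "U.S.S.R.", "crop": "spring wheat",
     "acquired": date(1977, 5, 3)},
    {"segment": 3, "country": "U.S.S.R.", "crop": "spring wheat",
     "acquired": date(1977, 5, 20)},
]
order = [r["segment"] for r in prioritize(records)]
```

With these sample records, the two U.S.S.R. segments come first, the more recent acquisition ahead of the older one, followed by the U.S. segment.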

As the growing season would progress, there
would be some segments that the Crop Assessment
Subsystem (CAS) believed had satisfactory estimates
which could be carried forth from one report
to another and other segments for which they
believed either new estimates should be generated or
old estimates improved. It was possible to compare
the CAS needs with what was available in the status
and tracking system and to flag for the analysts those
segments that were to be processed on a priority
basis.
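
In modern terms this comparison is a set operation. The sketch below assumes, purely for illustration, that CAS requests and the status and tracking holdings can each be represented as a set of segment numbers; the function name and the numbers are hypothetical.

```python
def flag_priority(cas_requests, available):
    """Split the segments CAS wants re-estimated into those with data on
    hand (to be flagged for the analysts) and those with no data yet."""
    on_hand = sorted(cas_requests & available)
    missing = sorted(cas_requests - available)
    return on_hand, missing


# Invented segment numbers.
on_hand, missing = flag_priority({1021, 1187, 1530}, {1187, 1530, 1604})
```

Segments in `on_hand` would be flagged for priority processing; those in `missing` would have to wait for a usable acquisition.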
As understanding of the flexibility of the status
and tracking system grew, many other reports, updates,
etc., were generated, often for specific users or
purposes. As a way of integrating the various user inputs
on status, tracking, and problems, an
Operations Coordination Center (OCC) was established.
The purpose of the OCC was to provide a
focal point for the exchange of information regarding
the status, problems, etc., of each of the individual
components that comprised the data flow. Here
much of the detailed data that came out of status and
tracking was summarized into concise displays to
provide, at a glance, data flow tracking of each major
component in the system.

As a vehicle for obtaining the necessary information
with which to manage the data flow, a debriefing
was held each morning in the OCC to discuss the
previous day's activities. Information exchanged
during these sessions consisted of throughput
statistics, problem areas, and suggested solutions to
problems. Personnel from each major functional
area of the project attended these sessions. Where
appropriate, action items were given, and tracking of
these was instituted to assure their completion.

The primary processing status displays in OCC
integrated the many diverse inputs and reported the
status and progress of the processing activities from
receipt of the data from GSFC through completion
of the Classification and Mensuration Subsystem
(CAMS) analysis of the data. Specific items reported
were receipt of data from GSFC and update of the
JSC data base, arrival of the film products at the
LACIE Physical Data Library (LPDL), availability
of the data to CAMS for analysis, CAMS interpretation
of the data and preparation of computer runs, arrival
of computer processing data products, and
evaluation of the computer run and delivery of estimates
to CAS. The quantity of data that could not be
satisfactorily classified and the reason were also reported.
In addition to providing these daily operations
reports, the OCC summarized and assimilated
them into the weekly production reports that were
provided to LACIE management. Presided over by
an operations manager, the daily meetings were the
key to the operation of LACIE in that they provided
a direct exchange of information among the working
level personnel directly responsible for the management
and operation of the data flow. Problems could
be quickly identified, tracked, and brought to project
management's attention if necessary. Often they
could be worked by direct assignment. Further, they
provided a mechanism for keeping all the functional
areas apprised of the latest operational developments.

Other weekly reports were generated from daily
inputs or obtained from the status and tracking
system. These included reports on the throughput
time required for processing data by the key elements
of the operating system. These reports were
soon limited to the GSFC and CAMS processing
because the other subsystems exhibited a reasonably
constant time within that nominally expected. This
throughput time report enabled operations management
to monitor the processing time line, identify
bottlenecks, and initiate corrective measures as
required.

In addition to compiling summary statistics based
on the status and tracking system, the OCC also
tracked Discrepancy Reports (DR's). As discussed
earlier in a paper on quality assurance (QA), these
DR's represented documented summaries of major
failures in either hardware, software, or procedures
and were categorized as critical or noncritical. Critical
DR's meant that the potential existed for a major
failure in the LACIE system, possibly resulting in a
work stoppage. These were rigorously tracked and
statused until satisfactory closeout had been
accomplished.

Another major source of information coming
from QA was the procedure audit. Although this
type of QA departed significantly from what is
thought of in the standard context of product QA, it
made a substantial contribution to the overall
management of the system. Because of the complexity
of the LACIE operating system, it was imperative
that standard operating procedures be developed for
each of the major functional areas. Generally, these
procedures were built around an operating system so
that adherence to them was necessary for proper
operation of the system. The QA audits of personnel
compliance with procedures often proved enlightening;
pressures generated by these audits eventually resulted
in a much higher quality of documentation
than would have been the case otherwise.

Other special reports were generated internally to
operations management for its own use. Many of
these reports were those generated at the beginning
or near the end of a processing cycle for a specific set
of data. They typically detailed the status of data acquisitions
and processing such that specific operational
"miniplans" were developed to process the
data. This assured an orderly beginning and ending
to processing data sets.

The goals of providing LACIE management with
the appropriate level of information and ensuring the
understanding of the data flow and the operation's
accomplishments on a weekly basis were realized
because of an effective operations reporting system.
It enabled the progress of data processing to be
measured against the established product goals and
objectives. It also provided management with a view
of the problems and pitfalls associated with the
operations system, permitting it to provide additional
direction to the operations system.

The status and tracking system, dialog from the
daily OCC debriefings, and QA reporting armed
those responsible for managing the data flow with a
significant amount of information. The problem
then existed of how to distill this information into a
weekly summary briefing that would keep management
adequately informed of progress, problems,
and overall adherence to production reporting
schedules.

Early attempts to brief management by letting the
numbers speak for themselves produced elaborate
matrices showing what had been received, what was
in work, average time in work, etc. To management's
credit, it withstood this barrage of statistics with a
measure of restraint. Over a period of time, it became
apparent that while this procedure was historically
documenting what had occurred, it was not providing
a means of comparing actual versus expected performance.
In trying to do this, it was discovered that
while the status and tracking system did contain all
the relevant statistics, methodical extraction of
summarized reports from it was not very refined.

As the understanding of the capabilities and the
limitations of the status and tracking system
evolved, so did the reporting. Confidence was gained
in the ability to correctly interpret and summarize
the reports, and the number of elaborate charts and
matrices constructed decreased. By the middle of
Phase III, the reporting had come full cycle; data
flow information was presented essentially in a
tightly summarized form, consisting of an assessment
of (1) the goal, (2) progress to date, (3)
schedule variations, (4) forecasting regarding
adherence to milestones, and (5) problem areas.

In retrospect, this portrayal of the status of the
LACIE data system is so obvious that one must
wonder how any other course for depicting the information
could have been considered. It must be
pointed out, however, that synopsizing was more of
an acquired skill than an exact science and was in
fact subjected to an evolutionary process throughout
LACIE.

INTRODUCTION

During its early stages, the Earth Observation
Division (EOD) Data Techniques Laboratory (DTL)
at the NASA Johnson Space Center (JSC) was a test
bed for experimental software and hardware techniques
supporting scientific data processing. A
central control point for management of configuration
updates and/or modifications did not exist. Individual
DTL users updated or modified the facility
at will, within their own projects and assignments.
This naturally resulted in nondocumented, uncontrolled
configurations evolving through software
updates and modifications by programmers. Attendant
problems included scheduling conflicts, lack of
defined procedures, and production processing via untested
software.

The requirements of the LACIE project on the
DTL changed its role from a testing facility to a production
facility. It was paramount that resource
scheduling and control of laboratory hardware and
software be implemented.


PLANS  FOR  CONTROL 

In May 1975, EOD management approved the formation
of the Facilities Configuration Management
Office (FCMO). FCMO was to provide process control
for LACIE support within the division. The primary
task was to establish standards and procedures
for central DTL configuration control for hardware
changes, software changes (both system and application),
future modification to established baselines,
temporary changes, and anomaly resolutions. The
goal of these standards and procedures was to assure
compatibility of requirements, plans, and applications
as interrelated with the multitude of users within
the framework of well-defined support facility
resources. This task was later expanded to include
DTL system operating software, machine-user identification
access control, and documentation of pertinent
information required by in-house processor
users. Since the DTL was a functioning user facility,
classic configuration control procedures
could not be implemented without interruption
of service. Implementation had to proceed
firmly, but smoothly, along established user modes.

The first step in establishing configuration
management for the FCMO was the development of
the DTL Software Users Guide. This guide outlined
DTL processor configuration, defined the system
capabilities available to users developing application
software, and established system resources rules and
limitations.

Then, while DTL users were absorbing and being
tutored in the DTL Software Users Guide, FCMO
personnel developed a Configuration Management
Plan (CMP), the first step in a classic approach,
that was reviewed by JSC management at periodic
stages to assure adherence to established agency standards.


IMPLEMENTING  CONTROLS 

The purpose of the CMP was to maintain the integrity
of all DTL production software. Because the
continuing multiuser nature of the DTL meant that
the plan could not be implemented in one motion,
it had to be instituted in an
orderly, nondisruptive manner to support the
LACIE project.

First, system baselines were established. This was
achieved by accepting all production software as de
facto acceptance tested. With this baseline established,
all further changes were required to undergo
acceptance testing as defined by CMP procedures.

The Controlling Organization

As the DTL was evolving into a real-time,
multiuser facility, it became apparent that a single
responsible body was required to control modifications,
access, and utilization of the operating system.

This requirement was satisfied with the establishment
of the Systems Management (SM) function
within the FCMO. As the technical arm of FCMO,
SM was given the responsibility to maintain and control
the modifications to all the baselined software
utilized for production in the DTL.

In addition to user control, SM continues to provide
the following functions.

1. Install and maintain the operating system software.

2. Investigate system failures and either report
the failure to the vendor for response or initiate a
local system correction or workaround.

3. Install all pertinent operating system corrections
received from the vendor.

4. Inform the general user community of all new
features, modifications, problems, and workarounds
in regard to the DTL operating systems.

5. Assign user identification codes and protection
codes that control access and level of access to
system capabilities.

6. Provide analysis support to users experiencing
problems with either the system features during software
development or acceptance-tested software during
production runs.

7. Provide system backups and backup procedures
to allow regeneration of the baseline system
in the event of total system failure.


Enforcement  of  Controls 

To ensure the effectiveness of the SM function,
FCMO has the power to enforce the rules and procedures
as defined in the CMP. For instance, if a user
chooses not to follow established rules or procedures,
FCMO can refuse the user access to the system. This
system denial may be in the form of refusal to
schedule an acceptance test, or it may be through
computer lockout of the user via a computer console
entry. These rules and procedures are necessary to
avoid one user making modifications to the system
that would, in effect, put other users of the system
out of business. Review, coordination, and impact
analysis of all planned modifications in relation to
other users is the responsibility of FCMO.


Included  in  the  configuration  management  plans 
are  acceptance  test  procedures  for  quality  assurance 
(QA)  signoff  for  both  hardware  and  software 
modifications. 


User Allocation and Identification

Because of the limited disk data storage in the
multiuser environment, the amount of block storage
utilized by each user must be defined and controlled.
To accomplish this, each system user must file a formal
request with FCMO via a “User Identification
Code (UIC) Action Form.” This form specifies the
amount of storage required to support each user task.
The form is submitted to the Data Base Manager
(FCMO NASA Task Monitor), who approves or
disapproves the request. If the request is disapproved, the
Data Base Manager notifies the requester. If the
request is approved, the requester is assigned a unique
UIC and the allotted storage block. The Data Base
Manager directs FCMO SM to install the new UIC in
the system. When this is accomplished, the UIC
requester is notified that the request for system access
has been approved and that the new unique UIC has been
installed. Each system user must sign on and enter
the specific unique UIC to gain system access.

Because  of  personnel  attrition,  transfers,  etc.,  all 
UIC's must be kept current. This is accomplished by
requesting  all  system  users  to  renew  their  UIC  and 
computer  storage  request  on  a quarterly  basis.  All 
UIC's not renewed within 10 days after the specified
renewal  time  are  purged  to  tape  and  retained  for  60 
days.  UIC  renewal  dates  are  communicated  through 
an  FCMO  bulletin.  The  FCMO  bulletin  is  also  used 
to  convey  information  concerning  newly  installed 
hardware  and  modifications  to  systems  and  produc- 
tion software. 


SUMMARY 

All  DTL  configuration  management  procedures 
and  guidelines  were  implemented  by  the  FCMO 
within  an  18-month  period.  This  time  frame  allowed 
easy transition from an unmanaged to a managed
condition. It further allowed users ample time to
become acclimated to the newly established
procedures, and it allowed the procedures themselves to
be “fine tuned” for more effective control.

Configuration management has improved the
efficiency of the operational system in the DTL. System
scheduling and operational problems have
diminished to no more than an occasional operator
error. Because of the formal acceptance test
procedures, software reliability has improved and users
have gained respect for system integrity. Finally,
FCMO tracking of all deliverable documentation
assures users that the documents associated with new
and/or updated software are available at the time of
software installation.




N80-154?2 


Accuracy  Assessment  System  and  Operation 

D. E. Pitts,a A. G. Houston,a G. Badhwar,a M. J. Bender,a M. L. Rader,b W. G. Eppler,c
C. W. Ahlers,a W. P. White,b R. R. Vela,b E. M. Hsu,b J. F. Potter,b and N. J. Clinton b


INTRODUCTION 

The  LACIE  crop  estimation  system  is  composed 
of  several  operational  subsystems:  data  collection, 
classification,  yield  estimation,  crop  aggregation  and 
reporting,  data  storage  and  retrieval,  and  accuracy 
assessment.  The  Accuracy  Assessment  Subsystem  is 
responsible  for  determining  the  accuracy  and 
reliability  of  LACIE  estimates  of  wheat  production, 
area,  and  yield  made  at  regular  intervals  throughout 
the  crop  season  and  for  investigating  the  various 
LACIE  error  sources,  quantifying  these  errors,  and 
relating them to their causes. Timely feedback of
these error evaluations to the LACIE project was the
only mechanism by which improvements in the crop
estimation system could be made during the short
3-year experiment.

Figure  1 illustrates  the  accuracy  assessment  data 
flow.  Estimates  of  wheat  production,  area,  and  yield 
are  compared  with  accurate  reference  data.  For  ex- 
ample, in  the  yardstick  region  of  the  nine  states  in 
the  U.S.  Great  Plains  (USGP),  the  U.S.  Department 
of  Agriculture  (USDA)  Economics,  Statistics,  and 
Cooperatives  Service  (ESCS)  estimate  is  used  as  the 
reference.  In  areas  outside  the  United  States,  the 
USDA  Foreign  Agricultural  Service  (FAS)  and  offi- 
cial country  estimates  are  used  as  the  reference  stan- 
dard. In  most  cases,  the  LACIE  reports  are  published 
a few  days  before  the  release  of  the  corresponding 
ESCS  or  FAS  report.  The  LACIE  estimates  and  stan- 
dard error  estimates  are  compared  month  by  month 
to  the  corresponding  reference  data  as  well  as  to  the 
end-of-crop-year  reference  data  to  determine 
whether the project accuracy goal of 90/90 (i.e., 90
percent of the time, the estimate of wheat production
should be within ±10 percent of the reference) is
being met (see the paper by Houston et al. entitled
“Accuracy Assessment: The Statistical Approach to
Performance Evaluation in LACIE”).

FIGURE 1. Accuracy assessment analysis steps and flow of data.
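
The 90/90 goal can be illustrated with a simple empirical count. The
following is a minimal sketch, not the statistical procedure of Houston
et al.: it assumes a list of (estimate, reference) pairs and merely
counts how often the estimate falls within the tolerance. The function
names and the sample figures are hypothetical.

```python
def within_tolerance(estimate, reference, tolerance=0.10):
    """True if the estimate lies within +/- tolerance (a fraction) of the reference."""
    return abs(estimate - reference) <= tolerance * abs(reference)


def fraction_within(pairs, tolerance=0.10):
    """Fraction of (estimate, reference) pairs that meet the tolerance."""
    hits = sum(1 for est, ref in pairs if within_tolerance(est, ref, tolerance))
    return hits / len(pairs)


def meets_90_90(pairs, tolerance=0.10, required_fraction=0.90):
    """Empirical stand-in for the 90/90 goal: at least 90 percent of the
    estimates are within 10 percent of their reference values."""
    return fraction_within(pairs, tolerance) >= required_fraction
```

An observed frequency over a handful of reports is only a rough
indicator; a formal assessment would rest on the estimated bias and
variance of the production estimator.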

To  produce  timely  reports  of  these  results,  a brief 
accuracy  assessment  report  called  a “quick-look  re- 
port" is  published  a few  days  following  each  LACIE 
crop  estimation  report.  Four  times  each  year,  the  Ac- 
curacy Assessment  Subsystem  produces  comprehen- 
sive reports  of  the  error  source  studies,  which  sepa- 
rate the  wheat  production  error  into  its  component 
parts  of  wheat  area  error  and  wheat  yield  error. 
These  errors  are  further  divided  into  component 
parts  based  on  field  observations  (fig.  2). 


FIGURE 2. Production error components.

Figure 3 shows the field data used in accuracy
assessment. The 29 intensive test sites are special
nonoperational sites on which very detailed data are
collected each 18 days. These sites are used for
Classification and Mensuration Subsystem (CAMS)
procedure verification and are also used in the
quick-look reports to illustrate particular situations
encountered during the crop year. Because of the widely
varying crop conditions from one county to the next,
ground truth on large numbers of operational sample
segments must be obtained to properly separate
potential error sources such as classification and
sampling. The 166 blind sites with their “wall to wall”
inventories meet this requirement. Throughout the
crop year, all elements of the project receive
information resulting from the blind site studies.
However, to protect the integrity of the blind sites as
a testing tool, the ground-truth information from the
blind sites is not released to elements of LACIE
other than the Accuracy Assessment Subsystem
until after the year is over.


BACKGROUND 

During the mid-1960's, agricultural remote-sensing
experiments were performed using aircraft
platforms. Photographs from these missions were
used to construct photomosaics, which were used as
base maps for recording the crop identity observed
on the ground. An example is the familiar C-1
flightline multispectral scanner (MSS) experiment
conducted in Indiana by the Purdue University
Laboratory for Applications of Remote Sensing
(LARS) (refs. 1 and 2), which included the recording
of ground-truth and ancillary data for about 400
fields. Field vertices were manually registered to the
MSS gray-scale printouts, which were then used to
construct test fields in the classification data base for
evaluation of accuracy. This procedure was not
viable for LACIE, since the use of test fields in the
operational data base would have potentially
compromised the results.

FIGURE 3. Field data acquisition.

With  the  advent  of  the  Landsat  spacecraft,  ground 
truth  could  be  registered  to  the  Landsat  MSS  and 
used  several  times  during  the  crop  year  so  long  as 
crop  rotations,  abandonment,  etc.,  did  not  occur  and 
the pass-to-pass registration system (see the paper by
Grebowsky entitled “LACIE Registration
Processing”) was accurate to ±1 pixel. This method was
used in Crop Identification Technology Assessment
for Remote Sensing (CITARS),1 in which six
counties in the Corn Belt were sampled, with one 5- by
20-statute-mile sample located in each county. To train
the  CITARS  classifier  to  distinguish  corn  and  soy- 
beans from  other  crops  and  to  assess  classification 
accuracy,  ground-truth  inventories  were  performed 
in  20  randomly  chosen  quarter  sections  in  each  of  the 
6 segments.  These  data  were  transferred  by  image  in- 
terpretation to  gray-scale  printouts  or  cluster  maps  of 
Landsat  data.  Thus,  for  these  sample  segments  com- 
prising 600  square  statute  miles,  30  square  statute 
miles  were  inventoried  by  the  USDA  Agricultural 
Stabilization  and  Conservation  Service  (ASCS)  each 
18  days— six  times  during  the  growing  season. 

The  CITARS  effectively  illustrated  the  need  for 
ground  truth  to  determine  error  sources  in  classifica- 
tion, but  the  project  did  not  involve  large-region  pro- 
duction estimation  and  therefore  did  not  include  area 
estimation  sample  error,  yield  estimation  error,  or 
the  wide  range  of  classification  error  that  can  occur 
over  large  areas  containing  hundreds  of  counties 
with  widely  varying  climatic  conditions  and  cropping 
practices. Early in LACIE, the only tool for
identifying sources of classification error was the intensive
test site. However, with only 29 intensive test sites,
none of which were located in Colorado, Oklahoma,
or Nebraska, this sample did not adequately
represent the 9 states of the U.S. Great Plains, containing
several hundred segments. Furthermore, the
intensive test sites were chosen to include high
percentages of wheat and small grains and therefore were
not necessarily representative of the sites chosen by
the LACIE sampling strategy.

To  overcome  these  shortcomings,  a pilot  effort 
was  undertaken  in  Phase  I in  which  29  LACIE  seg- 
ments in  North  Dakota  and  Montana  were  inven- 
toried in  August  1975.  The  term  “blind  site”  was 
chosen  for  these  sites  because  they  were  part  of  a 
blind  test  in  which  none  of  the  LACIE  analysts  knew 
the  sites  chosen  or  had  any  information  about  the 
identity  of  the  fields  as  determined  in  the  inventory. 
In Phase I, the 29 blind sites were selected randomly
from the set of about 55 sites in Montana and North
Dakota with biowindow 1 and 2 acquisitions.
Acquisition histories as of late July 1975 were used as
the basis of selection. Proportions of small grains
were determined by planimetry of the ground-truth
annotated photography. In the Phase I blind sites,
wall-to-wall  ground  truth  was  prepared  by  a com- 
bination of  ground  surveys,  aerial  inspection  from 
light  aircraft,  and  interpretation  of  current  aerial 
photographs.  Without  visits  to  all  the  fields,  separa- 
tion of  oats,  rye,  and  barley  from  wheat  was  not  al- 
ways possible.  In  Phases  II  and  III,  the  inventories 
were  conducted  using  current  aerial  photographs  as 
the  base  map  and  ground  surveys  to  determine  each 
field  crop  type. 

In Phase II, 136 blind sites were selected randomly
from the LACIE crop estimation reports. The
segments for the Southern Great Plains were selected
randomly from the segments used in the February
1976 Crop Assessment Subsystem monthly report
(CMR). The June CMR was used for Montana and
South Dakota winter wheat, and the July CMR was
used  for  the  spring  wheat  states.  These  segments 
were  chosen  so  as  to  represent  equal  numbers  of  seg- 
ments with  high  as  well  as  low  estimates  of  small 
grains.  Wall-to-wall  ground  truth  was  then  collected 
for  all  fields  by  the  ASCS  personnel  using  current 
aerial  photographs  as  the  base  map. 

In Phase III, 166 blind sites were selected
randomly from the USGP segments so that each crop
reporting district had approximately one-third of the
segments chosen as blind sites. This selection was
made in October 1976 before the crop year
commenced. Wall-to-wall ground truth was then
collected for all fields by the ASCS personnel using
current aerial photographs as the base map. Proportions
of all crops were determined by (1) digitizing the
field vertices from the annotated aircraft
photograph, (2) registering the data to the Landsat MSS
image, and (3) numerically integrating the area of
each field. To ensure timely assessment of
proportion accuracy, the proportions of wheat and small
grains were also estimated for both the planted and
the at-harvest inventories, using a dot grid overlaid
on the ground-truth annotated photograph.
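
The two proportion estimates described above can be sketched as
follows. The text does not name the numerical integration method; this
sketch assumes the standard shoelace formula applied to the digitized
field vertices, and a simple count for the dot grid. The function
names, crop codes, and coordinates are illustrative only.

```python
def polygon_area(vertices):
    """Area of a simple polygon from its ordered (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def crop_proportions(fields, segment_area):
    """Sum field areas by crop code and express each as a fraction of the segment.

    fields: iterable of (crop_code, vertices) pairs.
    """
    areas = {}
    for crop, vertices in fields:
        areas[crop] = areas.get(crop, 0.0) + polygon_area(vertices)
    return {crop: area / segment_area for crop, area in areas.items()}


def dot_grid_proportion(dot_labels, crop):
    """Fraction of grid dots whose ground-truth label equals the given crop."""
    return sum(1 for label in dot_labels if label == crop) / len(dot_labels)
```

The dot-grid count is quick but carries sampling error that depends on
the number of dots; the vertex-based area is limited mainly by the
accuracy of the delineation and registration.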


CLASSIFICATION  ACCURACY 

A crop  estimation  system  such  as  LACIE  encoun- 
ters a wide  variety  of  phenomena  that  can  contribute 
to  the  classification  error  (fig.  4).  The  Accuracy 
Assessment  Subsystem  has  the  task  of  determining 
the  contribution  of  each  of  the  components  to  the 
overall  classification  error  as  well  as  to  the  area 
estimation  error  on  the  state  and  national  levels. 
Estimates  of  the  percentage  of  small  grains  in  the 
LACIE  5-  by  6-nautical-mile  segments  have  been 
found  to  contain  errors  caused  by  various  sources,  of 
which  the  most  important  were  (1)  abnormal  sig- 
nature development  due  to  a variety  of  causes  includ- 
ing late  planting,  drought,  cattle  grazing,  crop  rota- 
tion, disease,  and  soil  variability;  (2)  inadequacy  of 
the  Landsat  scanner  in  resolving  small  fields;  and  (3) 
mislabeling  of  small  grains,  grasses,  pasture,  and  idle 
fallow  when  key  acquisitions  are  missing. 

In each of the last 2 years of LACIE, ground truth
was collected for one-third of the operational sample
segments. Approximately 12 000 square statute miles
of ground-truth data have been produced (about
twice the area of the state of New Jersey), easily the
largest amount ever attempted in agricultural remote
sensing. The ground truth was used to evaluate the
error sources and magnitudes for 64 000 analyst dot
labels per crop year and the accuracy of clustering
and classification of about 14 000 000 pixels per year.
In order to process these large amounts of data, the
ground-truth wall-to-wall inventories were digitized
and registered to the Landsat image digital data. This
procedure not only enables objective repeatable
experiments to be performed by all users but also
enables traceability of the ground-truth data from the
fieldwork to the pixel label and facilitates rapid,
efficient correction of errors in the digitized
ground-truth data. Since a ground-truth label is obtained for
each Landsat MSS pixel, it is very easy to compare
other LACIE data produced at the pixel level,
including the following.

1.  Landsat  data  and  transformations,  such  as 
green  number 

2.  Analyst  labels 

3.  Classification  maps 

4.  Cluster  maps 

In  addition,  data  produced  at  the  field  level  can  be 
evaluated  in  terms  of  their  effect  on  the  pixel-level 
accuracies.  Examples  include  the  following. 

1.  Crop  stage  development 

2.  Yield 

3.  Fertilizer  effects 

4.  Irrigation  effects 

5.  Cropping  practices 

6.  Rainfall 

7.  Soil  type 

8.  Atmospheric  optical  depth 

Thus,  many  other  LACIE  groups  besides  Ac- 
curacy Assessment  find  these  data  important;  they 
include  classification  technique  development  groups, 
spectromet yield model development groups, and
spectromet  crop  development  model  groups. 

The processing of the ground truth into digital
forms and its use in Accuracy Assessment are
illustrated in figures 5 and 6. A Phase III blind site
(number 1523 in Minnesota) is discussed in the following
paragraphs as an example of these processing steps.
This segment was randomly chosen from the
available blind site segments with good acquisition
histories.

FIGURE 5. Collection of data for blind sites and intensive test
sites.

FIGURE 6. Use of field data to evaluate classification
accuracy.


Aircraft  Maps 

After  selection  of  the  blind  sites,  Landsat  imagery 
is  used  to  determine  the  true  position  of  each  site, 
which  is  plotted  on  a 1:24  000-scale  or  1:12  500-scale 
map  for  use  by  aircrews  in  acquiring  the  aerial 
imagery. 

Aircraft  Photographs 

Aerial photography using color-infrared film is
collected for each blind site by using predesignated
flight lines. If possible, flight altitudes are greater
than 14 kilometers to enable single-photograph
coverage of the entire site. These photographs are
enlarged to a scale of 1:24 000 for field use (fig. 7).

Field Overlays and Field Segment Kits

If  the  imagery  is  of  satisfactory  quality,  trans- 
parent overlays  are  prepared  (ref.  3)  and  forwarded 
to  ASCS  personnel  in  the  appropriate  county  (see  the 
paper  by  Spiers  and  Patterson  entitled  "Ancillary 
Data  Acquisition  for  LACIE"). 

Blind Site Field Data Acquisition

The USDA ASCS personnel provide complete
inventory data based on ground observations. The
overlay is annotated with the standard crop symbols
for each field (ref. 3). These inventory packages are
completed by the ASCS personnel and forwarded to
the NASA Johnson Space Center (JSC) to be logged
and tracked by the Data Acquisition, Preprocessing,
and Transmission Subsystem (DAPTS).

FIGURE 7. Aircraft photograph of blind site segment 1523,
Wilkin County, Minnesota, June 1, 1977.


Near the time of wheat emergence, 15 fields in
each USGP blind site are chosen and annotated on
the overlay by ASCS personnel in accordance with
the following: 5 below-average fields, 5 average
fields, and 5 above-average fields. The ASCS
personnel identify the plant height and ground cover of
each of these fields at this time. Beginning in October
for winter wheat and in May for spring wheat, they
revisit these 15 fields in concert with the Landsat
overpasses so that classification performance can be
related to wheat field stands. Observations are
continued until all 15 fields are harvested or abandoned.
Inventories for the winter wheat sites are attempted
twice (planted and harvested), but only an at-harvest
inventory is obtainable for spring wheat because of
the shorter growing cycle.


Delineating  Photographs 

The JSC Cartographic Laboratory overlays the
photographs with a second sheet of acetate and
outlines each of the homogeneous areas as irregular
polygons with 30 or fewer vertices. A homogeneous
area has a uniform cover type; examples are a wheat
field, a pond, pasture, and timber. Curved field
boundaries are approximated by a series of straight lines.


Assigning  Crop  Code  and  Field  Number 

As the polygons are being delineated, each one is
assigned a digital code (ref. 3), which indicates a
particular crop type determined by the field personnel.
A field number is assigned for use in quality
assurance and to enable efficient correction of errors
on the interactive disk data file.


Digitizing  Field  Vertices 

After the fields have been annotated and
delineated on acetate overlays, the polygon vertices are
measured and stored on the interactive drafting
system. The vertices are measured sequentially and
encoded together with the field crop code and
number. The digitized results are plotted with a line
plotter (fig. 8) for quality checking and prepared for final
output and registration. The digitization of the 200 to
1200 fields in a LACIE sample segment takes 6 to 14
hours and is the major throughput problem in the
Accuracy Assessment Subsystem.


Registration 

To define the geometric relationship of the aerial
photograph to the Landsat segment, registration
coefficients for each photograph must be obtained
(see the paper by Rader and Vela entitled
“Cartography Lab's Spatial Processor”). This procedure
entailed selecting S to 12 points per photograph and
independently solving for the coefficients of each
photograph using least squares techniques. When
more than one photograph is used in the inventory,
tie points are necessary to make points common to
both photographs occur on the same Landsat line
and sample position.

FIGURE 8. An expanded portion of blind site segment 1523
showing a Gerber plot of digitized field delineations for quality
control.
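
The least squares solution for the registration coefficients can be
sketched as follows. The paper does not state the form of the
transformation; this sketch assumes a six-coefficient affine model
(line = a0 + a1*x + a2*y, sample = b0 + b1*x + b2*y) fitted to control
points measured in photograph coordinates (x, y) and in Landsat (line,
sample) coordinates. All names are hypothetical.

```python
def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        s = m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / m[r][r]
    return x


def fit_affine(photo_pts, landsat_pts):
    """Least squares fit of an assumed affine model from control points.

    photo_pts: list of (x, y); landsat_pts: list of (line, sample).
    Returns [line_coefficients, sample_coefficients].
    """
    rows = [(1.0, x, y) for x, y in photo_pts]
    # Normal equations: (A^T A) c = A^T t, solved once per output coordinate.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    coefs = []
    for k in (0, 1):
        atb = [sum(r[i] * p[k] for r, p in zip(rows, landsat_pts)) for i in range(3)]
        coefs.append(solve_3x3(ata, atb))
    return coefs


def apply_affine(coefs, x, y):
    """Map a photograph point to (line, sample) with fitted coefficients."""
    (a0, a1, a2), (b0, b1, b2) = coefs
    return a0 + a1 * x + a2 * y, b0 + b1 * x + b2 * y
```

At least three non-collinear control points are needed for this model;
using more points than the minimum, as the text describes, lets the
fit average out individual measurement errors.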


Conversion  of  Field  Vertices 
to Universal Format Tape

Because the LACIE imagery and the LACIE
output products (classification and cluster maps) are in
universal format, which can be read by the software
of many institutions, the universal format was used
for the ground-truth information. To perform
registration with accuracy greater than ±1 pixel and
to assess the effect of boundaries and mixed pixels
on the classification process, the decision was made
to digitize the ground truth with six subpixels
comprising each Landsat pixel (fig. 9).

The accuracy assessment software (ref. 4) reads
the tape containing the field vertices and determines
all the subpixels falling within each field. Each
subpixel is assigned a digital code that identifies the crop
found in the field observation. Gray-scale printouts
of the universal tape are produced at the pixel and
subpixel levels. These printouts are checked for
accuracy by Cartographic Laboratory, Accuracy
Assessment, and CAMS personnel against the
quality assurance plot and the original photograph. If
a discrepancy occurs for any field, the cause is
determined, the disk data file is updated, and a new tape is
produced and checked for accuracy.

FIGURE 9. Example of digital ground truth. Pixels 1 and 3 are
estimated as pure, while pixels 2 and 4 are mixed.
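
The subpixel labeling step can be sketched as a point-in-polygon test
on subpixel centers, followed by a purity check on each pixel. This is
an illustration, not the software of reference 4: the ray-casting test,
the background code of 0, and the arrangement of the six subpixels as
3 lines by 2 samples per pixel are all assumptions.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(fields, n_lines, n_samples, background=0):
    """Give each subpixel the crop code of the field containing its center.

    fields: iterable of (crop_code, vertices) in subpixel (sample, line) units.
    """
    grid = [[background] * n_samples for _ in range(n_lines)]
    for code, vertices in fields:
        for line in range(n_lines):
            for samp in range(n_samples):
                if point_in_polygon(samp + 0.5, line + 0.5, vertices):
                    grid[line][samp] = code
    return grid


def pixel_purity(grid, sub_lines=3, sub_samples=2):
    """Return a grid of booleans: True where all subpixels of a pixel agree."""
    result = []
    for l0 in range(0, len(grid), sub_lines):
        row = []
        for s0 in range(0, len(grid[0]), sub_samples):
            codes = {grid[l][s]
                     for l in range(l0, l0 + sub_lines)
                     for s in range(s0, s0 + sub_samples)}
            row.append(len(codes) == 1)
        result.append(row)
    return result
```

A pixel flagged as not pure straddles a field boundary, which is the
case the subpixel digitization was introduced to expose.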


Registration Accuracy

To date, only a limited study has been done on the
registration accuracy of this process. A segment with
large fields in Oklahoma (1048, Cimarron County)
and a segment with small fields in North Dakota
(1602, Mountrail County) were misregistered by
known amounts to determine the number of
ground-truth pixel label assignments that would change. The
results for the segment with large fields were as
follows.

    Error         Change

    0.5 pixel     4 percent of labels of 209 dots
    1.0 pixel     8 percent of labels of 209 dots
    1.5 pixels    12 percent of labels of 209 dots

The results for the segment with small fields were
as follows.

    Error         Change

    0.5 pixel     11 percent of labels of 209 dots
    1.0 pixel     14 percent of labels of 209 dots
    1.5 pixels    19 percent of labels of 209 dots

To verify the registration accuracy, detailed
photointerpretation was done for all dots in 11 segments
in Oklahoma and 18 segments in North Dakota,
taking into account the NASA Goddard Space Flight
Center (GSFC) misregistration from pass to pass.
Comparison with the ground-truth digital tape
showed 4-percent disagreement in Oklahoma and
10-percent disagreement in North Dakota. In
accordance with the preceding tables, these values indicate
a registration accuracy of about ±0.5 pixel for the
digital ground-truth product.
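
The inference from observed disagreement to registration error can be
sketched as a table lookup with linear interpolation. The sketch below
uses only the large-field (Oklahoma) values of 4, 8, and 12 percent at
0.5, 1.0, and 1.5 pixels, and it assumes that zero misregistration
changes no labels; both the interpolation and that anchor point are
assumptions added for illustration.

```python
# (misregistration in pixels, percent of the 209 dot labels that changed)
LARGE_FIELD_TABLE = [(0.5, 4.0), (1.0, 8.0), (1.5, 12.0)]


def estimate_misregistration(percent_changed, table):
    """Interpolate the misregistration implied by a percent disagreement.

    A (0.0, 0.0) point is prepended on the assumption that perfect
    registration produces no label changes.
    """
    points = [(0.0, 0.0)] + sorted(table)
    for (e0, p0), (e1, p1) in zip(points, points[1:]):
        if p0 <= percent_changed <= p1:
            return e0 + (e1 - e0) * (percent_changed - p0) / (p1 - p0)
    raise ValueError("percent_changed is outside the tabulated range")
```

With the 4-percent disagreement reported for Oklahoma, this lookup
returns 0.5 pixel, in line with the figure quoted in the text.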


Ground-Truth Tape Image

Figure 10 shows a ground-truth tape for blind site
1523 as imaged on the Image-100 (I-100) system. The
I-100 can read only every third line of ground-truth
tape; therefore, the field boundaries appear more
uneven than they are on the digital tape. For clarity,
many of the numerous ground-truth classes are
lumped together to form a superclass; e.g., pasture,
grass, hay, and alfalfa are all displayed by the color
blue. For comparison, the corresponding Landsat
data are shown in figure 11.

FIGURE 10. Digitized ground truth registered to Landsat
imagery for LACIE segment 1523, Phase III blind site (Wilkin
County, Minnesota; inventory date August 1977). Color key:

• Spring wheat: salmon pink
• Harvested spring wheat: light pink
• Other spring grains: pink
• Harvested other spring grains: gold
• Other crops (corn, sunflower, soybeans, sugar beets): red
• Pasture, grass, hay, alfalfa: blue
• Idle fallow, idle cover crops, idle residue: gray
• Nonagriculture (trees, homestead): green

The preparation and use of LACIE accuracy
assessment ground-truth tapes does not require that
the entire scene have contiguous inventory. Before
digitization of all field vertices, the entire scene is
made into one large field containing the crop code
representing areas without ground observations;
thereby, a crop code for each Landsat pixel is
ensured.


Evaluation  of  Labeling  Errors 
and  Wheat  Proportion  Error 

Using these digital ground-truth data, many types
of labeling error analyses are performed routinely for
all LACIE blind sites.

1. All crop proportions are correlated with wheat
proportion estimation accuracy.

2. Analyst labeling accuracy is determined for all
labeled dots.

3. Sampling accuracy is determined for all 209
dots and for the subset labeled by the analyst.

4. Histograms of Landsat data for wheat are
compared with histograms of labeled dots to evaluate
signatures omitted.

An example for segment 1523 is given in table I,
which shows that labeling accuracy and the accuracy
of estimation of small grains and wheat improved as
the season progressed.

Moreover, it is not sufficient just to know the
accuracies. The Accuracy Assessment group must
investigate the causes of each mislabeling of a dot. In
this error characterization, a special analyst uses the
ground truth and the information in the CAMS
packet to attempt to deduce the mislabeling cause.
These causes can be grouped into three categories.

1. Those causes the analyst can do very little to
correct, such as insufficient acquisitions, border/edge
locations, and narrow fields near the limit of sensor
resolution.

2. Those causes representing abnormal signatures
in production film converter (PFC) Product 1 (e.g.,
fig. 11), which are inconsistent with the CAMS
procedure, such as wheat fields with temporal color
sequences that do not follow the wheat growth cycle,
nonwheat fields that do follow the wheat growth
temporal color sequence, and temporal sequence
signatures of wheat fields that are behind or ahead of
the majority of the wheat fields in the same
acquisition.

3. Those causes that are merely interpretation or
clerical errors.


Evaluation  of  Classification  and  Cluster  Maps 

The clustering and subsequent classification results from each
CAMS run (fig. 12) are transmitted to Accuracy Assessment on
digital tape. The Accuracy Assessment Subsystem compares all
these products for each classification, pixel by pixel, with the
ground truth to determine the accuracy with which they are
produced. For example, the classification shown in figure 12 was
compared to the ground truth in figure 10. Of the 54 437
small-grains subpixels in the scene, 46 042 were correctly
classified as small grains; likewise, of the 83 155
non-small-grains subpixels, 67 571 subpixels were correctly
classified as non-small-grains. In fact, each ground-truth class
is investigated to see how much is called small grains and how
much is called non-small-grains. In this example, most oats,
spring wheat, and barley were correctly called small grains;
however, some spring wheat and oats were called
non-small-grains. Some corn, sunflower, grass, hay, and pasture
were erroneously included in the small-grains category.

[FIGURE 11.— LACIE Phase III blind site 1523, Wilkin County,
Minnesota. Panels: planted, April 30, 1977 (biowindow 1,
biostage 1.5); 2.5 to 4.5 in. wheat, May 18, 1977 (biowindow 1,
biostage 2.5); 6.5 to 13 in. wheat, June 5, 1977 (biowindow 2,
biostage 3.5); 18 to 30 in. wheat, June 23, 1977 (biowindow 2,
biostage 4.3).]

Figure 13 shows the location of the 15 wheat fields in blind
site 1523. These fields are used to identify the causes of
labeling and classifying small grains as non-small-grains. One
way in which the 15 fields and their crop height and ground
cover information are used is to determine the labeling accuracy
of dots that fall within these fields. Another use is to
investigate the classification accuracy of each of the 15
fields. In the example segment, all the fields had excellent
accuracy (better than 95 percent) except two fields, which were
found to occur in a region of poorly drained soil. The
signatures of the wheat fields in this soil were not identified
by the analyst because no dots fell within any wheat fields in
this area of the scene. However, these signatures had enough
commonality with the wheat field signatures in the remainder of
the segment to enable correct identification of about 65 percent
of these two fields. This example illustrates the diverse types
of labeling and classification problems that can be investigated
using the 15-field information together with the accuracy
assessment software.
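
The pixel-by-pixel comparison described above can be sketched in a few lines. This is only an illustration, not the Accuracy Assessment software itself: the function names, the class labels "SG" and "NSG", and the eight-pixel scene are assumptions made for the example. The last line uses the small-grains counts quoted in the text.

```python
from collections import Counter

def confusion_counts(classified, truth):
    """Count (truth_label, classified_label) pairs, pixel by pixel."""
    if len(classified) != len(truth):
        raise ValueError("maps must cover the same pixels")
    return Counter(zip(truth, classified))

def percent_correct(counts, label):
    """Percent of ground-truth pixels of `label` that were classified as `label`."""
    total = sum(n for (t, _), n in counts.items() if t == label)
    return 100.0 * counts[(label, label)] / total if total else float("nan")

# Toy 8-pixel scene (made-up data); "SG" = small grains, "NSG" = non-small-grains.
truth      = ["SG", "SG", "SG", "SG", "NSG", "NSG", "NSG", "NSG"]
classified = ["SG", "SG", "SG", "NSG", "NSG", "NSG", "NSG", "SG"]
counts = confusion_counts(classified, truth)
print(percent_correct(counts, "SG"))    # 75.0
print(percent_correct(counts, "NSG"))   # 75.0

# Small-grains figure quoted in the text: 46 042 of 54 437 subpixels.
small_grains_rate = round(100.0 * 46042 / 54437, 1)
print(small_grains_rate)                # 84.6
```

Keeping the counts by (truth, classified) pair also answers the question raised in the text of how much of each ground-truth class is called small grains and how much is not.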

Ground-Truth Spectral Plots

Other diagnostic tools are available to aid in the determination
of sources of classification and labeling errors. One of these
is the scatterplot, a two-dimensional histogram plot for two
Landsat MSS channels or for rotations of these data (e.g.,
greenness and brightness). Figure 14 shows such a plot for
spring wheat in the poorly drained soil in segment 1523. Plots
of this type are useful in determining spectral signatures of
wheat and other crops in the scene, as well as within-segment
variability of signatures due to soil type, planting date,
irrigation, fertilizer applications, drought, atmospheric
effects, crop rotation, variety, and disease.
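
A two-dimensional histogram of this kind can be sketched as follows. The bin width, the function name, and the pixel values are assumptions chosen for illustration; the sketch does not reproduce the LACIE plotting software or the greenness and brightness rotation.

```python
from collections import Counter

def scatter_histogram(chan_x, chan_y, bin_width=4):
    """Two-dimensional histogram: count pixels falling in each (x, y) bin."""
    hist = Counter()
    for x, y in zip(chan_x, chan_y):
        hist[(x // bin_width, y // bin_width)] += 1
    return hist

# Made-up digital counts for two channels over five pixels.
chan_x = [20, 22, 21, 35, 36]
chan_y = [40, 41, 43, 60, 62]
hist = scatter_histogram(chan_x, chan_y)

# Print bins in order; a dense bin marks a common spectral signature.
for (bx, by), n in sorted(hist.items()):
    print(f"x bin {bx:2d}, y bin {by:2d}: {n} pixel(s)")
```

Restricting the input to the pixels of one ground-truth class (for example, spring wheat in the poorly drained soil) gives the per-class plot described in the text.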
TABLE I.— Dot Labeling and Estimated Proportion Accuracy for Four
Classifications for Segment 1523 as Compared to Ground Truth

Acquisition date (biostage),     Estimated  Estimated  Dot   Percent  Percent      Percent
1977                             spring     spring     type  labeled  correct      correct
                                 wheat,     grains,                   small-       other
                                 percent    percent                   grains dots  dots

April 30 (1.5), May 18 (2.5)     11.0       19.3       2     60       25           07
                                                       1     32       80           76

April 30 (1.5), June 5 (3.5),    24.0       42.0       2     54       82           85
  June 24 (4.3)                                        1     32       80           76

April 30 (1.5), June 24 (4.3),   24.3       42.5       2     51       79           83
  July 29 (6.0)                                        1     32       93           100

Ground truth                     20.3       40.0

FIGURE 12.— Computer-generated cluster and classification maps for
blind site segment 1523, Wilkin County, Minnesota. (a) Unconditional
cluster map before assignment of clusters to classes; August 17,
1977. (b) Conditional cluster map; black - threshold, DO, DU;
yellow - nonspring small grains; green - spring small grains;
other - conditional clusters; August 17, 1977. (c) Classification
map; black - threshold; green - spring small grains; orange -
nonspring small grains; July 29, 1977.

FIGURE 13.— Digitized ground truth registered to Landsat imagery
for selected spring wheat fields in LACIE segment 1523, Phase III
blind site, Wilkin County, Minnesota, 1977.


YIELD  ACCURACY 

To support the evaluation of the production estimates in meeting
the 90/90 criterion (see Houston's paper), the yield estimates
must be tested for bias, and the accuracy of the estimated
variances must be determined. Accuracy Assessment (see the paper
by Phinney et al. entitled "Accuracy and Performance of LACIE
Yield Estimates in Major Wheat-Producing Regions of the World")
uses 10 or more years of independent temperature and
precipitation data to test the yield model for each zone (state)
for all monthly truncations (ref. 5). During the crop year, the
Accuracy Assessment group attempts to determine the error
sources by investigating the modeling error sources (trend,
variable selection, and stability of coefficients) together with
the measurement error sources (temperature and precipitation)
(fig. 15).

To evaluate the contribution of measurement error sources,
temperature and precipitation sampling errors are determined for
states for which large deviations from the ESCS estimate are
observed. Sampling error is determined by comparing the
objective analysis using the dense network of meteorological
stations (cooperative network) with that using a manual synoptic
analysis of the sparse network of meteorological stations
(climatic stations) used operationally by LACIE for each
climatic district. In the example case (figs. 16 and 17), the
sparse network for Oklahoma gave consistently less estimated
precipitation than did the dense network, which agreed closely
with the synoptic LACIE estimate (table II).
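
The sparse-versus-dense comparison can be checked with simple arithmetic on the table II values. The sketch below is an illustration only: the dictionary layout and variable names are assumptions, and the decimal points are restored on the assumption that each entry is a precipitation amount in inches with two decimal places.

```python
# May 1977 Oklahoma precipitation, inches, as read from table II:
# climatic district -> (sparse network, dense network), objective analysis.
DISTRICTS = {
    "South central": (4.68, 5.08),
    "Southeast":     (2.31, 3.10),
    "North central": (8.05, 9.08),
    "Central":       (7.34, 8.13),
    "West central":  (9.01, 10.17),
    "Southwest":     (7.33, 8.64),
    "Northeast":     (5.13, 6.02),
}
STATE_SPARSE, STATE_DENSE, STATE_SYNOPTIC = 7.71, 8.76, 8.90

# Dense minus sparse, per district; positive means the sparse network is lower.
shortfall = {name: round(dense - sparse, 2)
             for name, (sparse, dense) in DISTRICTS.items()}
sparse_always_lower = all(d > 0 for d in shortfall.values())

# State-level departures from the LACIE synoptic analysis, in percent.
sparse_vs_synoptic = round(100 * (STATE_SPARSE - STATE_SYNOPTIC) / STATE_SYNOPTIC, 1)
dense_vs_synoptic = round(100 * (STATE_DENSE - STATE_SYNOPTIC) / STATE_SYNOPTIC, 1)

for name, d in shortfall.items():
    print(f"{name:14s} dense - sparse = {d:.2f} in.")
print(sparse_always_lower, sparse_vs_synoptic, dense_vs_synoptic)
```

On these values the sparse network is lower in every district, and at the state level the dense-network figure sits within about 2 percent of the synoptic estimate while the sparse-network figure is about 13 percent below it.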

To search out other measurement error sources, the mean and
standard deviation of temperature and precipitation are plotted
as a function of time of year. The current-year monthly
temperature and precipitation are also plotted on these charts
for ease of comparison.

Several  tasks  are  undertaken  to  check  for  model- 
ing error  sources. 

1. Temperature and precipitation are plotted versus month for
the 3 highest yield years and the 3 lowest yield years from the
historical record.

2. The trend term is evaluated by performing latent root
regression without allowing for trend and calculating trend on
the residuals from the most stable fit. This procedure has the
advantage of removing the long-term changes of the climate from
the trend, but it has the disadvantage of not allowing for the
interactions between climate and agricultural technology.
FIGURE 16.— Objective analysis of May 1977 Oklahoma
pseudozone precipitation (in inches) using 14 stations.


FIGURE  17. — Objective  analysis  of  May  1977  Oklahoma 
pseudozone  precipitation  (in  inches)  using  147  stations. 

3. The variable selection is verified by adding the current year
to the data set and reselecting the variables. Latent root
regression on the pseudozone yield and meteorological data
enables determination of the most stable set of variables. An
example of this result for Oklahoma (table III) shows little
difference between the variables for the LACIE Phase III models
and the optimum variables for statistical stability.


RESULTS 

The following is a brief description of some of the results of
using the Accuracy Assessment Subsystem during the 3 years of
LACIE. More detailed discussion of these results can be found in
symposium papers by Houston et al., Phinney et al., Marquis,
Hickman, and Conte.

In Phase I, operational estimates of wheat area only were made
for the USGP. A comparison of the USDA ESCS and the LACIE
estimates indicated support of the 90/90 criterion for winter
wheat. Significant underestimates were found in the spring wheat
region, the largest in North Dakota. To better understand these
differences, the blind site program was initiated. A statistical
comparison of the LACIE estimate with the ground-truth data and
with the ESCS county estimates for 20 blind sites in North
Dakota indicated that the classification accuracy was good and
that the source of the problem was sampling error (fig. 18).
Because of timely feedback from Accuracy Assessment,
approximately 20 additional sample segments were added in Phase
II to alleviate this problem.


TABLE II.— Analysis of Sample Error for May 1977
Precipitation in Oklahoma

Climatic district    Precipitation,(a) in.    LACIE synoptic
                     Sparse      Dense        analysis

South central        4.68        5.08         —
Southeast            2.31        3.10         —
North central        8.05        9.08         —
Central              7.34        8.13         —
West central         9.01        10.17        —
Southwest            7.33        8.64         —
Northeast            5.13        6.02         —
State                7.71        8.76         8.90

(a) Objective analysis.

In Phase II, estimates for all three components (area, yield,
and production) were made for the first time. The LACIE
estimates of wheat production were encouraging. An overall
accuracy of 90/75 was achieved in the U.S. Great Plains (i.e.,
90 percent of the time, the estimate was within ±25 percent of
the reference). For the winter wheat in the U.S. Southern Great
Plains, the data indicated that the LACIE and ESCS estimates of
wheat area were not significantly different. However, in some
states, the LACIE estimate was higher than the ESCS estimate,
and in other states, the LACIE estimate was lower than the ESCS
estimate. The largest difference was in Oklahoma, where the
LACIE estimate of wheat area was found to be less than the ESCS
estimate. The investigation of 20 Oklahoma blind sites indicated
that this difference was due to the mislabeling of wheat
signatures as nonwheat because of early drought (fig. 19) and
the grazing of wheat by cattle. Without this ground-truth
inventory of about 700 square statute miles, the error source
would never have been isolated, since no intensive test sites
were located in Oklahoma and the LACIE crop estimation system
had not previously estimated wheat acreage under drought
conditions.


TABLE III.— Oklahoma Yield Model

Latent root                 LACIE Phase III
variables selected          variables selected

October precipitation       August-February precipitation
March precipitation         March precipitation/evapotranspiration
May precipitation           May precipitation
                            May precipitation squared
June precipitation          June precipitation
March temperature           —
June temperature            May degree days above 90° F

For spring wheat, the problem encountered in North Dakota in
Phase I was solved. However, in Minnesota and Montana, the
estimates of wheat area and production were low, which caused
the LACIE estimates for the U.S. Northern Great Plains and the
Great Plains as a whole to be significantly lower than the ESCS
estimates. The principal problem in Montana was the
misclassification of strip-fallow fields that were too narrow
for Landsat resolution (fig. 20). The use of the blind sites
indicated that the underestimate of spring wheat area in
Minnesota was the result of sampling error, caused by the use of
1969 data for the sampling, whereas an increase in wheat acreage
occurred from 1969 to 1976. However, during this same crop year,
this tendency to underestimate spring wheat area was not
observed in the U.S.S.R., because the large number of sample
segments (200(1) placed in the U.S.S.R. gave a low sample error
and the large fields in the U.S.S.R. were considerably easier to
interpret than the small U.S. and Canadian spring wheat fields.

In Phase III, several steps were taken to solve the problems
encountered in Phase II. The number of sample segments in the
U.S. Great Plains was increased to satisfy the required sampling
accuracy, and a new multitemporal machine classification
procedure was introduced.

As in Phases I and II, the final LACIE Phase III winter wheat
production estimate for the USGP supported the LACIE accuracy
goal. The LACIE estimates of USGP spring wheat production,
however, were significantly different from the corresponding
reference estimates. The LACIE 90/90 accuracy goal was not
supported by the spring wheat production estimate primarily
because of the underestimation of yield, although area also was
significantly underestimated. As a result, the final LACIE total
wheat USGP production estimate supported a 90/85 criterion,
marginally missing the LACIE accuracy goal of 90/90.
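
One way to read a criterion such as 90/90 or 90/85 is as a probability statement about the relative error of the estimate. The sketch below assumes, purely for illustration, that the relative error is normally distributed with a given bias and standard deviation; that model, the function names, and the numbers are assumptions and are not taken from the LACIE evaluation procedure.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_within(tolerance, bias, std_dev):
    """P(|relative error| <= tolerance) for error ~ Normal(bias, std_dev)."""
    upper = normal_cdf((tolerance - bias) / std_dev)
    lower = normal_cdf((-tolerance - bias) / std_dev)
    return upper - lower

def supports(criterion_accuracy, bias, std_dev, confidence=0.90):
    """True if a 90/<criterion_accuracy> criterion is met (90 means 90/90)."""
    tolerance = (100 - criterion_accuracy) / 100.0
    return prob_within(tolerance, bias, std_dev) >= confidence

# Hypothetical error characteristics, chosen only for illustration.
unbiased = dict(bias=0.0, std_dev=0.06)
biased = dict(bias=-0.05, std_dev=0.06)
print(supports(90, **unbiased))  # True
print(supports(90, **biased))    # False
print(supports(85, **biased))    # True
```

Under this reading, an underestimate (negative bias) of a few percent is enough to move an otherwise adequate estimate from supporting 90/90 to supporting only a looser criterion such as 90/85.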

As in Phase I and Phase II, the final LACIE winter wheat area
estimate for the USGP was not significantly different (at the
10-percent level) from the corresponding USDA ESCS estimate. The
final LACIE spring wheat area estimate for the USGP was
significantly smaller than the corresponding ESCS estimate, but
there was great improvement in the relative difference of this
estimate over the corresponding Phase I and Phase II estimates.
This improvement is attributed to the implementation of the new
classification procedure, Procedure 1.

Based on the blind site ground-truth investigations (for 166
sample segments), the primary source of errors in classification
(in both spring wheat and winter wheat) was found to be the
mislabeling of wheat signatures as nonwheat because of (1)
abnormal signature development caused by late planting, drought,
grazing, crop rotation, plant variety, disease, and/or soil
type; (2) inability to resolve small fields using Landsat
imagery; and (3) lack of Landsat acquisition for both the
postemergence stage and the tillering-to-heading stage. In
addition to providing a good understanding of U.S. wheat
labeling accuracies, the extensive blind site analysis effort
added to the confidence in the U.S.S.R. classification accuracy
since the small-grains fields in the U.S.S.R. are much larger
and the field signatures appear more normal and homogeneous
(figs. 21 and 22) than in the USGP (fig. 23).

Unlike Phases I and II, the Phase III LACIE total wheat yield
estimate was significantly different from the corresponding ESCS
estimate in every month because of underestimates for both
spring and winter wheat. The largest differences occurred in
Oklahoma and Texas winter wheat yields and in Minnesota and
Montana spring wheat yields. The spring wheat yield errors were
due primarily to trend terms which failed to account for new
varieties of wheat in Minnesota and for increased fertilizer
usage in Montana during the past 5 years. The winter wheat yield
errors were also due to trend terms which failed to account for
more wheat acreage being fertilized in the last two decades in
Texas and Oklahoma.


FIGURE 20.— Strip-fallow fields in Montana, August 8, 1977.

The results of LACIE Phase III production estimation indicated
that the accuracy goal of 90/90 was achieved in the U.S.S.R.,
where the technology was able to identify the shortfall in the
spring wheat crop 3 months before completion of harvest and
achieved similar results in the winter wheat regions. The
initial LACIE estimate in August was within 6 percent of the
U.S.S.R. January 28 figure of 92 million metric tons, and the
LACIE final modified estimate released on January 23 was within
1 percent. A detailed examination of the conditions which led to
the U.S.S.R. shortfall in spring wheat production and the
response observed in the LACIE models provided conclusive
evidence that the LACIE forecast technology did indeed respond
for good reason and in a timely fashion. Over most of the
U.S.S.R. spring wheat regions, warmer than average temperatures
predominated during the growing season. An investigation of the
Landsat data and the yield model response at subregional levels
indicated that the drought conditions were clearly observable in
the Landsat data and that the yield models accurately responded
by reducing yield estimates in the affected regions.

[FIGURE 21.— Kustanai, U.S.S.R., segment H224, spring wheat
growth stages. Panels: 14 July 1976, jointing to heading;
1 Aug 1976, heading to turning.]


CONCLUSION 

The Accuracy Assessment Subsystem has matured through the 3
years of LACIE. Early in LACIE, the small amount of ground truth
that was available precluded accurate statistical estimates of
sampling and classification accuracy. As the program culminated,
Accuracy Assessment was not only able to meet this goal but was
even able to evaluate component labeling errors, such as
boundary effects, abnormal signatures, and lack of key Landsat
acquisitions. Furthermore, the ground-truth data processing
matured through LACIE from collecting data for one crop (small
grains) to collecting, quality checking, and archiving data for
all crops in a LACIE sample segment. This data collection not
only assisted LACIE in determining causal factors but has
provided an invaluable data set for new system development for
other crops, such as corn and soybeans, and for the testing of
new classification systems for Landsat data.

[Figure panel labels: 11 Sept 1976, preemergence; 29 Sept 1976,
preemergence; 29 Dec 1976, emergent.]

FIGURE 23.— Very small winter wheat fields in LACIE segment
1503, Stanton, Nebraska, October 26, 1977.


LACIE Applications Evaluation System
Efficiency Report

Timothy  T.  White a 


SUMMARY 

The LACIE Applications Evaluation System (AES) encountered
significant increases in scope over the three phases of the
project. The increases in scope and data load, as well as
increased complexity and sophistication, were accommodated
without significant increases in operational resources by
developing and implementing a number of system efficiencies. In
general, the operation of the AES, which was expected to be a
major implementation problem, turned out to be a manageable one,
with data timeliness being the only major problem encountered.
The timeliness of the data never met the expected goals because
of (1) a highly fragmented system, in which the data accumulated
large amounts of queue time waiting on delivery to the next
function, and (2) the staffing for average loads, which resulted
in backlogs during peak loads. With these exceptions, the very
complicated LACIE system functioned in an increasingly efficient
and productive manner in a state-of-the-art environment. The
experience of operating a global remote-sensing inventory
experiment produced valuable insight into the operation of a
future system.


INTRODUCTION 

The operation of the LACIE system required constant monitoring
and management to keep all the disparate functions producing in
such a way as to meet the operational goals of the project. This
required the implementation of a number of monitoring and
efficiency analysis tools. The efficiency monitoring function
often afforded enough insight into the operation of the system
to avert the disruption of the data flow caused by a situation
that might arise in the operation of a given element. This paper
discusses the scope of the three phases of LACIE and the system
efficiencies which had to be implemented to cope with the
resulting Landsat data load. The methodologies used in system
analysis, some of the specific data collected, and the
inferences drawn from these data and their implications for
future systems are also discussed.


aNASA Johnson Space Center, Houston, Texas.


LACIE PHASE I

Phase I of LACIE was the most inefficient phase operationally
because a major portion of the available resources was being
utilized for system development. The scope of Phase I was very
modest: 692 segments, of which 411 were aggregable U.S. Great
Plains segments and the remainder were exploratory sites
distributed over the other LACIE countries and intensive study
sites. The Landsat acquisitions utilized in Phase I were not as
complete as those used in the later phases. The Landsat data
were not collected in real time until after April 1, 1975. The
data from the first of the crop year, September 1974 through
March 1975, were selected manually from the available Landsat-1
acquisitions; and, typically, only one or two images were
selected for each segment. Real-time data were obtained after
April 1, 1975, from Landsat-2. This scheme resulted in about
2400 acquisitions reaching the NASA Johnson Space Center (JSC)
for processing after the Landsat data had been screened for
cloud cover and registered. The NASA Goddard Space Flight Center
(GSFC), operating five shifts per week, produced 50 to 75
acquisitions per week and fell well behind the incoming data
load; turnaround times were running typically 15 days from
satellite acquisition to completion of GSFC processing. In June
1975, GSFC implemented a 10-shift-per-week operation; backlogs
were eliminated, and turnaround times averaged 6 days. The
significant increase in throughput from GSFC, averaging 170
acquisitions per week during the peak months of June and July
1975, inundated the rest of the AES, especially the analyst
capabilities, for much of the remainder of Phase I.

The performance of the preprocessing functions at JSC improved
steadily during Phase I as procedures were refined, reaching a
nominal turnaround time of 5 days from GSFC completion until
receipt by the LACIE Physical Data Library (LPDL) at JSC (2 of
the 5 days were allocated for air transportation). The major
bottleneck at JSC was in the classification of the data. The
Classification and Mensuration Subsystem (CAMS) analysts were
required to classify the first acquisition for each of four
predefined biowindows. This impacted processing considerably in
that the acquisition required for processing might not have had
an adequate signature or might have contained confusing
information which resulted in much rework before satisfactory
results could be obtained. The interpretation of the imagery and
the definition of training fields was done by 14 image analysts.
The segments were subsequently classified by 29 data processing
analysts, leading to very inefficient communications. In
addition, unproved signature extension techniques were being
attempted, along with the implementation of a newly developed
batch processing system, which further inhibited the data flow.

These situations caused a prolonged analyst contact time of
about 12 hours per classification and a significant amount of
rework of 200 to 300 percent (typically two attempts at batch
processing and one at interactive rework). Overall, the Phase I
CAMS processing produced only about 1100 classifications
(approximately 1.6 estimates per segment; see table I). The
total man-hours required during Phase I to process the
acquisitions classified for each segment ordered was a rather
lengthy 25 analyst hours, which includes all rework and the
processing of subsequent acquisitions.
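
The per-segment figure appears to be reproducible from the counts and analyst times listed in table I. The sketch below assumes that total analyst time is simply the sum of count times hours over the three kinds of processing; that accounting rule and the variable names are assumptions made for illustration.

```python
# Counts and analyst hours per acquisition as read from table I (Phase I).
SEGMENTS = 691
WORK = {
    "machine processing": (1100, 12),
    "batch rework":       (540, 4),
    "interactive rework": (740, 3),
}

# Assumed accounting: total analyst time is count x hours, summed over rows.
total_hours = sum(count * hours for count, hours in WORK.values())
hours_per_segment = total_hours / SEGMENTS
hours_per_classification = total_hours / WORK["machine processing"][0]
estimates_per_segment = WORK["machine processing"][0] / SEGMENTS

print(total_hours)                       # 17580
print(round(hours_per_segment))          # 25
print(round(hours_per_classification))   # 16
print(round(estimates_per_segment, 1))   # 1.6
```

On this reading, the 25 hours per segment and the 16-hour average per acquisition in table I both follow from the same 17 580 analyst hours.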


TABLE I.— Phase I Landsat Processing Summary

Item                                  Number     Analyst time

Segments                              691
Acquisitions at JSC                   2400
Acquisitions not processed            1300
Machine-processed acquisitions        1100       12 hr
Batch-reworked acquisitions           540        4 hr
Interactive-reworked acquisitions     740        3 hr
Time per segment                                 25 hr
Average time per acquisition                     16 hr
Estimates per segment                 1.6
Average throughput time               40 days


LACIE PHASE II

The scope of Phase II was increased significantly by the
addition of 680 U.S.S.R. segments and 280 Canadian segments to
the Phase I scope, thus raising the total to almost 1700
segments, 2.5 times that of Phase I. The Phase II data
collection period was extended, and data were gathered over the
entire crop season in real time. This produced more than 9000
acquisitions, almost 4 times as many as in Phase I. The number
of Phase II acquisitions per segment increased dramatically from
the Phase I value of 3.5 to 5.4, producing more data for the
system to contend with but also more information for the analyst
to utilize in decisionmaking. GSFC implemented a nominal
10-shift/week operating schedule in Phase II. The peak
processing load in the summer brought backlogs of up to a week
to GSFC and required the system to operate at maximum capacity
in June, July, and August 1976 (17 shifts per week were required
to handle the workload). This increased the average weekly
output for the months of June and July to 370 acquisitions,
which was twice as great as the Phase I peak output.

In Phase I, it was noticed that although initial acquisitions
were received for some segments, subsequent acquisitions were
not obtained. This problem was diagnosed as being caused by bad
reference scenes. A data management system was implemented at
GSFC to identify these segments primarily by noting the ones for
which fewer than the average acquisitions were received. The bad
reference scenes were usually updated toward the end of a phase,
allowing the acquisition efficiency on a per-segment basis to
improve eventually if the segment was kept for a subsequent
phase.

The data preprocessing at JSC remained unchanged in Phase II
except for two hardware augmentations. A direct data link
between GSFC and JSC was implemented, which eliminated the
occasional "misdirection" applied to the data when shipped via
commercial airlines. A significant improvement in the
classification time on the LACIE/Earth Resources Interactive
Processing System (LACIE/ERIPS) resulted from the implementation
of a parallel processor which reduced the four-channel
classification time from 6 to 3 minutes. This reduced the
required computer operations time in Phase II and allowed a
significant increase in computational complexity to be added to
the system in Phase III.

In Phase II, about one-half of the acquired Landsat data were
unprocessable. As shown in table II, 10 percent of the Phase II
acquisitions were of poor quality (caused by the presence of
haze, snow, clouds, etc.), 16 percent had preemergence or
dormancy (no wheat signatures), 19 percent were next-day passes
acquired in overlap regions, and 2 percent were acquired for
segments located in nonagricultural areas. These nonagricultural
segments were used in the aggregation but were moved to
agricultural areas for Phase III.
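
The percentages quoted above can be recomputed from the counts in table II. The sketch is illustrative only; the dictionary keys and variable names are assumptions, and the counts are those read from the table.

```python
# Phase II acquisitions screened but not processed, as read from table II.
ACQUISITIONS_AT_JSC = 9150
NOT_PROCESSED = {
    "poor image quality":        900,
    "preemergence or dormancy": 1454,
    "next-day pass":            1738,
    "nonagricultural":           183,
}

# Share of all acquisitions, rounded to the nearest whole percent.
shares = {reason: round(100 * n / ACQUISITIONS_AT_JSC)
          for reason, n in NOT_PROCESSED.items()}
unprocessable_share = 100 * sum(NOT_PROCESSED.values()) / ACQUISITIONS_AT_JSC

for reason, pct in shares.items():
    print(f"{reason:26s}{pct:3d} percent")
print(round(unprocessable_share, 1))  # 46.7, i.e. about one-half
```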

A number of changes were made in the CAMS analyst operations to
improve operating efficiency. The image analysts and data
processing analysts were integrated into teams to improve
communication and feedback. Thirty-six team equivalents
resulted, including about a dozen analysts who performed both
functions. The procedure of processing only the first
acquisition in a biowindow (one of the four LACIE data
collection windows) was replaced with one that required the
analyst to analyze and process every acquisition. However, the
analysts were not required to machine process every acquisition.
If a small percentage of wheat existed in the scene (less than
500 resolution or picture elements), the picture elements
(pixels) would be counted by hand. If the analyst could
determine that the current acquisition under examination had not
changed significantly from a previous estimate for the segment,
a "no change" would be submitted and the previous result would
be used in the aggregation. Of the acquisitions obtained, 17
percent were machine classified, 9 percent were hand counted,
and 27 percent were determined to have no significant change
from a previous estimate.
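
The Phase II handling rule for a usable acquisition can be written as a small decision function. This is a sketch under stated assumptions: the function and argument names are hypothetical, and the order in which the two tests are applied is an assumption, since the text does not say which check the analyst made first.

```python
HAND_COUNT_LIMIT = 500  # pixels; the threshold quoted in the text

def choose_action(wheat_pixels, changed_since_previous_estimate):
    """Return how an analyst would handle one usable acquisition.

    wheat_pixels: analyst's count of wheat pixels in the scene.
    changed_since_previous_estimate: False when a previous estimate exists
        and the scene has not changed significantly since it was made.
    """
    if not changed_since_previous_estimate:
        return "no change"        # previous result is reused in aggregation
    if wheat_pixels < HAND_COUNT_LIMIT:
        return "hand count"       # too little wheat to justify a machine run
    return "machine process"

print(choose_action(3200, changed_since_previous_estimate=True))   # machine process
print(choose_action(120, changed_since_previous_estimate=True))    # hand count
print(choose_action(3200, changed_since_previous_estimate=False))  # no change
```

The design point is that machine processing, the most expensive path in analyst hours, is reserved for acquisitions where neither shortcut applies.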

A significant improvement in the total amount of time an
analyst spent in processing a segment, from 25 hours in
Phase I to 11 hours in Phase II, was the result of a number
of factors. A tenfold reduction in rework resulted from the
improved communication between the interpretation and the
processing of an acquisition, an improved operating system,
and a good quality assurance program instituted during the
phase. The analyst contact time for machine classification
of a segment was reduced to 6 hours by the implementation
of the team approach, improved systems and procedures, and
experience. The use of the no-change criterion allowed the
analyst to provide an estimate without spending the time to
machine process the segment. This reduced the average
analyst time involved in producing an estimate to 4 hours,
with an average of about three estimates being produced for
each segment in Phase II.


TABLE II. Phase II Landsat Processing Summary

Item                                        Number   Analyst time

Segments                                      1683
Acquisitions at JSC                           9150
Acquisitions screened but not processed
due to:
  Poor image quality                           900
  Preemergence or dormancy                    1454
  Next-day pass                               1738
  Nonagricultural                              183
Proportion estimates
  Hand-counted acquisitions                    823   2.5 hr
  Machine-processed acquisitions              1555   7 hr
  Acquisitions reworked by machine
  processing                                   389   2 hr
  No significant change                       2470   2 hr
Total time per segment                               11 hr
Average time for machine processing                  6 hr
Average time per estimate                            4 hr
Estimates per segment                          2.9
Average throughput time                    33 days
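
The percentages and per-segment averages quoted for Phase II can be cross-checked against the counts in table II. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative only and do not come from the LACIE software.

```python
# Counts transcribed from the Phase II processing summary (table II).
SEGMENTS = 1683
ACQUISITIONS_AT_JSC = 9150
ESTIMATES = {
    "machine classified": 1555,
    "hand counted": 823,
    "no significant change": 2470,
}


def share_of_acquisitions(count, total=ACQUISITIONS_AT_JSC):
    """Return count as a whole-number percentage of all acquisitions."""
    return round(100.0 * count / total)


def estimates_per_segment(counts, segments=SEGMENTS):
    """Return the mean number of estimates per segment, to one decimal."""
    return round(sum(counts) / segments, 1)


shares = {name: share_of_acquisitions(n) for name, n in ESTIMATES.items()}
with_no_change = estimates_per_segment(ESTIMATES.values())
without_no_change = estimates_per_segment(
    n for name, n in ESTIMATES.items() if name != "no significant change"
)

print(shares)
print(with_no_change, without_no_change)
```

Under these counts, the shares agree with the 17, 9, and 27 percent quoted in the text, and the two per-segment averages correspond to Phase II estimates with and without the no-change submissions.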

LACIE  PHASE  III 

The Phase III scope was much larger than Phase II, with
about 17 000 acquisitions for 3000 segments: almost 6
acquisitions per segment compared to 5.4 in Phase II and
3.5 in Phase I. The biowindows were lengthened again in
Phase III, opening earlier to obtain the acquisitions
during seedbed preparation and remaining open longer during
the winter dormancy period. The GSFC improved its
throughput by about 30 percent by eliminating the visual
screening step in its procedure, by reducing the scene
extraction constraints, and by relaxing some quality
assurance constraints. With this increased efficiency and
operating 17 shifts per week, the GSFC peak output in June
and July 1977 averaged 610 acquisitions per week, a
65-percent increase over Phase II. Even with this increased
capacity, a 2-week backlog (21-day turnaround time) was
encountered at GSFC during the peak processing period.

The preprocessing steps at JSC remained the same, although
some support resources were increased. This was required
because of the increased data load and because a number of
new products were added to the analyst repertoire, such as
cluster maps, dot overlays, bias-correction information,
green numbers, spectral plots, feature selection, and
trajectory plots. The new products increased the computer
time for a four-channel classification back to the Phase I
level of 5 to 6 minutes. Late in Phase III, 8- and
16-channel multitemporal classifications were taking as
long as 10 to 12 minutes. To a large extent, this was
caused by the feature selection portion of the processing
logic.

The classification function again underwent a number of
changes in Phase III. The team concept was replaced with
about 30 individual analysts who were regionalized into two
areas. One-half of the analysts were involved in processing
about 600 U.S. segments, and the other half processed 2000
U.S.S.R. segments. The processing strategy also was
changed. All proportion estimates were to be the result of
multitemporal machine classifications, requiring at least
two acquisitions, with initial processing being deferred
until emergence was detected. A priority system was
employed to process the segments which were needed most for
an upcoming aggregation. This Phase III processing strategy
produced 5000 machine processings. This represented about
1.7 classifications for each segment ordered in Phase III,
which was similar to the 1.6 estimates per segment in
Phase I. Although it was somewhat less than the 2.9
estimates encountered in Phase II including the no-change
estimates, it was slightly more than the 1.4 estimates per
segment in Phase II excluding the no-change estimates.


The Phase III processing results are shown in table III.
Some differences between Phase II and Phase III are
noticeable. The percent of the total acquisitions "screened
and not processed" increased significantly in Phase III.
This occurred for two reasons: (1) some Phase III
end-of-season acquisitions, which were not processed due to
resource limitations, were included in the "not processed"
category; and (2) backlogs and priority processing in
Phase III often caused several unprocessed acquisitions to
be encountered in a packet. Usually only the one expected
to produce the best estimate was classified, and the
remaining acquisitions were given the status of multiple
acquisition, in addition to any next-day passes
encountered. This increase in unprocessed acquisitions
corresponds to a decrease in the percentage of total
acquisitions for which estimates were passed in Phase III
(30 percent) compared to Phase II (53 percent). It can also
be noted that the problem encountered in Phase II of having
segments located in nonagricultural areas was essentially
eliminated in Phase III.


TABLE III. Phase III Landsat Processing Summary

Item                                        Number   Analyst time

Segments                               [illegible]
Acquisitions at JSC                         16 640
Acquisitions screened but not processed
due to:
  Not processed                              4 041
  Preemergence or dormancy                   3 213
  Multiple acquisitions                      4 382
  Nonagricultural                               16
Proportion estimates
  Machine classified                         4 988   3 hr
  Rework                               <10 percent
Total time per segment                               5 hr
Average time per acquisition                         3 hr
Estimates per segment                          1.7
Average throughput time                    53 days


The implementation of Procedure 1 (P-1) in Phase III made
the analyst's job easier by eliminating some mechanical
tasks previously required and by providing a number of new
products to aid in the analysis. The average contact time
was reduced significantly in Phase III to an average of 3
hours per segment, although other portions of the system
were burdened somewhat with the increased number of
products. Another efficiency of P-1 was the use of the
stratified areal estimation procedure, which allowed the
analyst to correct for misclassification without reworking
the segment. The rework rate in Phase III was reduced to a
negligible level, with the rework encountered consisting
primarily of batch job format errors.


DEMONSTRATION  OF  OPERATIONAL 
TECHNOLOGY 

Because of the scope of LACIE and the magnitude of the
data, the project encountered a number of new situations
related to the development of a fully operational
remote-sensing system. The following sections describe some
of these situations, the efficiency monitoring associated
with them, the importance of these kinds of data, and their
relationship to future systems.


Data  Acquisition 

The ordering, acquiring, quality screening, extracting, and
registration of Landsat data result in an unpredictable
quantity of data because of the difficulty in forecasting
weather conditions and the problems associated with
registration. However, it is necessary to estimate an
approximate data load as a function of time in order to
schedule resources.

A computer prediction model was developed to estimate the
number of acquisitions to be expected over a given area for
a given time, considering the satellite parameters and
historical weather information. The Landsat acquisition
frequency for each defined segment is calculated using
simple spherical-vector equations and data such as orbit
inclination, repeat cycle in days and revolutions, and
image frame size. The latitude and longitude of the target
of interest and the period of time over which the data are
to be calculated are input. The attrition of the satellite
acquisitions, because of cloud cover, was determined using
the cloud cover data obtained from the U.S. Air Force
Environmental Technical Applications Center (USAF-ETAC).
The climatological average number of days per month with
less than 25 to 30 percent (depending on available data)
cloud cover, closest to the Landsat pass time, was used.

Although relatively simple in structure and operation, the
model has proven quite satisfactory for estimating the data
to be encountered in LACIE, with a couple of caveats.
First, the area of coverage must be rather large (e.g.,
encompassing several continents). This provides an
averaging effect so that the acquired data over the short
period of a month or two will be accurately estimated. If
the area for which predictions are to be made is a country
or a small portion of a continent, the predictions tend to
be reasonable only over a minimum period of 6 to 12 months.
The model does not predict with any accuracy the coverage
to be expected over a small area (e.g., a state) for a
short period of time (e.g., a month) because of the
variability of local weather.

The cloud cover/acquisition prediction model output is
reduced to compensate for attrition in the system because
of data quality and because of acquisitions failing
registration. This affects only about 15 percent of the
data acquired by the satellite. The GSFC processing
routinely resulted in the following output: 50 percent lost
to clouds and snow, 10 percent lost to registration, 5
percent lost to quality control, and 30 percent being sent
to JSC for LACIE processing.
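
The structure of such a prediction can be illustrated with a much simpler calculation than the spherical-vector model described above. The sketch below is an assumption-laden stand-in, not the LACIE model: it takes one pass per repeat cycle (ignoring sidelap between adjacent orbits), treats the climatological clear-day fraction as the probability that a pass is usable, and applies the roughly 15 percent system attrition multiplicatively. The function name, the 30-day month, and the example inputs (including the 18-day repeat cycle) are illustrative values chosen here.

```python
def expected_acquisitions(period_days, repeat_cycle_days,
                          clear_days_per_month, days_per_month=30.0,
                          system_attrition=0.15):
    """Estimate usable acquisitions of one segment over a period.

    Simplifying assumptions (not taken from the LACIE model):
      - exactly one satellite pass per repeat cycle;
      - the chance that a pass is cloud free equals the climatological
        fraction of sufficiently clear days in a month;
      - losses to data quality and registration remove a fixed
        fraction of the cloud-free passes.
    """
    if repeat_cycle_days <= 0 or days_per_month <= 0:
        raise ValueError("cycle length and month length must be positive")
    if not 0.0 <= system_attrition <= 1.0:
        raise ValueError("system_attrition must lie between 0 and 1")

    passes = period_days / repeat_cycle_days
    clear_fraction = min(clear_days_per_month / days_per_month, 1.0)
    return passes * clear_fraction * (1.0 - system_attrition)


# Hypothetical example: a 180-day window, an 18-day repeat cycle, and
# 12 sufficiently clear days per month on average.
estimate = expected_acquisitions(180, 18, 12)
print(round(estimate, 1))
```

Summing such per-segment expectations over a large region is what gives the averaging effect noted in the text; for a single segment and a single month, the result says little about what will actually be acquired.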

The 10 percent of the Landsat acquisitions lost because of
failure to register to the reference scene might be
considered insignificant. In some cases, it was, but many
of the misregistrations were caused by bad reference
scenes. When this occurred, a segment could not be used for
aggregation, since one or two acquisitions over a growing
period seldom allowed the analyst to produce a reliable
estimate. The loss of aggregable segments will cause the
sampling bias to increase or require additional samples to
be ordered in subsequent years to compensate for these
problems.

An analysis of the reference scene problem was performed to
determine what did or did not make a good reference scene,
and no definitive answer was obtained. Better reference
scenes seemed to be obtained early in the growing season,
but no criterion was established for LACIE. This problem
was handled operationally by monitoring the acquisition
history of each segment and flagging the abnormal ones.
This generally occurred after much of the growing season
had been completed so that the changing of a reference
scene generally helped only in the next crop year, assuming
the scene was not moved for the next year. Hopefully, the
new full-frame registration system being implemented at
GSFC, which is based on ground-control points instead of
reference scenes, will eliminate this kind of problem from
future large-scale remote-sensing inventory systems.

One additional issue is related to the acquisition of
Landsat data; i.e., what are the right data to order and/or
extract? Initial LACIE plans called for only the first
acquisition in a biowindow to be analyzed. It was quickly
realized that crop growth stages could not be predicted
with any accuracy in real time, and the set of acquisitions
needed by the analyst spanned the entire growing season
plus a month or so before and after. For areas where winter
crops are grown, data were collected year-round; for spring
crops, data were collected for most of the year. It was
shown in Phase II and Phase III that the early-season or
preseason acquisitions were the most important in observing
crop rotation, seedbed preparation, and initial crop
emergence. In future programs, some changes to data
collection might be considered. First, the data collection
might be reoriented to earlier acquisition dates to focus
more attention on the transition period between crops. This
would provide insight into planting dates, changes from the
last to the current year, and initial growth stages of the
current year's crop. This kind of strategy might entail
considerable overlap between two crops, whereas the current
LACIE approach does not allow much overlap due to data base
storage constraints. Secondly, since the early-season data
are so important (almost to the point that, if they are not
obtained for a given segment, the segment is not usable), a
more dense early-season segment population might be
planned, and those segments which do not have an adequate
early-season acquisition history could be eliminated from
the system. This approach would be implemented easily for
winter crops; however, because of the shortness of the
growing season, it is questionable whether it would be
practical for spring crops.


Analyst  Contact  Time 

The need for collecting detailed analyst time as a function
of task was established early in LACIE. Beginning midway
through Phase II and continuing through Phase III, a system
was implemented whereby each analyst would record the start
and stop time for each major step in the analysis
procedure. These were accumulated and analyzed monthly. A
summary of a portion of the data collected is shown in
table IV. The improvement through the implementation of P-1
is clearly shown. The reduction in interpretation time
resulted from the ease of labeling prelocated dots compared
to selecting and identifying fields and from the
elimination of the need to extract, reformat, and verify
field coordinates. These improvements saved the analyst
almost 4 hours of time. However, P-1 postclassification
results included cluster maps, spectral plots, etc., which
required slightly more evaluation time and time to perform
the stratified areal estimate. The net decrease in analyst
contact time was 3.3 hours. This saving may have been
offset slightly by the time used by the additional clerical
help needed to assemble, status, and distribute the
additional products. However, since analyst skills are
critical in such a project, any improvement in analyst
contact time is a significant asset to the project.
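
The savings quoted above follow directly from the task times in table IV. The sketch below reproduces the arithmetic; the task times are read from the table, but two of the Phase II row labels are garbled in the source, so the names "extraction" and "reformatting" used here are assumed readings based on the tasks the text says were eliminated.

```python
# Task times in hours, read from table IV. The "extraction" and
# "reformatting" labels are assumed readings of garbled row names.
PHASE_II = {
    "interpretation": 4.3,
    "extraction": 0.6,
    "reformatting": 0.6,
    "verification": 0.3,
    "evaluation": 0.7,
}
PHASE_III = {
    "interpretation": 1.9,
    "evaluation": 1.3,
}


def total_hours(tasks):
    """Total analyst contact time, rounded to a tenth of an hour."""
    return round(sum(tasks.values()), 1)


def hours_saved(before, after, task_names):
    """Time saved on the named tasks; a task absent from `after` costs 0."""
    saved = sum(before[name] - after.get(name, 0.0) for name in task_names)
    return round(saved, 1)


# Savings on the tasks that P-1 shortened or eliminated.
gross_saving = hours_saved(
    PHASE_II, PHASE_III,
    ["interpretation", "extraction", "reformatting", "verification"],
)
# Extra evaluation time introduced by the new P-1 products.
added_evaluation = round(PHASE_III["evaluation"] - PHASE_II["evaluation"], 1)
net_saving = round(total_hours(PHASE_II) - total_hours(PHASE_III), 1)

print(gross_saving, added_evaluation, net_saving)
```

The gross figure corresponds to the "almost 4 hours" saved, and subtracting the additional evaluation time gives the net decrease of 3.3 hours.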

TABLE IV. Analyst Contact Time

Phase II                        Phase III
(training field procedure)      (Procedure 1)

Task              Time, hr      Task              Time, hr

Interpretation       4.3        Interpretation       1.9
Extraction            .6
Reformatting          .6
Verification          .3
Evaluation            .7        Evaluation           1.3

Total                6.5        Total                3.2


Two other important observations can be made from the
analyst contact time and should be kept in mind when future
project planning is considered. First, the amount of
analyst contact time required to initially process a
segment depends on the conditions of that processing; i.e.,
how many and which acquisitions are available for analysis
and whether the segment had been processed in a previous
year. In some situations, the analyst may take twice the
average time to initially process a segment. It may be
necessary for the analyst to become familiar with
historical data for the area, to verify the location of the
segment, to establish the planting date for crop calendar
usage, or to perform a variety of housekeeping activities
such as scribing and taping imagery. However, after the
initial processing, a general reduction in contact time of
[?]0 percent over a 3- or 4-month period would not be
unusual because the analyst would have to make only minor
changes to previously labeled signatures, as opposed to
labeling the entire scene.

Secondly, analyst contact time (for P-1) is related to the
field size and/or signature complexity of the scene. For
example, the analyst contact time in Phase III for U.S.S.R.
spring wheat segments, which typically consisted of very
large fields, required only 2 hours as opposed to spring
wheat segments in the U.S. Great Plains (typically smaller
fields), where the analyst contact time averaged 4 hours.
This variation from country to country will have a bearing
on how analyst resources are allocated. It is also
important to future system evaluations for which testing is
performed in one country, inasmuch as the results may not
be directly applicable to other areas. Furthermore, since
most of the areas to be encountered in a global inventory
program tend to be as complex as U.S. agriculture or more
complex, such as India and China, there may be a need for
additional modifications to P-1 to reduce contact time.


System  Throughput  Time 

The throughput goal of LACIE was 14 days from Landsat
acquisition to having a proportion estimate available for
aggregation. This was predicated on an around-the-clock
operation (24 hours per day, 7 days per week). This goal
was met only in isolated cases when data were processed
through the system in the optimum manner. The vast majority
of the data took about 29 days to flow through the system.
This was due to several causes. First, the LACIE system was
fragmented, with most major components being located in
geographically separated areas. In addition, manual methods
were used to transfer data from one component to another
(courier, air transportation, mail). A second problem in
attaining the throughput time was that a number of the
processing components operated only 8 hours per day, 5 days
per week. Figure 1 shows the typical flow of data in the
LACIE system. As indicated, 6 of the 29 days were lost due
to weekends and an additional 10 days resulted from
overnight holds. The resulting 13-day in-process time
indicates that, if a 3-shift-per-day 7-day-per-week
operation could be implemented, the 14-day turnaround goal
could be attained.
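
The turnaround argument is a simple subtraction of idle calendar time from elapsed time. As a minimal sketch (the function and constant names are illustrative, and the figures are those quoted in the text):

```python
GOAL_DAYS = 14  # LACIE throughput goal, acquisition to estimate


def in_process_days(elapsed_days, weekend_days, overnight_hold_days):
    """Days of actual handling once idle calendar time is removed."""
    idle = weekend_days + overnight_hold_days
    if idle > elapsed_days:
        raise ValueError("idle time cannot exceed elapsed time")
    return elapsed_days - idle


# Typical flow: 29 elapsed days, 6 lost to weekends, 10 to overnight holds.
active = in_process_days(29, 6, 10)
goal_attainable = active <= GOAL_DAYS

print(active, goal_attainable)
```

This treats every idle day as fully recoverable under continuous operation, which is the same optimistic assumption the text makes; transfer delays between geographically separated components would still apply.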

The nominal 29-day turnaround time was exceeded during the
peak processing months (May through August) of each LACIE
phase. As shown in figure 2, the data acquisition rate for
the LACIE project was not uniform, and a large influx of
data was encountered each summer. Since most of the LACIE
components were staffed for the average loads, the peak
loads caused major backlogs at GSFC and in the
classification subsystem. These backlogs caused the average
turnaround time to reach 40 days in Phase I, 33 days in
Phase II, and 50 days in Phase III (because of the enormous
peak encountered in Phase III). One major reason that the
Phase II turnaround time was short compared to Phases I and
III was because of the implementation of the no-change and
hand-count procedures, which produced almost 70 percent of
all the aggregable estimates in Phase II. These procedures
did not require the additional 7 to 10 days necessary to
submit a batch job and receive and evaluate the results,
thereby reducing the average turnaround time considerably.

FIGURE 2. LACIE acquisitions expected at JSC.
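
The way a summer peak builds a backlog when a component is staffed for the average load can be sketched with a week-by-week tally. The capacity of 610 acquisitions per week is the Phase III GSFC peak output quoted earlier; the weekly arrival figures are hypothetical values chosen only to show the effect, and the function name is illustrative.

```python
def simulate_backlog(weekly_arrivals, weekly_capacity):
    """Return the backlog remaining at the end of each week.

    Each week the component works off as much of the carried-over
    backlog plus new arrivals as its capacity allows.
    """
    backlog = 0
    history = []
    for arrivals in weekly_arrivals:
        backlog = max(0, backlog + arrivals - weekly_capacity)
        history.append(backlog)
    return history


# Hypothetical arrivals rising to a peak and falling off again.
arrivals = [400, 800, 900, 500, 300]
backlog_by_week = simulate_backlog(arrivals, weekly_capacity=610)

print(backlog_by_week)
```

In this example the average arrival rate (580 per week) is below capacity, yet a backlog persists for four of the five weeks, which is the behavior the text attributes to staffing for average rather than peak loads.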

Several situations encountered in LACIE will likely be
encountered in future systems. First of all, the
reliability of each of the various system components was
reasonably good. However, with so many components in the
system, the chances were that at any given time one of them
would be inoperable, impeding the flow of data. Secondly,
if the data processing cycle is greater than the satellite
data acquisition cycle, incoming imagery will very likely
have to be placed on hold while previous acquisitions are
being completed, thereby impacting the turnaround time of
the incoming acquisitions. Finally, unless resources are
dedicated around the clock and in sufficient quantity to
meet peak load demands in a fully operational
remote-sensing data processing system, a project having
sizeable scope will most likely not maintain a real-time
operation during peak acquisition periods.


Data Processing Systems Design


FOREWORD 

In general, the remote-sensing data processing facilities
and software available prior to LACIE were primarily suited
to the support of essentially "small" research users in
laboratory environments. In distinction, the LACIE
requirements for a quasi-operational high-throughput and
high-volume data system represented a considerable
departure from the capabilities of these predecessor
installations, both in capacity and in organization. The
objectives of this session are to review those key
conceptual and design issues identified and addressed in
the establishment of an effective and economical LACIE data
processing system, and to extrapolate from this experience
toward the computational support of analogous future
programs. This is accomplished in the following papers
through focused discussions of specific elements of several
critical subsystems rather than an attempt to be exhaustive
or complete in the presentation of all system elements. The
evolution of the addressed elements from the predecessor
installations (where they existed) to their ultimate form
in the Applications Evaluation System (AES) is traced. The
final papers in the session draw inferences for future
study and development. The reader is referred to two papers
in the "Proceedings of the Plenary Session" for the context
in which the work described in the current session has been
performed; viz, "The LACIE Applications Evaluation System:
A Design Overview" and "Data Processing Systems in Support
of LACIE and Future Agricultural Research Programs."
Additional descriptive papers pertinent to the operations
of a number of these subsystems can be found in the System
Implementation and Operations Session elsewhere in this
document.

Several of the papers in this session pertain to the
development of the Earth Resources Interactive Processing
System (ERIPS), which was used as the principal
computational vehicle for the Classification and
Mensuration Subsystem (CAMS) of the AES. The ERIPS was a
representative pre-LACIE capability designated as a
facility to be upgraded for LACIE support; many elements of
the original design of this system were subject to revision
for satisfactory LACIE operations. The introductory paper
"LACIE/ERIPS Software System Summary" provides the
necessary background for the other papers by outlining the
historical development of the total system. The paper
entitled "The LACIE Data Bases: Design Considerations"
discusses the foundations and behavior of the ERIPS-related
mass disk data base on which all crop-year imagery and much
related ancillary data were maintained. The paper entitled
"Man-Machine Interfaces in LACIE/ERIPS" treats the
sometimes difficult conversion from the research-oriented
interactive environment to the batch production
requirements of LACIE. Finally, the paper entitled "Very
High Speed Processing: Applicability of Peripheral Devices
to Pixel-Dependent Tasks" outlines the solution to the
critical problem of processing speed characteristic of
image analysis as implemented in the ERIPS.

The two papers entitled "Cartography: LACIE's Spatial
Processor" and "Considerations for Design of Future
Research and Development Interactive Image Analysis
Systems" specify other significant areas that require
future work for the establishment of a satisfactory crop
inventory system.

The final papers in the session, "A Look at Computer System
Selection Criteria" and "Cost and Performance
Characteristics of Data System Configurations for
Processing Remotely Sensed Data," assess requirements for
subsequent inventory systems and techniques for evaluating
and satisfying these requirements in computing
organizations.


Cartography: LACIE's Spatial Processor

M.  L.  Rader a and  R.  R.  Vela a 


ABSTRACT 

The Cartographic Laboratory of the NASA Johnson Space
Center Earth Observations Division is responsible for
satisfying the spatial processing needs of LACIE. These
needs include locating agricultural test sites and
registering ground-truth data to Landsat imagery. This
paper discusses the technical aspects of the LACIE
cartographic support, the unique need for cartography in
satellite crop surveys, and proposed improvements which
would enhance the cartographic support of future programs.


TRANSITION  FROM  PHOTOGRAPHIC 
TO  DIGITAL  IMAGE  PROCESSING 

From the outset, LACIE was constrained to use Apollo
resources that had application to remote sensing. The
cartographic capability was one resource which did have
significant commonality with remote sensing; it was
therefore merged into the Earth Observations Division. This
transition into remote sensing created several problems for
the Cartographic Laboratory. Many skills needed for
traditional photographic mapping were unnecessary in
digital image processing. The concept of an electronic
resolution element (a pixel) was foreign to the
cartographer who was accustomed to continuous-tone
photographs. Some cartographers did have experience in
automated data processing; however, computer processing of
digital images was significantly different from Apollo
computer cartography. Image processing was a problem of
processing large data sets, whereas mapping was primarily a
problem of programing equations.

Another problem was the obsolescence of the electro-optical
equipment used in processing photographic imagery. Because
electronic images such as those from Landsat are optimally
processed in the computer, the need for some of the
elaborate cartographic electro-optical equipment was
eliminated; however, the burden on the NASA computer
facilities was increased.

Notwithstanding these problems, the Cartographic Laboratory
made significant contributions to LACIE, including test
site location and the measurement of LACIE performance.
This paper will outline these contributions and explore
potential areas for future contributions.


THE  CARTOGRAPHIC  ROLE  IN  LACIE 

The LACIE process flow (fig. 1) involves three primary
tasks: (1) test site selection, (2) classification and
yield computation, and (3) performance measurement
(accuracy assessment) of the classification and yield
computations. The Cartographic Laboratory has primarily
supported LACIE operational tasks, including test site
selection and performance measurement; it has also
contributed to supporting research, including that in yield
estimation.


Test Site Selection

To measure world wheat production, it was necessary to
select a set of statistically significant test sites for
the world's wheat-growing regions. The Cartographic
Laboratory performed a significant part of this task. The
first step was to delineate farming regions on
1:1 000 000-scale Operational Navigation Charts (ONC's)
over major wheat-growing regions such as the U.S.S.R.,
Canada, and the United States. These farm regions were
delineated from mosaics of Landsat full-frame film products
which were at the ONC 1:1 000 000 scale.

Crop  Reporting  District  (CRD)  boundaries  were 
also  transferred  to  the  ONC  map  base.  CRD's  are  the 
units  for  which  agricultural  statistics  are  compiled, 
such  as  the  county  system  in  the  United  States  where 
a county  agent  gathers  and  reports  crop  data  to  the 
U.S.  Department  of  Agriculture  (USDA).  Foreign 
countries  have  similar  systems,  the  boundaries  of 
which  were  also  transferred  to  the  ONC  map  base. 

These data, along with certain meteorological and soils
data, were then used to determine the strategic location of
test sites in the wheat-growing regions. The Cartographic
Laboratory located the center of each test site, marked it
on the ONC, and interpolated the latitude and longitude of
the site center. The latitude and longitude center was then
used by the NASA Goddard Space Flight Center (GSFC) to
strip out a 5- by 6-nautical-mile Landsat image for the
test site.
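
To give a sense of how small a 5- by 6-nautical-mile segment is in geographic terms, the sketch below converts a site center into approximate latitude and longitude bounds. It is an illustration only and not the GSFC extraction procedure: it assumes the segment is aligned north-south and east-west, that the 5-nautical-mile side runs north-south (the text does not say which side is which), and that one nautical mile spans one arc-minute of latitude. The function name and the example center are hypothetical.

```python
import math


def segment_bounds(center_lat, center_lon, ns_nmi=5.0, ew_nmi=6.0):
    """Approximate (south, north, west, east) bounds in degrees.

    Assumes a north-aligned rectangle and 1 nautical mile = 1 arc-minute
    of latitude; longitude minutes shrink with the cosine of latitude.
    """
    half_lat = (ns_nmi / 2.0) / 60.0
    half_lon = (ew_nmi / 2.0) / (60.0 * math.cos(math.radians(center_lat)))
    return (center_lat - half_lat, center_lat + half_lat,
            center_lon - half_lon, center_lon + half_lon)


# Hypothetical site center at 45 degrees N, 100 degrees W.
south, north, west, east = segment_bounds(45.0, -100.0)
print(round(north - south, 4), round(east - west, 4))
```

At this latitude the segment spans well under a fifth of a degree in either direction, which suggests why the accuracy of the interpolated site center matters when read from a 1:1 000 000-scale chart.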

Measuring  LACIE  Performance 

To determine the accuracy of the LACIE wheat production
computations, it was necessary to select a limited number
of the 5- by 6-nautical-mile LACIE test sites where ground
truth could be gathered for comparison with the LACIE
results. These special ground-truth sites were designated
"blind" sites and were located throughout the United States
and Canada, where ground truth could be obtained without
diplomatic problems. The ground truth is actually a set of
aerial photographs annotated in the field by USDA
Agricultural Stabilization and Conservation Service (ASCS)
agents as to the crop or cover types for each agricultural
field in the site. An example of an annotated aerial
photograph is shown in figure 2.

The LACIE was divided into three crop years: Phase I,
Phase II, and Phase III. The Phase I and II blind sites
were processed differently from the Phase III sites. For
Phases I and II, the wheat area was measured in square
inches on an X and Y measuring table on the ground-truthed
photographs, which were unrectified but printed at an
approximate scale of 1:24 000. The wheat area was divided
by the total blind site area, thereby giving a percentage
of wheat for the blind site. This percentage of wheat was
then used to check the percentage of wheat computed by the
LACIE system. However, the error sources and magnitudes
were unknown because the photographs were unrectified. This
lack of rectification could significantly affect the wheat
percentages. A rectified photograph is one in which the
geometric distortions caused by the aircraft pitch and roll
have been removed. It is printed at a known scale. Figure 3
is an example of how an unrectified photograph might appear
with respect to a rectified photograph.

FIGURE 3. Example comparison of rectified and unrectified
photographs.

The  Phase  III  blind  sites  have  been  processed  in  a 
much  more  rigorous  manner  than  those  in  Phases  I 
and  II.  The  Phase  III  ground  truth  is  converted  to 
pixel-level  ground  truth;  i.e.,  each  pixel  has  a crop 
code  assigned  to  it  based  on  its  ground-truth  crop- 
cover  type.  Figure  4 illustrates  this  process,  which 
begins  by  obtaining  aerial  photography  of  the  site. 
The aerial photographs are enlarged to a 20- by 20-inch
format (approximate scale of 1:24 000). The
enlargements  are  carried  to  the  test  site  by  a county 
agent,  who  annotates  the  crop  type  of  each 
agricultural  field  on  the  photographs.  These  data  are 
sent  to  the  Cartographic  Laboratory,  where  the  fields 
are  delineated  as  polygons,  the  polygon  vertices 
measured,  and  the  polygons  registered  and  converted 
to  Landsat-type  pixels.  The  radiance  levels  assigned 


FIGURE 6. LACIE classification and cluster maps. (a) Unconditional
cluster map before assignment of clusters to classes;
August 17, 1977. (b) Conditional cluster map: black =
threshold, [illegible]; yellow = nonspring small grains; green =
spring small grains; other = conditional clusters; August 17,
1977. (c) Classification map: black = threshold; green = spring
small grains; orange = nonspring small grains; July 2[illegible], 1977.
TABLE I. Approved Symbol List

(All numeric columns are grayscale levels; "-" means no level is assigned.)

Symbol  Description             Grayscale  Harvested  Abandoned  Strip   Strip fallow,  Strip fallow,
                                level                            fallow  abandoned      harvested

A       Alfalfa                  90        115        140        165     190            215
B       Barley                  101        126        151        176     201            226
BN      Beans                    91        116        141        166     191            216
C       Corn                     92        117        142        167     192            217
CN      Cotton                  111        126        161        186     211            236
FX      Flax                    102        128        152        178     203            228
G       Grass                   105        120        155        180     205            220
H       Hay                     106        121        156        181     206            231
I/CC    Idle cover crop         252        -          -          -       -              -
I/CS    Idle cropland stubble   251        -          -          -       -              -
I/F     Idle cropland fallow    254        -          -          -       -              -
I/RE    Idle cropland residue   252        -          -          -       -              -
M       Millet                  112        127        167        187     212            237
MT      Mountains               241        -          -          -       -              -
NA      Non-Ag                  242        -          -          -       -              -
O       Oats                    104        129        154        179     204            229
P       Pasture                 107        122        157        183     207            232
PF      Problem field            80        -          -          -       -              -
R       Rye                     102        127        152        177     202            227
SB      Sugar beets              98        122        148        173     198            223
SF      Safflower                92        118        143        168     193            218
SG      Sudan grass              95        120        145        170     195            220
SR      Sorghum                  96        121        146        170     196            221
SU      Sunflower                94        119        144        169     194            219
SW      Spring wheat            100        125        150        175     200            225
SY      Soybeans                 97        122        147        172     197            222
T       Trees                   108        [illegible] 158       183     208            233
TR      Triticale               109        134        159        184     209            234
VW      Voluntary wheat         110        135        160        185     210            235
W       Winter wheat             99        124        149        174     199            224
•       Water                   240        -          -          -       -              -
X       Homestead               250        -          -          -       -              -

to the pixels within a polygon are the numerical crop
codes assigned to each cover type (tables I and II).
Thus,  one  has  a ground-truth  image  which  has  one- 
to-one  correspondence  with  the  Landsat  imagery 
used  by  LACIE  in  computing  wheat  area  for  that 
site.  A film  image  of  this  product  is  shown  in  figure 
5.  This  image  can  be  compared  to  the  LACIE 
classification  and  cluster  maps  of  the  same  test  site 
shown  in  figure  6. 
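The conversion of digitized field polygons to pixel-level ground truth can be sketched as follows. This is only an illustration under stated assumptions, not the Cartographic Laboratory's procedure: the function names, the tiny 4-line by 6-pixel site, and the rule that a pixel takes the code of the polygon containing its center are all hypothetical. The crop codes 99 (winter wheat) and 101 (barley) are taken from table I.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Count edges that cross the horizontal ray running right from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_fields(fields, n_lines, n_pixels, background=0):
    """Give each pixel the crop code of the field containing its center."""
    image = [[background] * n_pixels for _ in range(n_lines)]
    for code, vertices in fields:
        for line in range(n_lines):
            for pixel in range(n_pixels):
                if point_in_polygon(pixel + 0.5, line + 0.5, vertices):
                    image[line][pixel] = code
    return image


def percent_with_code(image, code):
    """Percentage of pixels in the image that carry the given crop code."""
    total = sum(len(row) for row in image)
    hits = sum(row.count(code) for row in image)
    return 100.0 * hits / total


# Hypothetical site: vertices are (pixel, line) coordinates.
fields = [
    (99, [(0, 0), (3, 0), (3, 4), (0, 4)]),    # winter wheat field
    (101, [(3, 0), (6, 0), (6, 2), (3, 2)]),   # barley field
]
ground_truth = rasterize_fields(fields, n_lines=4, n_pixels=6)
wheat_percent = percent_with_code(ground_truth, 99)
print(wheat_percent)
```

Because the result has the same line and pixel layout as the Landsat segment, the wheat percentage from the ground truth and the percentage computed by a classifier can be compared pixel by pixel instead of only as site totals.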

Other  Support 

The Cartographic Laboratory has also provided
many other support items to LACIE. The LACIE
Research, Test, and Evaluation (RT&E) group has a
set of intensive research sites. Designated Intensive
Test Sites (ITS's), these sites have essentially the
same characteristics as the blind sites except that
they are formatted in different sizes (2 by 10 nautical
miles, 5 by 6 nautical miles, etc.) and have more
ground measurements. The Cartographic Laboratory
has processed the ITS data and produced 1:24 000-scale
maps of the field boundaries. The laboratory
has also supported the LACIE yield team by supplying
meteorological and agrophysical data in map
form. Graphic aids for photointerpretation have
been constructed for the LACIE analyst interpreters.


THE  UNIQUE  NEED  FOR  CARTOGRAPHY 
IN  SATELLITE  CROP  SURVEYS 

TABLE II. Approved Special Crop Codes

Grayscale  Description of scene (a)               Approximate relative
level                                             area proportions

61         Wheat + small grains                   1:1
62         Wheat + small grains (2 or more)       1:2
63         Wheat + other annual crop (OAC)        1:1
64         Wheat + OAC                            1:2
65         Wheat + OAC                            2:1
66         Wheat + small grains + OAC             1:1:1
67         Wheat + small grains + OAC             1:2:1
68         Wheat + small grains + OAC             1:1:2
69         Wheat + small grains + fallow          1:1:1
70         Wheat + small grains + fallow          1:2:1
71         Wheat + small grains + fallow          1:1:2
72         Wheat + OAC + fallow                   1:1:1
73         Wheat + OAC + fallow                   1:2:1
74         Wheat + OAC + fallow                   1:1:2
75         Small grains + OAC                     1:1
76         Small grains + OAC                     1:2
77         Small grains + OAC                     2:1
78         Small grains + OAC + fallow            1:1:1
79         Small grains + OAC + fallow            1:2:1
81         Small grains + OAC + fallow            1:1:2

(a) Definitions:
Wheat: winter or spring.
Small grains: barley, rye, triticale, oats, millet.
Other annual crops: beans, sunflower, safflower, sudan grass, corn, soybeans,
sorghum, flax, potatoes, peas, mustard, etc.
Fallow: idle/fallow or idle residue.

Many of the technical problems encountered in
LACIE are the same problems which occur in mapping
applications. An example is the problem where
there were a number of aerial photographs (as many
as six) for a single blind site. For Phase III, this created
a problem because some photographs did not
have sufficient correctable points to register the
photographic data to the Landsat image. The Cartographic
Laboratory applied a photogrammetric
solution (ref. 1) in which the photographic overlap
was used to adjust all photographs simultaneously to
the Landsat image without requiring control points
on every photograph. The mathematical analysis involved
is the same as in mapmaking and uses a
simultaneous weighted least squares adjustment.
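The idea of a simultaneous weighted least squares adjustment can be sketched with a deliberately small example. Everything here is an assumption made for illustration: the problem is reduced to one unknown offset per photograph, the observation values and weights are invented, and the solver is a plain normal-equations solution rather than the method of ref. 1. The point it shows is that a photograph with no control point of its own (photograph 1) is still positioned through its overlap with neighbors that do have control.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def weighted_least_squares(rows, values, weights):
    """Minimize sum(w * (row . x - value)**2) via the normal equations."""
    n = len(rows[0])
    normal = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for row, value, w in zip(rows, values, weights):
        for i in range(n):
            rhs[i] += w * row[i] * value
            for j in range(n):
                normal[i][j] += w * row[i] * row[j]
    return solve_linear(normal, rhs)


# Unknowns: hypothetical offsets t0, t1, t2 of three overlapping photographs.
rows = [
    [1, 0, 0],    # control point on photograph 0:      t0 = 10.0
    [-1, 1, 0],   # overlap of photographs 0 and 1:     t1 - t0 = 5.0
    [0, -1, 1],   # overlap of photographs 1 and 2:     t2 - t1 = 5.0
    [0, 0, 1],    # control point on photograph 2:      t2 = 21.0
]
values = [10.0, 5.0, 5.0, 21.0]
weights = [4.0, 1.0, 1.0, 4.0]   # control is trusted more than overlap ties
offsets = weighted_least_squares(rows, values, weights)
print(offsets)
```

The four observations disagree by one unit in total, and the adjustment spreads that disagreement over all of them in inverse proportion to their weights instead of forcing it onto any single photograph.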

Cartographers have studied the shape of the Earth
to improve their mapping product quality. The shape
of the Earth may significantly influence the data set
where a "flat Earth" assumption has been made.
GSFC has begun using map projections in resampling
data to improve the geometric quality. GSFC
also analyzes the geometry of the scanner to reduce
nonlinear systematic error. Figure 7 graphically illustrates
the scanner and Earth geometry, and figure
8 illustrates a mapping transformation that projects
the spheroidal Earth onto a flat plane. The geometric
analysis of scanners is very similar to the analysis of
the Apollo 17 panoramic camera. Both sensors have
rotating optics; the only difference is the recording
medium (film or detector), which is significantly
different in physical processing but similar in mathematical
geometric analysis.

Another area in which cartographic technology
may apply to remote sensing is in modeling
geometric error and determining its quantitative
effect on classification. This technique may be applied
to the satellite crop surveys when they begin to
approach classification accuracies of 95 percent or
better. The effect of geometric errors on computed
crop production may become significant at this level.


THE  FUTURE  CARTOGRAPHIC 
LABORATORY 


Hardware  Improvements 

The Cartographic Laboratory has greatly
enhanced its services for the Earth Observations
Division, but it is expected that much more can be
done. In particular, the machine interface problems
created by physical data products such as maps,
photographs, and ground-truth annotations can be
reduced by improving the cartographic hardware
system. Using the cartographic subsystem that was
designed for a proposed NASA Earth Resources
Data System (ERDS) would alleviate many of these
problems. This design (fig. 9) includes high-speed
raster scanners which provide a means of rapid computer
input of graphics, maps, and photographs.
High-resolution black-and-white cathode-ray tubes
(CRT's) also provide a means of accurate interactive
measurement on digital images created from physical
data sources.

FIGURE 8. Mapping transformation projecting the spheroidal
Earth onto a flat plane.


Digital Data Base

Another area in which the Cartographic Laboratory
can make a significant contribution is in developing
and implementing a geographic computer data
base which could provide digital imagery, soil types,
meteorological history or condition, political membership,
and all the other data needed to make
satellite crop survey decisions. Because these data
have spatial association with the geoid (Earth), they
can be organized by spatial location (latitude and
longitude). Pointers can be computed to locate data
in the computerized data base as a function of spatial
position.
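One simple way a pointer could be computed from spatial position is to divide the globe into latitude/longitude cells and number them row by row. The sketch below assumes that scheme; the function name `cell_pointer`, the 1-degree default cell size, and the row-major numbering are illustrative choices, not a design described in this paper.

```python
import math


def cell_pointer(lat_deg, lon_deg, cell_deg=1.0):
    """Return the record number of the grid cell containing a position."""
    if not (-90.0 <= lat_deg <= 90.0 and -180.0 <= lon_deg <= 180.0):
        raise ValueError("position is outside the valid latitude/longitude range")
    n_rows = math.ceil(180.0 / cell_deg)
    n_cols = math.ceil(360.0 / cell_deg)
    # Clamp so that latitude +90 and longitude +180 fall in the last cell.
    row = min(int((lat_deg + 90.0) / cell_deg), n_rows - 1)
    col = min(int((lon_deg + 180.0) / cell_deg), n_cols - 1)
    return row * n_cols + col


# A hypothetical site near 38.5 N, 98.5 W maps to a single record number.
print(cell_pointer(38.5, -98.5))
```

With a fixed cell size, every data type stored for a location (imagery, soils, meteorology) can share the same record number, so one computation locates all of them.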

Automation of Test Site Location

One of the most expensive tasks performed by the
Cartographic Laboratory for LACIE was test site
location. For future programs, this process could be
automated, resulting in substantial savings. The
digital data base would be loaded with the data used
in the selection strategy, and the selection process
could be machine programed. This feature would
produce a superior product and allow for changes in
selection strategy without significant manpower expenditures.
It would also provide a means of testing
different sampling algorithms.

FIGURE 9. Hardware configuration for cartographic subsystem.


Automation of Blind Site Registration

Another area to be improved is the registration of
ground truth to Landsat imagery for the blind sites.
The current process requires the selection of control
points that can be measured on the aerial photographs
and the Landsat imagery. Because the ground-truth
data are compiled and digitized as agricultural
field boundaries (polygons), it would probably be
better to register the data using an edge-detection correlator
such as GSFC uses. Because GSFC has Landsat
boundary maps for the LACIE test sites (of
which the blind sites are a subset), correlation of the
ground-truth boundary maps to the GSFC boundary
maps derived from the imagery may be possible.
This automatic correlation would then provide the
information necessary to generate the registration
coefficients.
INTRODUCTION

A primary purpose of the LACIE was to test the
concept that large amounts of Landsat data could be
analyzed in a real-time environment. Previous Earth
resources computer systems relied on tape media for
data storage and retrieval, and the volume of tapes
necessary to support a LACIE-type system seemed
to offer insurmountable physical and administrative
problems. Thus, it was decided that LACIE would
implement direct-access devices as the data storage
media. In this case, all the data would be available
immediately without prior staging, and the physical
and administrative problems should be minimal.

This paper presents some of the design considerations
involved in implementing direct-access storage
devices for LACIE. The concentration is on the
storage and retrieval of image data because this presented
the most significant challenge. The discussion
will include a definition of the problem, the solution
methodology (or design decisions), the initial operational
structure, the modifications which have been
incorporated, some conclusions, and projections of
future problems to be solved.


THE  PROBLEM 

The  LACIE  was  initially  set  up  to  assess  the 
worldwide  production  of  one  agricultural  crop, 
wheat.  The  site,  or  sample  segment,  is  the  smallest 
unit  of  land  involved  in  the  assessment  process.  The 
analyst  or  interpreter  determines  a percentage  of 
wheat  contained  in  a sample  segment  by  using  the 
pattern  recognition  algorithms  available  in 
LACIE  software. 

To begin the LACIE process, image data must be
requested and received from the NASA Goddard
Space Flight Center (GSFC). The data are input to
the LACIE system from the GSFC tape. In addition,
field and dot update cards are input to the LACIE
system to define or update certain "fields" and
"dots" for each image to be analyzed.

The  LACIE  process  allows  the  data  to  be 
statistically  analyzed  by  one  of  two  subsystems,  the 
interactive  subsystem  or  the  batch  subsystem.  The 
batch  subsystem  is  primarily  a card-mode  emulation 
of  the  interactive  subsystem  and  is  more  restrictive 
than  the  interactive  mode. 

At the outset, it was planned to have approximately
1200 strata containing a total of 4800 sample
segments. For each of the 4800 sample segments,
there could be up to 16 acquisitions of data.


Data Base Size

An acquisition is a subscene extracted from a
Landsat scene on a given day and composed of four
spectral bands. Each acquisition consists of 117 scan
lines, and each scan line contains 196 pixels, with
four values for each pixel. To be represented in one
band, each pixel requires a byte (eight bits). Thus, for
each acquisition of a site, there are 91 728 bytes of
data.

        117 lines
      x 196 pixels
     ------
     22 932 bytes per band
      x   4 bands
     ------
     91 728 bytes of data per acquisition

In addition to the actual image data, there is information
describing the Landsat scene from which the
image was extracted. This information must be retained
with the image for the data to be useful to the
analysts. This information, the header, requires an
additional 3062 bytes of storage. The total storage required
to retain a single image is at least 94 790 bytes.
If the desire is to store and access 16 acquisitions
for each of 4800 sample segments, the total storage
requirement is

            94 790 bytes per acquisition
          x     16 acquisitions per site (maximum)
     -------------
         1 516 640 bytes per site (maximum)
          x  4 800 sites (maximum)
     -------------
     7 279 872 000 bytes in storage (maximum)
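The sizing arithmetic above can be reproduced in a few lines. The constant names are illustrative; the values are the ones quoted in the text.

```python
LINES_PER_ACQUISITION = 117
PIXELS_PER_LINE = 196
BANDS = 4
HEADER_BYTES = 3062
MAX_ACQUISITIONS_PER_SITE = 16
MAX_SITES = 4800

# One byte per pixel per band.
bytes_per_band = LINES_PER_ACQUISITION * PIXELS_PER_LINE
image_bytes = bytes_per_band * BANDS
acquisition_bytes = image_bytes + HEADER_BYTES
site_bytes = acquisition_bytes * MAX_ACQUISITIONS_PER_SITE
total_bytes = site_bytes * MAX_SITES

print(bytes_per_band, image_bytes, acquisition_bytes, site_bytes, total_bytes)
```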

In broad terms, the problem can now be stated:
design an integrated data base which can hold up to
7.3 billion bytes of image data along with the associated
field and/or dot definitions and allow access in
pieces of about 95 000 bytes. However, this simple
statement does not fully describe the problem
because there were additional constraints and targets.

Since the image data would flow into the data base
over a growing season and since not every site would
have the entire 16 acquisitions (because of cloud
cover and other problems relating to data quality), it
was not practical to design a data base which would
hold the entire 7.3 billion bytes. Additionally, there
was no way of knowing beforehand the exact distribution
of acquisitions to sites. Some sites, at year
end, might have no data associated with them, while
other sites might have the full 16 acquisitions. Thus,
it was necessary to design a data base which would
allow great flexibility in terms of the distribution of
acquisitions to sites.

Cataloging and Cross-Referencing

The time dependency of the image data acquisition
also implies a need to catalog the data so that
analysts can determine data availability.

As image data are received, the analysts must determine
whether the data will allow the pattern recognition
process to be performed. Once this determination
is made, the analysts must prepare the additional
inputs for the pattern recognition algorithms.
These inputs, consisting primarily of field and/or dot
definitions, are directly related to the specific site
under investigation. Thus, when the field and dot
definitions are stored, they must be correlated with
the imagery for which they were developed. This implies
that the cataloging scheme must allow for correlating
the image data with the associated field data.


Data  Security 

Another consideration was the protection of a
large cross-referencing data base. The data contained
on the LACIE data base had to be reasonably
secured from the inadvertent deletion of data by the
analysts themselves or the many other users within
the NASA Real-Time Computer Complex (RTCC).
It was deemed critical that procedures be designed
into the LACIE data base accesses such that human
or machine errors would not cause complete chaos.
The protection had to ensure either that errors could
not occur or that, if they did occur, the data base
could be restored to its original status.

Processing Constraints

The requirements that the computer analysis
operations performed on the RTCC computers be interactive
operations or simulate interactive operations
to give a batch capability implied additional
constraints on the data base design. It was necessary
to design sufficient random access capability into the
data base such that requested images and fields could
be retrieved from the data base in a reasonable
amount of time. The processing throughput targets
were 30 to 40 segments in a 16-hour period in the initial
configuration; the target increased to 120 segments
per 16 hours in subsequent configurations.
These targets meant that the entire analysis operation
had to flow through the RTCC computers on an
average of one every 21.5 minutes initially, and
finally, one every 8 minutes. Clearly, retrieving the
images and fields from the data base could not consume
an inordinate amount of time.

Cost Objectives

The budget allocated to the accomplishment of
the design and implementation of the data base to
support LACIE was an additional constraint. The actual
budget levels are not important to this discussion,
but it should be stated that cost effectiveness
was an important consideration.

The initial problem can now be stated thus:
design and implement, as inexpensively as possible,
an errorproof data base structure to support LACIE
in such a manner as to allow any given segment to be
processed in no more than 8 minutes.
SOLUTION METHODOLOGY

There were two basic designs initially presented to
solve the LACIE data base problem. One design proposed
a large multivolume image data base to store
the image data, a history data base to hold the catalog
data, and a field data base to hold the field definitions.
The second design proposed multiple image
data bases, each a single volume, supported by the
same history and field data bases. The IMS-360 was
proposed as the data base manager, the system software
support for the data bases. The IMS-360 was an
off-the-shelf product; thus, the cost of developing a
specialized data base support package was eliminated.
The IMS-360 also offered sufficient data base
management services to support either of the two initial
designs completely.

The  designers  realized  that  the  image  portion  of 
the  LACIE  data  base  was  the  most  critical  because  of 
the  potential  size.  As  a result,  efforts  were  begun  to 
determine  realistic  size  limits. 


Sizing  the  Data  Base 

The first major result of the sizing effort was the
understanding that the 3000-byte header was largely
duplicated for each acquisition of a given sample segment.
Each new acquisition after the first one for a
segment really required only 24 bytes of storage to
record the differences. This understanding led to the
first step in a logical design, which is illustrated in
figure 1.


FIGURE  1. — Simple  view  of  image  data  base  logical  structure. 


Initial  Logical  Design 

The logical design at this point would have an image
data base with the key portion being a single
3000-byte header. Each header could have up to 16
acquisition headers associated with it, and each acquisition
header would have one data segment associated
with it. The resulting size would be

             4 800 sites
          x  3 062 bytes
     -------------
        14 697 600 bytes of site headers
plus
                16 acquisitions per site
          x  4 800 sites
     -------------
            76 800 acquisitions
          x     24 bytes
     -------------
         1 843 200 bytes of acquisition headers
plus
                16 acquisitions per site
          x  4 800 sites
     -------------
            76 800 acquisitions
          x 91 728 bytes
     -------------
     7 044 710 400 bytes of data
equals
     7 061 251 200 bytes

Reducing the number of site header records would
save almost 220 million bytes of storage space. The
ITEL disks being considered as the storage devices
for LACIE would each hold approximately 100
million bytes of data; thus, the reduction in the number
of site header records stored would cut the
requirement by two packs.

The number of disk volumes required to retain
the image data had been reduced from 73 volumes to
71 volumes, a 3-percent reduction. This savings was
not considered significant. Obviously, a better
estimation of the maximum number of acquisitions
to be retained would yield a more significant reduction
in the maximum data base size. The LACIE
planning staff determined that the LACIE image
data base should be of sufficient size to store up to 4
acquisitions of data for 3840 sites and up to 16 acquisitions
for 960 sites.

The new requirement reduced the maximum
number of acquisitions from 76 800 to 30 720. The
resulting maximum image data base size would be

             4 800 sites
          x  3 062 bytes
     -------------
        14 697 600 bytes of site headers
plus
            30 720 acquisitions
          x     24 bytes
     -------------
           737 280 bytes of acquisition headers
plus
            30 720 acquisitions
          x 91 728 bytes
     -------------
     2 817 884 160 bytes of data
equals
     2 833 319 040 bytes

This reduced the number of disk volumes required to
store the image data to 29.
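The effect of the revised requirement on the number of disk volumes can be checked with a short calculation. The function names are illustrative, and the 100-million-byte volume capacity is the approximate figure quoted in the text, so the volume counts are estimates.

```python
import math

SITE_HEADER_BYTES = 3062
ACQUISITION_HEADER_BYTES = 24
IMAGE_BYTES = 91728
VOLUME_BYTES = 100_000_000   # approximate capacity of one disk pack


def image_data_base_bytes(sites, acquisitions):
    """One full header per site plus a short header and image per acquisition."""
    return (sites * SITE_HEADER_BYTES
            + acquisitions * (ACQUISITION_HEADER_BYTES + IMAGE_BYTES))


def volumes_needed(total_bytes):
    """Whole disk volumes required to hold the given number of bytes."""
    return math.ceil(total_bytes / VOLUME_BYTES)


# 16 acquisitions for every one of the 4800 sites.
full = image_data_base_bytes(4800, 16 * 4800)
# 4 acquisitions for 3840 sites and 16 acquisitions for 960 sites.
reduced = image_data_base_bytes(4800, 4 * 3840 + 16 * 960)

print(volumes_needed(full), volumes_needed(reduced))
```

The comparison makes the design lesson visible: removing duplicated headers saves only two volumes, whereas a realistic bound on the number of acquisitions cuts the requirement by more than half.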


Security  Considerations 

The next step in the process was to determine the
implications of supporting a 29-volume data base
using IMS-360 as the data base manager. The designers
were aware that IMS-360 provided for data base
protection via extensive recovery utilities. There
were IMS-360 utilities available for checkpointing
data bases by copying to tape either the physical or
the logical structure of the data base. The recommended
IMS-360 procedure for data base recovery
was to copy the most recent checkpoint tapes to the
set of disks comprising the data base and then to read
in the IMS log tapes that were created between the
time of the checkpoint tape creation and the error occurrence.

The error recovery procedures implicit in IMS-360
had significant consequences relative to the two initial
data base designs. If a failure occurred on a single
disk volume of the multivolume data base, the time
required simply to restore the checkpoint tapes to the
29 volumes would exceed 4 hours, given the maximum
possible transfer rate from the tape drives.
And there was no hope of transferring 3 billion bytes
of data at the maximum transfer rate. The more
probable transfer rate of 89 600 bytes/sec suggested a
recovery time of at least 8 hours. The same single-disk
failure in the design involving multiple image
data bases would necessitate the recovery of only the
disk volume that failed and not the other 28
volumes. In this situation, the recovery time for such
a failure would be reduced to a little over 9 minutes at
the maximum transfer rate and 18 minutes using the
realistic rate of 89 600 bytes/sec.
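The recovery times at the realistic transfer rate follow directly from the volume size and the rate. The sketch below assumes that every volume is a full 100 million bytes, which is an approximation from the text and gives an upper estimate; the function name is illustrative.

```python
VOLUME_BYTES = 100_000_000   # approximate capacity of one disk volume
REALISTIC_RATE = 89_600      # bytes per second, as quoted in the text


def restore_minutes(n_volumes, bytes_per_second=REALISTIC_RATE,
                    volume_bytes=VOLUME_BYTES):
    """Time to copy checkpoint tapes back onto n_volumes full volumes."""
    return n_volumes * volume_bytes / bytes_per_second / 60.0


multivolume_hours = restore_minutes(29) / 60.0   # single multivolume data base
single_volume_minutes = restore_minutes(1)       # one data base per volume

print(round(multivolume_hours, 1), round(single_volume_minutes, 1))
```

The ratio between the two cases is simply the number of volumes, which is why splitting the image data into one data base per volume shortens recovery from a single-disk failure by a factor of 29.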

Upon consideration of the time requirements for
checkpointing and restoring the image data bases
using the IMS-360 techniques, the design proposing a
single multivolume image data base was rejected.
The time required to checkpoint (copy from disk to
tape) was about equal for both proposals, but the recovery
from error situation clearly favored the multiple
image data base proposal.


Expanding  the  Multiple  Data  Base  Design 

At this point, the designers began to refine the image
data base structure. They had to define a process
which would divide the image data into several IMS
data bases, one data base per disk volume. However,
it was also necessary to make the structure look like a
single image data base to the application programs or
at least provide a method by which the application
programs could easily get to the proper image data
bases.

Distribution of the data over the packs. A master
catalog or index appeared to be the solution allowing
the application programs to refer to the desired data
bases. Knowing that the application programs
wanted to access image data simply by supplying the
site number and that the IMS required the address of
a program control block (PCB), which contained the
"key" and the data base, the designers were able to
build a catalog. The application program would
simply call a subroutine passing the site number, and
the subroutine would return the address of the required
PCB. The application program would then
issue the proper IMS request for the desired site.

The catalog is structured such that allocation of
sites to data bases is controlled by a 10 000-byte table,
in which sites are represented by table position and
the position content specifies the data base to which
the site is assigned. Next came the question of how
to assign the sites to each image data base. It seemed
simple merely to divide the 4800 sites by 29 packs to
get the number of sites to assign to each pack, then to
assign the sites consecutively to a pack. Thus, image
data base 1 would contain sites 1 through 166; image
data base 2 would contain sites 167 through 331; etc.

However, this very simple assignment technique
would run into difficulty when "intensive test sites"
were introduced. Intensive test sites are acquired
year-round with no break in the acquisition cycle, as
opposed to normal production segments whose acquisition
windows conform to the crop growing
season. The intensive test sites are likely to have
more acquisitions than the regular sites, and, if a
large number of intensive test sites are allocated to a
single pack, there is a high risk that that pack will be
unable to contain all the required acquisitions.
Knowing that site numbers would be assigned
consecutively within a country and that some countries
could contain many intensive test sites, data
base designers were led to a standard IMS randomizing
routine that would randomly assign the 4800 sites
to the various image data bases. The use of this
routine would make the proportion of intensive test
sites and the proportion of sites for each country
about the same on each pack (fig. 2).
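The catalog lookup and the randomized assignment of sites to data bases can be sketched as follows. This is not the IMS randomizing routine, which the text does not specify; a seeded shuffle from the Python standard library stands in for it, and the function names and the table sized to exactly 4800 sites are assumptions made for illustration.

```python
import random

N_SITES = 4800
N_DATA_BASES = 29


def build_catalog(n_sites=N_SITES, n_data_bases=N_DATA_BASES, seed=0):
    """Return a table in which position = site number, content = data base."""
    sites = list(range(1, n_sites + 1))
    # Stand-in for the randomizing routine: shuffle, then deal sites out evenly.
    random.Random(seed).shuffle(sites)
    catalog = [0] * (n_sites + 1)          # position 0 is unused
    for rank, site in enumerate(sites):
        catalog[site] = rank % n_data_bases + 1
    return catalog


def data_base_for_site(catalog, site):
    """Look up the image data base that holds a given site."""
    return catalog[site]


catalog = build_catalog()
# Consecutive site numbers (for example, one country's sites) are now spread
# over many data bases instead of filling a single pack.
print(sorted(set(catalog[1:167])))
```

Because the application only ever calls the lookup, the assignment rule can be changed later without touching the application programs, which is the independence the designers were after.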

Distribution of the data on a pack. The same concerns
that existed for the distribution of the data over
the set of packs also applied to distributing the data
on each pack. For the sites assigned to a given pack,
the loading of data acquisitions would be uneven and
unpredictable. Further, the intensive test sites would
require more space than the normal production sites.
The same logic that necessitated randomly distributing
the sites over the entire set of image data bases
applied to randomly distributing the sites over the
blocks on an individual image data base.

Blocking  the  image  data. — At  this  point,  the  prob- 
lem of  allocating  the  sites  to  the  multiple  data  bases 
and  making  the  structure  independent  of  the  applica- 
tion programs  had  been  solved.  The  problems  of  effi- 
ciently blocking  the  data  for  IMS  and  the  physical 
devices  to  be  used  were  addressed  next.  The  simple 
logical view, shown earlier in figure 1, indicated data
records  of  almost  92  000  bytes.  Physical  records  of 
this  size  are  simply  impractical,  if  only  because  of  the 
buffer  size  required  to  transfer  the  record  from  disk 
to  computer  storage.  The  track  size  of  the  ITEL  disk 
allows  for  slightly  more  than  13  000  bytes  of  data, 
and, since  the  device  uses  track  addressing,  it  seemed 
logical  to  choose  a record  size  which  was  either  a 
multiple  or  a divisor  of  that  track  size. 
FIGURE 2. — Master Index structure.


INITIAL  OPERATIONAL  STRUCTURE 

The  designers  consulted  the  programers  to  deter- 
mine how  the  statistical  routines  would  be  accessing 
the  image  data  and  learned  that  the  programers  in- 
tended to  code  routines  that  would  compute  statistics 
for  up  to  four  acquisitions  at  a time,  line  by  line 
across  all  the  acquisitions.  A logical  data  base  struc- 
ture that  would  match  this  approach  is  shown  in 
figure  3. 


FIGURE  3.— The  logical  structure  to  match  processing  logic. 
This logical structure and consideration of buffer
size, combined with the knowledge that 80 percent of
the sites would have only four acquisitions, allowed
the designers to develop a physical structure which
would look like this:


Acquisitions 1-4

| Line 1 | Line 2 | --> | Line 3 | Line 4 | --> ... --> | Line 117 | Pad |

1 block (6442 bytes)

Approximately 30 tracks of 7330 disk space


However, for those sites which have more than
four acquisitions associated with them, the physical
structure would be

. . . Acquisition 16

Each additional acquisition requires 7.5 tracks
of 7330 disk space.

The block size of 6442 bytes was determined to be
the optimum block size for the actual image data
based on the buffering within the central processor,
the characteristics of the 7330’s, the characteristics of
the IMS access method, and the data characteristics.

However, the blocking factor that was optimal for
the image data was not optimal for storing and
retrieving the ancillary data (the site and acquisition
headers). Combining all the ancillary data associated
with a given site would require approximately 3500
bytes of storage. When the storage requirement, data
base manager overhead, and data pointers were
considered, the optimum block factor for the ancillary
data was determined to be 4248 bytes per block.

The means of obtaining the optimum blocking
factor for each type of data was to define two data set
groups for each image data base. One data set would
contain only image data, and the other data set would
contain the ancillary information. Defining two data
sets for each image data base also offered an
additional advantage. If the data sets were placed on
different packs (volumes), the contention for the
arm on the device could be reduced, thus eliminating
some wait time required when the device arm is
moved from one location to another. This was done,
and the resulting image data bases were organized as
shown in figure 4.

While the structure, both logical and physical,
matched the application logic, it turned out to be
highly inefficient for storage and retrieval. The line
data for each of the first four acquisitions was
scattered over 59 blocks of storage, and, while user
requests for all four acquisitions could be handled as
efficiently as requests for only a single acquisition, in
a worst-case situation (where there were 16
acquisitions available for a site and the analyst wanted
the last four in inverse order), 5850 accesses were
required. The operational average for image retrieval
requests for sites having 16 acquisitions turned out to
be 760 accesses. Because there is a significant amount
of central processor overhead associated with each
access, the designers wanted to eliminate any
unnecessary accesses. They proceeded to look for
improvements in the data base design to accomplish
this. The current image data base structure is the
result of the overhead reduction study.


CURRENT  OPERATIONAL  STRUCTURE 

The  new  design  differs  from  the  previous  one 
only  in  treatment  of  the  first  four  acquisitions.  The 
data  for  each  acquisition  are  placed  on  the  data  base 
by  line.  This  procedure  puts  the  lines  for  each  ac- 
quisition in  a set  of  15  contiguous  blocks. 


Acquisition 1:   | Lines 1-8 | Lines 9-16 | ... | Lines 113-117 |   15 blocks (96 630 bytes)

   . . .

Acquisition 16:  | Lines 1-8 | Lines 9-16 | ... | Lines 113-117 |   15 blocks (96 630 bytes)


As  a result,  the  number  of  accesses  required  to 
retrieve  the  first  acquisition  of  the  site  is  only  16,  and 
it  drops  to  15  for  every  additional  acquisition  of  the 
same  site.  The  logical  structure  which  maps  the  new 
physical  structure  is  shown  in  figure  5. 
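The block and access counts quoted for the two designs can be checked with a short calculation. The sketch below is illustrative only: the constant and function names are assumptions, the track capacity is rounded to 13 000 bytes, and the lines-per-block values (two lines of four acquisitions in the initial design, eight lines of one acquisition in the current design) are read from the structure diagrams.

```python
import math

LINES_PER_ACQUISITION = 117   # image lines per acquisition, from the diagrams
BLOCK_BYTES = 6442            # image data block size
TRACK_BYTES = 13000           # approximate track capacity of the disk


def blocks_needed(lines_per_block):
    """Blocks required to hold all lines at the given lines per block."""
    return math.ceil(LINES_PER_ACQUISITION / lines_per_block)


def accesses_current(n_acquisitions):
    """Accesses for n acquisitions of one site: 16 for the first, 15 after."""
    return 16 + 15 * (n_acquisitions - 1)


blocks_per_track = TRACK_BYTES // BLOCK_BYTES
initial_blocks = blocks_needed(2)   # acquisitions 1-4 interleaved by line
current_blocks = blocks_needed(8)   # one acquisition stored contiguously

print("blocks per track:", blocks_per_track)
print("initial design blocks:", initial_blocks)
print("current design blocks per acquisition:", current_blocks)
print("bytes per acquisition:", current_blocks * BLOCK_BYTES)
print("tracks per acquisition:", current_blocks / blocks_per_track)
print("accesses for four acquisitions:", accesses_current(4))
```

Under these assumptions the calculation agrees with the 59 blocks, 15 blocks, 96 630 bytes, and 7.5 tracks stated in the text.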
FIGURE 4. — Data set allocation for LACIE image data base.


FIGURE  5.— Logical  structure  implemented. 


The  first  structure  incorporated  programer  plans 
into  the  data  base  design.  Then,  in  order  to  increase 
the  efficiency  of  accessing  the  data  base,  the  struc- 
ture was  modified.  The  modified  structure  did  not  fit 
with the programer's plans. This problem of
incompatibility between the retrieval logic and the
application logic was resolved by setting up four large-core
storage  buffers  for  IMS  output  and  application  input 
and  by  modifying  the  call  sequence  to  IMS.  Instead 
of  retrieving  each  acquisition  in  full  before  going  to 
the  next,  the  application  requests  each  acquisition  in 
increments  of  eight  lines,  which  is  a full  block  of 
data.  This  allows  the  application  program  to  transfer 
the  image  data  to  a working  storage  area  on  a disk  in 
the  order  that  fits  the  processing  logic.  Only  the  first 
call  in  each  group  of  eight  is  fully  qualified  with 
cylinder,  track,  and  block.  The  following  seven  are 
sequential  calls,  which  are  most  efficient  for 
IMS-360. 
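The modified call sequence can be sketched as follows. This is a simplified model, not the ERIPS code: it assumes one retrieval call per image line, represents each buffer as a list of line numbers, and uses hypothetical function names.

```python
LINES_PER_ACQUISITION = 117
LINES_PER_BLOCK = 8


def block_ranges():
    """Yield (first_line, last_line) for each block of one acquisition."""
    for first in range(1, LINES_PER_ACQUISITION + 1, LINES_PER_BLOCK):
        yield first, min(first + LINES_PER_BLOCK - 1, LINES_PER_ACQUISITION)


def call_sequence(acquisitions):
    """List the calls issued: one block of each acquisition in turn.

    Only the first call of a block is fully qualified; the rest of the
    block is read with sequential calls.
    """
    calls = []
    for first, last in block_ranges():
        for acq in acquisitions:
            for line in range(first, last + 1):
                kind = "qualified" if line == first else "sequential"
                calls.append((acq, line, kind))
    return calls


def working_storage_order(acquisitions):
    """Order in which lines reach working storage: line by line across
    all acquisitions, as the statistical routines expect."""
    order = []
    for first, last in block_ranges():
        # One buffer per acquisition, each holding the current block.
        buffers = {acq: list(range(first, last + 1)) for acq in acquisitions}
        for i in range(last - first + 1):
            for acq in acquisitions:
                order.append((buffers[acq][i], acq))
    return order


print(working_storage_order([1, 2, 3, 4])[:5])
```

The data base is read in its efficient physical order, while the working storage area receives the lines in the order the processing logic requires.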


The  necessary  cross-referencing  is  accomplished 
by using the site number in all the data bases. The
site  number  is  the  key  for  the  application  program  to 
access  the  image  data  bases  as  well  as  the  history  data 
base  and  the  field  data  base.  The  analysts  work  with 
one  site  at  a time;  the  software  design  takes  advan- 
tage of  this  fact.  By  knowing  which  site  and  acquisi- 
tions are  being  worked  on,  the  programs  can  access 
all  required  records  for  processing. 


Security 

The  data  bases  are  also  recoverable  within  a 
reasonable  time  frame.  With  a multiple  data  base  set 
for  the  image  data,  a pack  failure  requires  the  recov- 
ery of  only  that  pack  and  not  the  entire  image  data 
base.  Because  the  data  on  each  image  data  base  are 
physically  organized  to  minimize  the  number  of  ac- 
cesses required  to  retrieve  each  record,  the  time  re- 
quired to  store  or  retrieve  all  records  is  minimal.  As  a 
result,  a full  image  data  base,  one  volume,  can  be 
completely  dumped  to  tape  or  restored  from  tape  in 
approximately  20  minutes. 

Data  base  security  is  provided  by  periodic  check- 
points, where  all  the  data  bases  are  dumped  to  tape. 
The  IMS  log  tapes  created  between  checkpoints  are 
retained.  If  an  error  occurs,  the  recovery  procedure 
begins  by  identifying  the  data  base  that  is  in  error. 
Then  the  checkpoint  tape  for  that  data  base  is 
restored  to  a disk  via  an  IMS  utility.  Another  utility 
reads  all  the  log  tapes,  except  the  one  which  con- 
tained the  error,  for  the  data  base  identified  as  being 
in  error.  In  this  way  the  data  base  is  recovered  up  to 
the  point  of  the  error.  The  updates  that  were  done 
after  the  error  must  be  redone. 
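The recovery procedure can be modeled in a few lines. The sketch below is not the IMS utility interface: the data layout (dictionaries for data bases, tuples for log records) and all names are assumptions chosen to show the order of the steps.

```python
def recover(checkpoint, log_tapes, bad_db, bad_tape):
    """Rebuild one data base from its checkpoint plus the usable log tapes.

    checkpoint: {db_name: {key: value}} as dumped at the last checkpoint
    log_tapes:  list of (tape_id, [(db_name, key, value), ...]) in time order
    bad_db:     name of the data base identified as being in error
    bad_tape:   identifier of the log tape that contains the error
    """
    # Step 1: restore the checkpoint copy of the damaged data base only.
    restored = dict(checkpoint[bad_db])
    # Step 2: reapply logged updates, skipping the tape with the error.
    for tape_id, updates in log_tapes:
        if tape_id == bad_tape:
            continue
        for db_name, key, value in updates:
            if db_name == bad_db:
                restored[key] = value
    return restored


checkpoint = {"IMAGE01": {"site 1": "v0"}, "HISTORY": {"site 1": "h0"}}
log_tapes = [
    ("T1", [("IMAGE01", "site 1", "v1"), ("HISTORY", "site 1", "h1")]),
    ("T2", [("IMAGE01", "site 2", "bad")]),
]
print(recover(checkpoint, log_tapes, "IMAGE01", "T2"))
```

Updates that were on the skipped tape are lost and must be redone, which matches the limitation noted above.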


Expansion 

The  multiple  image  data  bases  also  offer  an  addi- 
tional benefit.  The  GSFC/JSC  Interface  Control 
Document  specifies  a maximum  of  4800  sites  with 
3840  sites  having  up  to  4 acquisitions  and  960  sites 
having  up  to  16  acquisitions.  These  dimensions  im- 
ply a maximum  data  base  size  of  31  volumes.  In  prac- 
tice, data  arrive  at  JSC  at  the  rate  of  0 to  120  acquisi- 
tions per  day.  Thus,  throughout  a crop  year,  a con- 
siderably smaller  data  base  can  be  used  to  contain  the 
image  data.  However,  with  such  a compact  image 
data base, provisions must be made for a data set
overflowing  its  volume.  The  provisions  are  twofold. 
First,  all  image  data  set  groups  are  cataloged  to  an 
overflow volume (IDBOVF) in addition to their
primary volumes (fig. 6). Second, a procedure has been
constructed  to  add  volumes  and  redistribute  sites. 

Conceptually,  the  expansion  process  proceeds  in 
the  following  manner.  The  LACIE  image  data  bases 
consist  of  a collection  of  sites  and  associated  data 
whose  distribution  over  n volumes  is  specified  in  the 
master  index.  An  expansion  of  the  image  data  bases 
is  accomplished  by  adding  k new  volumes  to  the  ex- 
isting n volumes  and  redistributing  the  sites  over  the 
new n + k volume configuration. A new master index
which  reflects  site  locations  in  the  new  data  base  con- 
figuration is  constructed  by  considering  number  of 
sites,  number  of  acquisitions,  and  balance.  Those 
sites  which  have  new  data  base  assignments  are 
unloaded  to  tape.  The  new  index  is  used  to  control 
the  reloading  of  the  unloaded  sites  at  their  new  data 
base  locations.  Finally,  the  reloaded  sites  are  deleted 
from  their  old  locations. 
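A minimal sketch of the index rebuilding step follows. The text states only that number of sites, number of acquisitions, and balance are considered; the greedy rule used here (move the heaviest sites off overloaded volumes onto the least-loaded new volume) and all names are assumptions for illustration.

```python
def expand(index, loads, n, k):
    """Build a new master index over n + k volumes.

    index: {site: volume} with volumes numbered 0..n-1
    loads: {site: number of acquisitions held for that site}
    Returns (new_index, moved_sites); only moved sites need to be
    unloaded to tape, reloaded, and deleted from their old locations.
    """
    target = sum(loads.values()) / (n + k)
    vol_load = [0] * (n + k)
    for site, vol in index.items():
        vol_load[vol] += loads[site]

    new_index = dict(index)
    moved = []
    # Visit the heaviest sites first so that few moves shed the most load.
    for site in sorted(index, key=lambda s: (-loads[s], s)):
        src = index[site]
        if vol_load[src] <= target:
            continue                      # source volume is already balanced
        dst = min(range(n, n + k), key=lambda v: vol_load[v])
        if vol_load[dst] + loads[site] > target:
            continue                      # move would overload the new volume
        vol_load[src] -= loads[site]
        vol_load[dst] += loads[site]
        new_index[site] = dst
        moved.append(site)
    return new_index, moved


old_index = {s: (0 if s < 4 else 1) for s in range(8)}
acquisitions = {s: 4 for s in range(8)}
new_index, moved = expand(old_index, acquisitions, n=2, k=2)
print(moved, new_index)
```

Because sites that keep their old assignment are never touched, the amount of data unloaded and reloaded is limited to the sites listed in `moved`.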


Daily Use of LACIE Data Bases

Now  that  the  development  of  the  LACIE  data 
bases  has  been  examined,  the  day-to-day  use  of  the 
data  bases  will  be  discussed.  The  LACIE  data  bases 
are in use approximately 8 hours per day, 5 days per
week.  A typical  day  of  support  will  be  divided  among 
several  user  groups.  Usually,  the  first  portion  of  a 
production  period  will  be  used  for  data  base  updat- 
ing. After  the  updating  is  complete,  the  remaining 
time  will  be  spent  in  either  batch  production  or  in- 
teractive use. 

The  production  activities  are  fairly  consistent  and 
tend  to  follow  a daily  pattern,  at  least  as  far  as  the 
uses  of  the  data  bases  are  concerned. 

Image data are requested from GSFC to start the
LACIE process. This is accomplished via the history
update job, which uses card inputs to update the
history data base and produce a JSC interface tape, a
history data base query report, and a listing of history
data base updates. The JSC interface tape is sent to
GSFC to order image data.

Receiving the data from GSFC is the second step
in  the  process.  The  data  are  input  to  the  LACIE 
system  from  the  GSFC  tape  and  are  processed  by  the 
composition and indexing (C&I) job. The C&I job
may contain card inputs which define the sites to be
processed  or  excluded  from  the  tape.  The  C&I  job 
updates  the  history  and  image  data  bases  and  gener- 
ates the  daily  report.  The  C&I  job  also  computes  a 
“green  number”  for  the  image  being  processed.  The 
green  number  is  stored  on  the  image  data  base  in  the 
variable  header  for  the  acquisition. 

In  addition,  the  LACIE  process  requires  that 
“fields”  and  “dots”  defined  for  each  image  be 
analyzed.  The  field  update  job  uses  card  inputs  to  up- 
date the  field  data  base  with  the  definitions  and  pro- 
duce a transaction  report,  a field  report,  and  a listing 
of  field  data  base  updates.  A field  overlay  tape  is  also 
produced  by  this  job. 

The  dot  data  base  utility  is  used  to  define  and  label 
209  pixel  locations  per  segment,  called  “dots.”  Dots 
are  used  as  starting  vectors  for  several  of  the  analysis 
routines.  The  category  (blank  if  not  specified)  and 
function  for  each  dot  are  defined.  The  dot  data  base 
utility,  which  has  both  complete  replacement  and 
partial  update  capabilities,  maintains  the  dots  on  the 
dot data base. The dot data base resides in segments
of the field data base. Figure 7 charts the steps in the
preparation  for  an  analysis  run. 
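The two maintenance modes of the dot data base utility can be pictured with a small data structure. The record layout and names below are hypothetical; the source specifies only that each segment has 209 dots, each with a category (blank if not specified) and a function.

```python
from dataclasses import dataclass

DOTS_PER_SEGMENT = 209


@dataclass(frozen=True)
class Dot:
    line: int            # pixel location within the segment (assumed fields)
    pixel: int
    function: str        # role of the dot in the analysis routines
    category: str = ""   # blank when not specified


def replace_all(dots):
    """Complete replacement: the new list becomes the segment's dot set."""
    if len(dots) != DOTS_PER_SEGMENT:
        raise ValueError("a segment must define exactly 209 dots")
    return {number: dot for number, dot in enumerate(dots, start=1)}


def partial_update(table, changes):
    """Partial update: overwrite only the dot numbers named in changes."""
    updated = dict(table)
    for number, dot in changes.items():
        if number not in updated:
            raise KeyError(number)
        updated[number] = dot
    return updated


table = replace_all([Dot(i, i, "start") for i in range(1, 210)])
table2 = partial_update(table, {5: Dot(10, 20, "start", "W")})
print(table[5], table2[5])
```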

Now the data are ready to be statistically analyzed.
There  are  two  subsystems  in  place  for  processing  the 
data,  the  interactive  subsystem  and  the  batch  sub- 
system. The  batch  subsystem  is  primarily  a card- 
mode emulation  of  the  interactive  subsystem  and  is 
more  restrictive  than  the  interactive  mode.  Figure  8 
charts  the  analytical  process  in  the  batch  mode. 
FIGURE 7.— Preparation for an analysis run.


CONCLUSIONS 

The  LACIE  data  bases  have  been  in  use  for 
almost  4 years.  Data  for  the  crop  years  1975,  1976, 
and  1977  have  been  received  and  processed;  cur- 
rently, the  1978  image  data  are  being  received  at  JSC. 
Throughout  this  support,  the  image  and  ancillary 
data  have  been  efficiently  stored  and  retrieved  for 
processing.  The  processing  throughput  has  been  con- 
sistent with  design  objectives. 

The  IMS  utilities  for  data  security  using  the  check- 
point/restore methodology  have  performed  as  ex- 
pected. Data  losses  have  been  negligible. 

By  taking  advantage  of  the  expandability  allowed 
by  the  multiple  image  data  bases,  the  most  effective 
use  of  the  7330's  has  been  made. 


FUTURE  DATA  BASES 

The  LACIE  data  bases  met  the  design  objectives 
for  the  limited  environment  for  which  they  were  in- 
tended. However,  there  are  some  limitations  to  the 
LACIE  solution  that  must  be  overcome  if  a data  base 
of  worldwide  and  multicrop  coverage  is  to  be 
developed. 

The  future  data  bases  to  support  the  new  environ- 
ments must  contain  more  data.  The  future  Landsat 
will  have  more  sensors  with  higher  resolution  to  pro- 
duce more  information  per  unit  area  of  land.  Addi- 
tionally, new  applications  such  as  air  quality,  water 
quality,  and  land  use  will  require  new  access  to  the 
data.  There  will  also  be  new  satellites  such  as  Seasat 
and  the  Soil  Moisture  Satellite,  and  analysts  may 
desire  to  combine  data  from  several  satellites  to  ad- 
dress a particular  problem.  All  these  concepts  in 
combination  imply  a global  data  storage  and  distribu- 
tion problem  that  the  LACIE  methodology  has  not 
begun  to  address. 

The  most  apparent  shortcomings  of  the  LACIE 
data  base  design  in  considering  the  future  applica- 
tions are  the  floorspace  requirements  for  the  direct- 
access  storage  devices,  the  overhead  required  by  the 
checkpoint  system  for  data  base  protection,  and  the 
overhead associated with reorganizing the data base
if the balance or data to be retained is greater than
anticipated.


FIGURE 8.— Processing the Image in batch mode.

The  solution  to  the  floorspace  problem  is  available 
with  current  technology.  There  are  mass  storage 
devices  with  storage  densities  much  higher  than 
those  of  the  currently  used  direct-access  devices.  The 
use  of  these  mass  storage  devices  would  require  the 
trade-off  of  some  retrieval  time.  Possibly,  a combina- 
tion of  direct-access  devices  and  mass  storage  devices 
would  allow  rapid  retrieval  and  the  on-line  storage  of 
large  amounts  of  data.  Such  a combination  might  use 
the  mass  storage  devices  for  permanent  storage  of 
the  data  and  the  direct-access  devices  for  immediate 
access  to  the  data. 

This  same  combination  would  probably  also  yield 
some  benefits  in  the  areas  of  error  protection  and 
data  recovery.  If  a staging  device  were  the  primary 
source  of  data  available  to  the  analysts,  the  mass 
storage  devices  would  not  be  accessible  to  them.  Any 
errors  that  resulted  during  the  analysis  process  would 
be  confined  to  the  staging  device  and  only  it  would 
need  to  be  recovered. 

The  mass  storage  devices  may  offer  only  a partial 
solution to the storage problems of the future. To
augment the capacity of the mass storage devices, it
may  be  necessary  to  apply  data  compression  tech- 
niques to  the  data  before  storing  it.  The  total  solution 
may  even  involve  distributed  data  bases  with  a net- 
work capable  of  transporting  data  from  one 
geographic  location  to  another. 

Assuming  the  interest  in  determining  worldwide 
multiple  crop  production  continues,  the  next  data 
base  challenge  will  almost  certainly  involve  solving  a 
trillion-byte  data  base  problem. 
Man-Machine Interfaces in LACIE/ERIPS

Barbara  B.  Dupreya 


INTRODUCTION 

The  Earth  Resources  Interactive  Processing 
System  (ERIPS)  supports  the  LACIE.  There  are 
three  major  man-machine  interfaces  in  this  system: 
the  use  of  “menus”  for  communication  between  the 
software  and  an  interactive  user;  the  check- 
point/restart facility  to  recreate  in  one  job  the  inter- 
nal environment  achieved  in  an  earlier  one;  and  the 
error  recovery  capability,  which  greatly  reduces  the 
impact  of  errors  which  would  normally  cause  job  ter- 
mination. This  interactive  system  has  also  been 
adapted  for  use  in  noninteractive  (batch)  mode. 

The  LACIE/ERIPS  software  system  is  a large 
computer  program  developed  by  IBM  Federal 
Systems  Division,  Houston,  Texas,  in  support  of 
NASA’s  Earth  resources  data  analysis  activities. 
ERIPS  executes  on  an  IBM  360/75  mainframe  in  the 
Real-Time  Computer  Complex  (RTCC)  in  Building 
30  of  the  NASA  Johnson  Space  Center  (JSC), 
Houston,  Texas.  A general  description  of  the 
development  and  capabilities  of  this  system  is  pre- 
sented elsewhere  in  these  proceedings  (C.  L. 
Johnson,  “LACIE/ERIPS  Software  System  Sum- 
mary”). One  of  the  most  important  aspects  of  the  in- 
teractive portion  of  the  system  is  the  way  in  which 
the  analysis  and  decisionmaking  capabilities  of  a 
human  being  are  integrated  with  the  speed  and  ac- 
curacy of  a computer  to  produce  a powerful  analysis 
system.  A basic  goal  of  the  design  of  the  system  was 
for  it  to  be  “man-rated” — easy  to  use,  amenable  to 
human  direction,  and  forgiving  of  human  error  (by 
both  users  and  programers).  This  paper  discusses  the 
techniques  used  in  ERIPS  to  reach  this  objective. 

The broad objective of “man-rating” the system
led  to  the  development  within  ERIPS  of  three 
capabilities:  the  menu-style  user  control  interface, 
the checkpoint/restart facility, and the error recovery
facility.  The  menus  are  graphic  displays  which  are 
the  user's  principal  interface  with  the  system.  Each 
menu  presents  information  concerning  the  status  of 
the  system,  and  it  either  requests  necessary  informa- 
tion or  shows  current  options  so  that  the  user  can 
make  selections  as  he  would  from  a restaurant  menu. 
Checkpoint/restart  allows  for  recovery  from  total 
system  failures  and  for  return  to  a previous  point  in 
processing  after  a lapse  of  time,  as  from  one  day  to 
the next. The checkpoint/restart function is partially
controlled by the user via menu inputs. Error recovery
protects the results of previous processing
whenever  a program  failure  occurs  by  automatically 
returning  the  system  to  prefailure  status. 

The  ERIPS  user  interfaces  with  the  system  by 
means  of  a terminal.  This  terminal  includes  a 
graphics  or  conversational  screen  on  which  the 
menus  are  displayed,  two  image  screens  (one  of 
which  displays  16  levels  of  gray,  the  other  either  8 or 
64  colors),  a keyboard,  and  a “joystick”  which  drives 
a cursor  indicator  to  the  same  relative  position  on  all 
three screens. The physical layout of the hardware is
shown  in  figure  1 and  the  keyboard  layout  in  figure  2. 


FIGURE 1. — ERIPS interactive terminal.
FIGURE 2.— ERIPS keyboard layout.


MENUS


Purpose

The menu interface in ERIPS was originally
established to provide an interactive user with a tutorial
display of the options available at each point of
interaction and then to respond to selections. The
ERIPS menus also display various types of messages
and dynamic data. They thus provide the user with a
considerable amount of status information to aid in
the intelligent selection of a processing option.

This is in sharp contrast to the philosophy of
those interactive systems which, like OS-360, depend
on the user to request functions at will, with
reference to some external documentation such as a
user's guide. That approach tends to leave the user
tied to at least three sets of documentation: his own
objectives for this particular session, notes and
hardcopies of what has happened so far, and the
user’s guide to determine what can be done next.
This is simply too large a burden to place on the user
of a system with as many capabilities as ERIPS.

Other systems have used the menu concept,
including the DRAFT (Display Retrieval and
Formatting Technique) and Skylab software systems from
which some of the menu and display logic was taken
(see ref. 1), but an unusual feature of the ERIPS
approach is the way the menus are logically linked to
form what is in essence an inverted tree. This makes
it easy to show the user only what is needed to
continue along the path he has chosen; it also makes it
very easy to back up and choose an alternative path.


Display Hardware Considerations

The  DRAFT  Digital  Television  Equipment 
(DTE)  terminal  hardware  for  which  ERIPS  was 
developed  has  naturally  had  a significant  effect  on 
the  implementation  of  the  menu  concept.  This  ter- 
minal has  the  capability  to  display  lines  and  a wide 
range of symbols, but it has no local editing
capability; each symbol or line of display must be
specifically directed by the software. Forward and
backward paging and scrolling are also under direct
control of the system software. All these functions,
and many others, are incorporated into some of the
“intelligent” terminal systems now available at a
relatively  low  cost;  use  of  such  terminals  could 
relieve  the  central  processor  of  much  of  the  display 
processing  overhead. 

Another  feature  of  the  terminal  displays  in 
general is the split screen. This is implemented on
current  hardware  in  limited  form.  The  conversa- 
tional screen,  for  instance,  is  segmented  into  the 
menu  display  area,  the  separately  maintained  overlay 
for  the  menu  display,  and  three  one-line  message 
areas.  Terminal  systems  with  far  more  extensive  seg- 
mentation capabilities  now  exist  and  might  profita- 
bly be  used  to  provide  a running  message  log,  opera- 
tor-generated messages,  a scratch  pad  area,  and  so 
forth. 

It should be noted, however, that the
transportability of a menu-oriented system can be severely
diminished by too great a dependence on terminal
intelligence, which might not be common to all the
installations for which the system is intended. A
“lowest common denominator” must then be
identified, and the requirement of total control by the
central processor has, historically, been very
successful (see Johnson).


Menu  Definition 

Before menus can be used, they must be defined.
Since display of a menu involves processing
overhead and demands user interaction, the menu's
scope must be wide enough to justify its existence.
This makes the menu concept impractical for some
systems. On the other hand, the menu must not have
such a wide scope that it overwhelms the user with
its complexity. When this question of scope has been
resolved, menu definition can begin.

The ERIPS menus consist of control information,
static contents, and dynamic contents. (See reference
2 for detailed descriptions.) The control information
indicates which portions of the screen are sensitive
to inputs and what type of input is expected for each.
For instance, it might specify that a particular item is
expected to have 1 to 4 numeric characters input,
with a value range of 1 to 2000 and a default value of
200  when  no  input  is  made. 
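The kind of control information described for this item can be expressed as a small validation rule. The class and attribute names below are hypothetical and are not taken from the ERIPS menu definition format; only the limits (1 to 4 numeric characters, range 1 to 2000, default 200) come from the example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumericField:
    min_chars: int
    max_chars: int
    low: int
    high: int
    default: int

    def parse(self, typed):
        """Return the value for the typed characters or raise ValueError."""
        if typed == "":
            return self.default            # no input confirms the default
        if not (typed.isascii() and typed.isdigit()):
            raise ValueError("numeric characters expected")
        if not self.min_chars <= len(typed) <= self.max_chars:
            raise ValueError("wrong number of characters")
        value = int(typed)
        if not self.low <= value <= self.high:
            raise ValueError("value out of range")
        return value


field = NumericField(min_chars=1, max_chars=4, low=1, high=2000, default=200)
print(field.parse(""), field.parse("150"))
```

Keeping the limits in data rather than in program logic is what allows a single input routine to serve every menu.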

The  static  contents  of  the  menu  form  the  constant 
visible display. For instance, the field mentioned
above  may  be  underlined  and  preceded  by  the  word 
WEIGHT. 

The  dynamic  contents  of  the  display  vary  during 
the  course  of  the  run  and  are  maintained  as  an  over- 
lay of  the  static  contents.  Input  playback,  messages, 
and  various  types  of  formatted  output  are  included  in 
this  category.  For  instance,  suppose  the  purpose  of 
this  menu  is  to  show  those  items  within  some  table 
whose  weights  are  less  than  or  equal  to  the  input 
value and flag them to participate in some later
process. As the user types in the value, each numeric
character is displayed in the appropriate position on
the screen. When the user signals that input is
complete, the system formats and displays the
appropriate lists, such as the names and weights of the items
identified. There might also be some message
displayed, such as an application message stating “003
PERCENT QUALIFIED, RETRY OR SELECT
'ACCEPT' FUNCTION.” This message might cue
the  user  to  retry  the  menu  with  a more  realistic 
value. 
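The dynamic part of this example can be sketched as a function that selects the qualifying items and builds the application message. The table contents and the function name are invented for illustration, and the message wording follows the example above as closely as it can be read.

```python
def flag_by_weight(table, limit):
    """Return the items with weight <= limit and the status message."""
    flagged = [(name, weight) for name, weight in table if weight <= limit]
    percent = round(100 * len(flagged) / len(table)) if table else 0
    message = (
        f"{percent:03d} PERCENT QUALIFIED, "
        "RETRY OR SELECT 'ACCEPT' FUNCTION."
    )
    return flagged, message


# Hypothetical table of (name, weight) pairs.
items = [("A", 100), ("B", 250), ("C", 300), ("D", 900)]
flagged, message = flag_by_weight(items, 200)
print(flagged)
print(message)
```

A low percentage in the message is the cue for the user to retry with a larger value before accepting the result.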


Inputs 

The  intent  of  the  ERIPS  menu  input  scheme  is 
threefold: 

1.  Minimum  keystroke  input 

2.  Maximum  ease  of  entry 

3.  Maximum  ease  of  correction 

Minimum  keystroke  input  is  achieved  primarily 
by  establishing  defaults  which  correspond  to  the 
most  commonly  wanted  processing.  To  utilize  these 
defaults,  the  user  simply  confirms  that  they  are  to  be 
used  by  not  making  specific  inputs  before  signaling 
input  completion.  In  many  cases,  this  means  that 
only  a single  keystroke  is  required  for  the  menu. 

For maximum ease of entry, the focus is on those
items (fields) which require typed input. When the
first input on a menu is made, the current menu
control information is used to form an overlay display.
This overlay has a special character at the beginning
of each data field to help with use of the joystick. The
system also begins to maintain another special
character (called the alpha cursor) showing where
the next typed input character will go. When the user
signals completion of input for one field, the alpha
cursor moves automatically to the next. This greatly
simplifies the task of entering lists. Figure 3
illustrates a menu which has been filled in, entered, and
rejected and is now awaiting correction.

Ease of correction is provided by (1) the
appearance of messages from the various error-checking
functions (to be discussed in the next section)
while the input is still displayed and available; (2)
the use of devices such as the joystick, the toggle
switch, and special action keys; and (3) the way
menus are logically linked so that return to the
previous one can be requested.

Altogether,  the  ERIPS  menu  inputs  include  fields, 
decision  boxes,  special  function  boxes,  special  action 
keys,  joystick  and  toggle  switch,  and  erasures. 

Fields. — Field  inputs  are  typed  data,  entered 
character  by  character  into  the  positions  indicated  by 
the  alpha  cursor.  Depending  on  the  application,  they 
may  be  alphabetic,  numeric  (with  or  without  decimal 
point),  or  some  other  form  natural  to  that  applica- 
tion. 

Decision  boxes. — When  a menu  requires  process- 
ing path  decisions,  there  are  boxes  in  the  static  part 
of  the  display  with  associated  text  to  describe  the 
paths.  The  user  makes  a decision  by  moving  the 
joystick  cursor  into  the  box  and  pointing  the  toggle 
switch towards the conversational screen, and the
system  displays  a plus  sign  in  the  box  as  playback. 

Special function boxes. — The special function
boxes are a strip of decision boxes along the right
edge of the conversational screen. These act like the
special  function  keys  found  on  many  terminal 
systems,  except  that  here  it  is  possible  for  a key  to 
have  a different  function  on  each  menu.  They  always 
include  one  for  return  to  the  previous  menu  and 
alternatives  to  the  end-of-field  (EOF)  and  end-of- 
transmission  (EOT)  special  action  keys.  Depending 
on the applications, there may be up to 15 other
functions defined; selection of one of these normally
brings up the first menu for that function.

Special  action  keys. — The  terminal  keyboard  has 
several  keys  which  are  special  menu  action  indica- 
tors. The  EOF  key  signals  that  input  for  the  current 
field  is  complete  (and,  if  there  is  no  input,  confirms 
the  default  values  for  the  field).  The  EOT  key  ends 
any  current  field  and  signals  that  all  menu  input  is 
complete  and  ready  to  be  processed.  There  are  for- 
ward and  backward  space  keys,  and  there  are  keys 
which clear the areas containing the supervisor and
application  messages  (which  also  clear  automatically 
on  a timed  basis).  There  are  other  special  action  keys 
which  deal  not  with  menus  but  with  management  of 
reports  and  with  debug  services;  these  will  be  dis- 
cussed later. 

Joystick  and  toggle  switch. — The  joystick/toggle 
switch combination is a very powerful portion of the
ERIPS  input  scheme.  The  joystick  drives  cursors  on 
all  three  screens  of  the  user's  terminal;  the  toggle 
switch,  when  pointed  toward  a screen,  indicates  that 
the  cursor  has  reached  a significant  position  on  that 
screen.  The  user  points  the  toggle  switch  at  the  con- 
versational screen  to  make  decisions;  to  move  the 
alpha  cursor  to  the  beginning  of  a field,  overriding  its 
automatic  placement  (and  ending  any  current  field 
without  the  need  for  the  EOF  key);  and  to  identify 
coordinates  for  line  drawing  or  other  functions.  In 
these  cases,  the  match  to  the  points  identified  in  the 
menu  control  information  need  not  be  exact  but  only 
within  approximately  two  character  widths  in  any 
direction.  Image  screen  pointing  serves  similarly  for 
coordinate  identification  on  these  screens. 
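
The approximate matching of pointed positions can be illustrated with a short sketch. The Python below is not ERIPS code (ERIPS was written largely in assembler); the function name `find_target`, the constant `TOLERANCE`, the target names, and the use of character-cell coordinates are all assumptions made for illustration. "Two character widths in any direction" is interpreted here as a limit on both the column offset and the row offset.

```python
# Illustrative sketch only; names and coordinate conventions are assumed.
TOLERANCE = 2  # approximate match radius, in character cells


def find_target(cursor, targets, tolerance=TOLERANCE):
    """Return the name of the nearest target within tolerance, else None.

    cursor  -- (column, row) of the joystick cursor when the toggle is pointed
    targets -- dict mapping a target name to its (column, row) position
    """
    best_name, best_dist = None, None
    for name, (col, row) in targets.items():
        d_col = abs(cursor[0] - col)
        d_row = abs(cursor[1] - row)
        if d_col <= tolerance and d_row <= tolerance:
            dist = d_col + d_row
            if best_dist is None or dist < best_dist:
                best_name, best_dist = name, dist
    return best_name


# Hypothetical menu layout: two special function boxes and one field start.
menu_targets = {"RET": (78, 2), "EOF": (78, 8), "FIELD_1": (10, 12)}
```

A pointing action at (11, 13) would match `FIELD_1`, while one at (40, 20) would match nothing and would be reported by the first level of error checking.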

Erasures. — Certain  menu  inputs  serve  as  erasures 
of previous actions. Pointing at a previously selected
decision  box  cancels  the  selection  and  removes  the 
plus  sign.  Pointing  at  the  beginning  of  a field  and  typ- 
ing a blank  reestablishes  the  default  for  that  field, 
while  typing  alternative  characters  corrects  the  field. 
Backspacing  when  a line-drawing  function  is  under- 
way erases  the  most  recent  line  and  returns  to  its 
starting  point.  Even  the  menu  itself  is  effectively 
erased by use of the "return" special function. Thus,
nearly all menu actions can be undone if the user so
desires. 


Error  Checking 

The  input  error-checking  function  of  ERIPS  is  ex- 
tensive and  operates  at  several  levels.  The  system 
constantly  evaluates  current  status  and  displays 
messages  when  something  violates  the  rules  then  in 
effect  (or  follows  an  abnormal  path  of  which  the  user 
ought  to  be  notified). 

The  first  level  of  error  checking  on  menu  inputs 
occurs  as  they  are  entered,  when  they  are  compared 
with  the  current  set  of  expectable  actions.  Types  of 
errors  noticed  in  this  fashion  include  trying  to  type  in 
too many characters for the receiving field, pointing
to  an  area  which  was  not  identified  in  the  control  in- 
formation, or  attempting  to  make  inputs  when  the 
keyboard is logically locked out. Such inputs are ignored, and an appropriate message is output.

The  second  level  of  checking  occurs  when  the  user 
signals end-of-input. At this time, the set of inputs
made  for  the  menu  is  checked  to  see  that  all  are  in 
the  appropriate  form  and  do  not  violate  any  of  the 
limits  identified  in  the  control  information.  Errors 
noticed  at  this  level  include  attempting  to  make  in- 
compatible decisions,  making  no  choice  when  one  is 
required,  or  violating  the  numeric  range  associated 
with  a field.  Again,  appropriate  messages  are  issued, 
and  the  system  waits  for  corrected  inputs  to  be  made. 

Finally,  the  inputs  are  checked  against  the 
dynamic  condition  of  the  system.  For  instance,  if  the 
user's  choice  of  a processing  path  depends  on  the  ex- 
istence of  data  which  are  not  available,  a message  is 
generated indicating that either another path must be
chosen  or  the  data  must  be  supplied.  The  user  then 
has  to  respond  appropriately,  perhaps  by  returning 
through  the  menu  sequence  to  a point  at  which  the 
data  can  be  generated,  then  proceeding  in  the  re- 
quired fashion  until  he  again  reaches  the  point  at 
which  the  error  was  detected. 

The  intent  of  all  this  is  to  be  as  forgiving  of  human 
error  as  possible  without  allowing  such  errors  to 
jeopardize  the  integrity  of  the  results. 
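
The three levels of checking described above can be summarized in a brief sketch. It is written in Python purely for illustration and does not reproduce the ERIPS control information format; the field specifications, the `PATH_NEEDS` table, and the message texts are hypothetical.

```python
# Illustrative sketch only; the field model and messages are assumed.

def check_keystroke(field, typed_so_far, keyboard_locked=False):
    """Level 1: check one character as it is entered."""
    if keyboard_locked:
        return "keyboard locked"
    if len(typed_so_far) + 1 > field["width"]:
        return "too many characters for field"
    return None


def check_menu(fields, values):
    """Level 2: check the complete set of inputs at end-of-input."""
    errors = []
    for name, spec in fields.items():
        value = values.get(name)
        if value is None:
            if spec.get("required"):
                errors.append(f"{name}: required input missing")
            continue
        rng = spec.get("range")
        if rng is not None and not (rng[0] <= value <= rng[1]):
            errors.append(f"{name}: outside range {rng[0]} to {rng[1]}")
    return errors


# Assumed dependency table: a processing path and the data it needs.
PATH_NEEDS = {"classification": "statistics"}


def check_dynamic(path, available_data):
    """Level 3: check the chosen path against the current system state."""
    needed = PATH_NEEDS.get(path)
    if needed is not None and needed not in available_data:
        return [f"{needed} not available: choose another path or supply the data"]
    return []
```

In this arrangement, an input reaches the application only after all three checks return no message, which mirrors the order in which the user would see terminal control, supervisor, and application messages.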


Menu Use in Noninteractive Mode

The discussion so far has been in terms of interactive use, with an analyst at a terminal responding to
the displays as they appear. In fact, though, this is
not  the  mode  in  which  the  system  is  most  often  used. 
The  whole  menu  scheme  has  been  adapted  for  use  in 
noninteractive  (batch)  mode,  in  which  the  terminal 
hardware  is  not  even  connected  to  the  system. 
Though  the  resulting  system  is  certainly  unlike  any 
developed  strictly  for  batch  purposes,  it  produces 
comparable  output,  and  it  has  the  distinct  advantage 
that software modifications are applied to one set of
programs  rather  than  to  two  divergent  ones,  thus  en- 
suring synchronization  and  reducing  implementa- 
tion costs. 

There  are  actually  two  forms  of  batch  mode,  both 
depending  on  input  cards  to  control  system  process- 
ing. The  “process  control”  batch  mode  divides  the 
possible  system  actions  into  those  which  are  always 
(or  never)  to  be  done  and  those  for  which  user  inputs 
are  required.  Many  assumptions  based  on  common 
practice have been made in order to reduce the number of actions in the latter set; there are fewer than 30
types  of  input  cards,  corresponding  to  about  as  many 
menus,  and  nominal  processing  for  a particular 
geographic  site  requires  only  3 of  these.  The  user  cre- 
ates a control  deck  for  each  site  to  be  processed;  nor- 
mally, many  such  decks  are  combined  to  form  a job's 
input.  The  off-line  process  control  system  checks  the 
syntax and logical compatibility of the cards. If all
goes well, it constructs a scenario for the run, drawing
from user inputs as required. It produces a data
base  for  the  on-line  system,  along  with  an  abbrevi- 
ated version  of  the  scenario  on  a hardcopy  printer. 
(Figure  4 shows  a typical  batch  deck  and  its 
scenario.)  When  the  on-line  system  uses  the  data 
base,  it  repeats  the  hardcopy  scenario  as  a record  of 
activities,  appending  a status  line  which  indicates 
success  or  failure.  This  hardcopy,  together  with  other 
outputs such as film products, is returned to the user.

The other form, "regular" batch, demands very
detailed  knowledge  of  the  menus  and  their  flow,  as 
the  inputs  must  all  be  specified  at  the  same  level  as 
for  an  interactive  run.  Because  of  this,  it  is  seldom 
used,  but  it  can  provide  for  accurate  repeatability  of 
complex  input  sequences  which  do  not  conform  to 
the  assumptions  made  for  process  control. 

One aspect of running in either batch mode, of
course,  is  that  nobody  is  in  a position  to  act  on  the 
“displayed"  messages,  so  the  system  cannot  wait  for 
corrections  as  it  would  in  interactive  mode.  Instead, 
the  messages  as  defined  are  divided  into  those  after 
which no meaningful results for the site can be produced ("fatal errors") and those which can be treated
as warnings. It should be noted that a fatal error encountered in processing for one site does not stop the
job; work continues with the next site.

FIGURE 4.— Batch mode input cards and hardcopy scenario.
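
The batch treatment of messages can be sketched as follows. This Python fragment is only an illustration of the stated behavior (a fatal error ends work on one site, warnings are recorded, and the job moves on); the function `run_batch`, the `run_step` callback, and the status strings are assumptions and do not correspond to actual ERIPS card types or routines.

```python
# Illustrative sketch only; card handling and status text are assumed.
FATAL = "fatal"
WARNING = "warning"


def run_batch(decks, run_step):
    """Process every site's control deck, isolating fatal errors per site.

    decks    -- dict mapping a site number to its ordered list of cards
    run_step -- callable(site, card) returning None or (severity, text)
    Returns a dict mapping each site to (status line, list of warnings).
    """
    report = {}
    for site, cards in decks.items():
        status, warnings = "SUCCESS", []
        for card in cards:
            message = run_step(site, card)
            if message is None:
                continue
            severity, text = message
            if severity == FATAL:
                status = "FAILURE: " + text
                break                  # abandon this site only
            warnings.append(text)      # nobody can respond, so just record it
        report[site] = (status, warnings)
    return report
```

The status element plays the role of the status line appended to the hardcopy scenario for each site.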


Activity  Tracing 

The  ERIPS,  like  any  other  complex  system,  must 
provide  for  activity  tracing  to  verify  that  the  in- 
tended processing  path  was  taken  or  to  show  where 
deviations  from  the  path  occurred  and  how  that 
affected the outcome. The major means of activity
tracing in ERIPS are reports and logging, though
various  other  products  exist  which  are  outside  the 
scope  of  this  discussion. 

Many  of  the  ERIPS  functions  generate  reports, 
which  are  formatted  for  display  on  the  conversa- 
tional screen.  (Figure  5 shows  an  ERIPS  report.) 
During  batch  mode  operation,  these  reports  go  to  a 
microfiche  tape.  The  interactive  user  can  view  them 
whenever he wants by use of the report special action
keys  (enter/exit  report  mode,  page  forward,  and  page 
backward)  and  the  report  menu.  In  either  mode,  and 
whether they are displayed or not, report pages are
written  to  a log  tape  when  they  are  formatted. 

Also on the log tape are the various application
menu displays at the times of significant change,
that is, excluding partial input playback but including
initial appearance, final input playback, and system-generated messages and dynamic data.

FIGURE 5.— ERIPS report.


When detailed activity tracing is desired, the log
tape can be processed to create a hardcopy version of
how the screen appeared (or would have appeared)
at various times. (See figure 6 for a typical page of
delog.)  This  processing  is  generally  done  for 
troubleshooting  and  development  testing,  not  opera- 
tionally. 


CHECKPOINT/RESTART 

The  complementary  functions  of  checkpointing 
and  restarting  the  system  are  mainly  automatic,  in- 
volving the  maintenance  of  disk  data  sets  by  ERIPS. 
Checkpoints  are  taken  at  predefined  points  in  a pro- 
gram, such  as  upon  entry  to  or  normal  exit  from  an 
application,  and  only  one  checkpoint  disk  data  set 
exists  for  a terminal  at  any  one  time.  The  data  saved 
for  a terminal’s  checkpoint  represents  its  complete 
environment,  so  the  data  can  be  retrieved  to  restart 
the  terminal. 

One  user  interface  with  the  restart  function  is  at 
sign-on  time.  As  soon  as  the  user  signs  on,  a menu 
appears  which  asks  whether  to  restart  the  terminal  or 
disregard  any  restart  data.  If  restart  is  requested,  and 
the  data  exist,  the  user  is  in  effect  returned  to  the 
system  environment  at  the  time  the  last  checkpoint 
was  taken.  This  means  that  if,  for  instance,  a ter- 
minal session  is  interrupted  to  acquire  a full  dump 
for  use  in  debugging,  the  job  can  be  restarted  without 
having  to  recreate  data. 

The  other  user  interface  with  restart  is  the  writing 
of  a restart  tape,  which  can  then  be  specified  during  a 
later  sign-on  as  the  source  of  the  restart  data.  This 
capability,  which  is  provided  as  a special  function  of 


certain  application  menus,  can  be  used  to  save  restart 
data  which  would  otherwise  be  lost  when  the  next 
checkpoint  occurred. 
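
The checkpoint and restart behavior described in this section can be sketched in a few lines. The Python class below is an illustration under stated assumptions, not the ERIPS disk data set design: the class name `CheckpointStore`, its methods, and the use of a dictionary to stand for a terminal's environment are hypothetical, and returning an empty environment when no restart data exist is a choice made here for simplicity.

```python
# Illustrative sketch only; storage layout and method names are assumed.
import copy


class CheckpointStore:
    """Keeps at most one checkpoint per terminal."""

    def __init__(self):
        self._by_terminal = {}

    def take(self, terminal_id, environment):
        # A new checkpoint replaces the previous one for that terminal.
        self._by_terminal[terminal_id] = copy.deepcopy(environment)

    def write_restart_tape(self, terminal_id):
        # Copy the current checkpoint so it survives later checkpoints.
        return copy.deepcopy(self._by_terminal[terminal_id])

    def sign_on(self, terminal_id, restart=False, tape=None):
        """Return the environment with which a new session begins."""
        if not restart:
            return {}                      # cold start: ignore restart data
        if tape is not None:
            return copy.deepcopy(tape)     # restart from a saved tape
        saved = self._by_terminal.get(terminal_id)
        return copy.deepcopy(saved) if saved is not None else {}
```

Because each call to `take` overwrites the earlier checkpoint, a restart tape is the only way in this sketch to return to an environment older than the most recent checkpoint.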


ERROR  RECOVERY 

The  error  recovery  function  is  one  of  the  most  im- 
portant and  unusual  features  of  ERIPS.  Most  com- 
plex software  systems  respond  to  serious  error  by  ab- 
normally terminating  (“abending”)  the  whole  job, 
leaving  the  user  with  the  need  to  start  all  over  again 
(and  maybe  again  and  again  as  the  same  error  is  en- 
countered in  different  disguises).  Through  its  error 
recovery  procedures,  ERIPS  drastically  reduces  the 
impact  on  the  user. 

When  a serious  system  error  is  encountered,  the 
abending  process  is  intercepted  by  ERIPS  software. 
If  error  recovery  is  desired,  as  it  generally  is,  a partial 
dump  is  produced  for  debugging  later.  In  interactive 
mode,  the  interrupted  application  is  notified  that 
recovery  is  needed  and  it  restarts  itself.  Thus,  if  the 
user  can  deduce  the  cause  of  the  error  and  avoid  it  in 
further  processing,  the  session  can  proceed  normally. 
In batch mode, the supervisor finds the inputs for
the next site and proceeds from there.

FIGURE 6. — Delog output from log tape.

The error recovery function can be turned off if
more  information  is  needed  to  solve  the  problem.  In 
this  case,  the  abending  process  goes  to  completion, 
producing  a full  dump  and  bringing  down  the  job. 
Even  then,  of  course,  an  interactive  user  can  still 
make  use  of  the  restart  data  when  he  signs  on  again. 

Error  recovery  can  also  be  specifically  forced  by 
the user by enabling and depressing the "reset"
special  action  key.  This  is  generally  done  when  the 
user  realizes  that  he  has  accidentally  started  some 
process  which  cannot  or  should  not  complete  (such 
as  specifying  the  number  of  a read-only  tape  for  a 
write  operation). 
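
The effect of enabling or disabling error recovery can be illustrated with a small sketch. ERIPS intercepts the operating system's abnormal termination process; the Python below only imitates that idea with exception handling, and the function `run_with_recovery`, its parameters, and the list used as a stand-in for the partial dump are assumptions made for illustration.

```python
# Illustrative sketch only; exception handling stands in for abend interception.

def run_with_recovery(application, recovery_enabled=True, dump_log=None):
    """Run application() and intercept a failure instead of ending the job.

    Returns ("ok", result) on success or ("recovered", error text) when a
    failure was intercepted. With recovery disabled, the exception is
    re-raised, which corresponds to the abend going to completion.
    """
    try:
        return ("ok", application())
    except Exception as error:
        if not recovery_enabled:
            raise
        if dump_log is not None:
            dump_log.append(repr(error))   # stand-in for the partial dump
        return ("recovered", str(error))
```

An interactive caller would redisplay the application's first menu after a "recovered" result, whereas a batch caller would move on to the inputs for the next site.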


CASE STUDY 

To  see  how  the  man-machine  interfaces  work  in 
practice,  let  us  follow  an  interactive  user  through  a 
terminal  session.  The  purpose  of  this  session  is  to 
define  some  fields  based  on  an  image  screen  display, 
to  classify  the  image,  and  when  the  results  look  good, 
to  note  the  field  definitions  and  get  a film  product  of 
the  classification  map.  Figure  7 illustrates  the  menu 
flow. 

First,  the  user  signs  on  for  a cold  start  (no  restart 
data  to  be  used)  and  enters  the  Pattern  Recognition 
application.  The  image  selection  menu  appears,  and 
he  selects  the  IM  (Image  Merge)  special  function 
box. The IM menu appears and asks for a site number and for the acquisition dates to be merged. For
purposes  of  demonstration,  suppose  that  the  user  has 
some  trouble  entering  these  data.  He  tries  to  type  the 
first  character  of  an  acquisition  date  without  having 
shown  that  the  site  number  is  finished,  so  he  gets  a 
terminal  control  message  from  the  first  level  of  error 
checking.  After  correcting  this,  he  enters  the  dates 
and  selects  EOT,  signaling  the  end  of  the  inputs. 
Now  the  second  level  of  error  checking  discovers 
that  a required  field  (the  name  to  be  used  in  referenc- 
ing this  image)  was  omitted.  This  causes  the  ap- 
pearance of  a supervisor  message;  the  user  makes  the 
correction  and  selects  EOT.  Finally,  when  the  ap- 
plication software  tries  to  retrieve  the  data  from  the 
Image  Data  Base,  it  discovers  that  one  of  the  acquisi- 
tion dates  was  invalid,  and  an  application  message  is 
displayed.  (The  screen  now  looks  like  figure  3.)  The 
user corrects this error and selects EOT; this time,
the terminal control message says, "Menu input accepted." The image is merged, and the menu reappears in its initial state, ready for another image to be
defined. In this case, the user selects the IMD (Image
Manipulation and Display) special function box,
uses the IMD menu to cause display of the image on
his grayshade screen, then selects the RET (return)
special  function  until  the  Pattern  Recognition  (PR) 
image  selection  menu  reappears.  The  user  then  en- 
ters the  image  name,  and  an  EOT  causes  display  of 
the  PR  process  selection  menu. 

The  user  chooses  the  Field  Selection  process  and 
defines fields using the grayshade screen to connect
the  points  he  selects.  Occasionally,  the  field  defini- 
tion is  rejected  because  it  has  too  many  vertices  (in 
which  case  the  user  can  backspace,  erasing  lines  so 
that  a simpler  field  can  be  drawn)  or  because  it  is  an 
illegal shape (for instance, it does not close and must
be redefined). In each case, the appropriate messages
are output. When all the desired fields have been
defined,  the  user  requests  a Field  Definition  Report, 
which  he  then  views  via  the  report  mode  special  ac- 
tion keys. 

The  next  step  should  be  the  computation  of 
statistics  for  these  fields,  but  suppose  the  user  forgets 
and  requests  classification.  He  soon  gets  a message 
that the statistics are not available; he backs up to request them, then comes back to classification. When
he forgets to make his a priori value inputs on the appropriate menu and returns to it to input them, he
encounters what turns out to be an application software error. His application abends (as reported by a
message on his screen), an abbreviated dump is produced, and the PR process selection menu reappears.
He tries classification again the same way, and again
error recovery occurs when he retries the a priori
menu. At this point, the user disables error recovery
(which requires both the "enable" and the "switch
recovery mode" special action keys, so that it will not
happen accidentally) and goes through the sequence
again. This time, the whole job abends when the error is encountered, creating a full dump. (Typically,
the user would just write up the problem, and the
programer responsible for solving it would recreate
the situation if the abbreviated dump was insufficient.) The user then asks the computer operator to
feed the job in again.

When  the  user  signs  on  again,  he  requests  a restart 
using  the  existing  data.  The  merged  image,  field 
definitions, and statistics are retrieved automatically,
and the user can simply enter the image name for PR
and  select  classification.  (This  time,  he  is  careful  to 
make  his  a priori  inputs  when  the  menu  first  ap- 
pears.) Classification  proceeds  normally,  and  the 
user requests and views a classification summary report. Since everything looks good, he selects the class
map function, generates a tape from which the film
product can be produced, and finishes the job normally.

FIGURE 7. — Case study menu flow.

This  case  study,  while  it  is  by  no  means  typical, 
has  illustrated  the  way  in  which  the  user  and  the 
ERIPS software interact. At all times, the user has
been  given  both  the  tools  needed  to  do  the  work  and 
the  information  needed  to  control  the  process.  He 
has  recovered,  with  very  little  pain,  from  his  own  er- 
rors and  (even  more  significantly)  from  a software 
error. 


SUMMARY 

In summary, the major man-machine interfaces in
ERIPS  are 

1.  Menus  to  display  system  status  and  processing 
path  options  and  to  request  necessary  information 

2.  Checkpoint/restart  to  save  results  between  ter- 
minal sessions,  reducing  reworks 

3.  Error  recovery  to  minimize  the  impact  of 
serious  errors 

In  implementing  these  interfaces,  several  lessons 
of  general  interest  have  been  learned.  First,  a highly 
interactive  system  can  be  made  easy  to  modify  by 
using  the  menu  concept  described  here.  The  initial 
cost of such a scheme is high, since it must include
generalized  routines  for  menu  definition,  manage- 
ment, and  data  field  input/output  formatting.  Once 
this  has  been  done,  however,  the  alteration  of  exist- 
ing menus  and  the  addition  of  new  ones  are  simple. 
A bonus  is  the  localization  of  a terminal-dependent 
code  into  a small  set  of  routines,  allowing  software 
transportability and hardware upgrading.

Second,  such  a system  can  be  compatible  with 
batch  mode  operations.  There  are  two  keys  to  suc- 
cess here:  careful  selection  of  a set  of  significant  in- 
put types  and  development  of  generalized  software 
to  merge  the  static  and  dynamic  information  and 
feed  it  into  the  system. 


Third,  provision  for  a batch  mode  of  operation  is 
not  optional  but  imperative  when  frequently  used 
functions  involve  a large  number  of  interactive 
menus.  A possible  improvement  in  the  current 
system  would  be  a means  by  which  an  interactive 
user could indicate, at the beginning of each major
process,  whether  the  path  assumptions  made  for 
batch  mode  operations  are  applicable.  If  so,  the  num- 
ber of  required  interactions  could  be  substantially 
reduced. 

Finally,  high-level  compiler  languages  such  as 
FORTRAN  and  PL/1,  while  they  offer  some  advan- 
tages in  ease  of  implementation,  are  only  marginally 
compatible  with  error  recovery  as  described  here;  in 
addition,  the  bulky  modules  they  tend  to  produce  can 
seriously interfere with multiterminal interactive
use.  We  have  found  that  an  assembler  language  with 
macro  capabilities  (in  our  case,  Assembler-360  with 
the High-Level Assembler Language (HLAL) structured programing macros) is almost as easy to implement and avoids these problems (ref. 3).

Throughout  the  ERIPS  system,  a primary  concern 
is to require minimum input from the user after supplying him with maximum information, while being
as forgiving as possible of human error. Success at
this goal plays a major role in the success of LACIE
as a whole, since the usefulness of a system depends
largely on its usability.


LACIE/ERIPS Software System Summary

C.  L.  Johnson a 


ABSTRACT 

The  Earth  Resources  Interactive  Processing 
System  (ERIPS)  software  supports  the  Large  Area 
Crop  Inventory  Experiment  (LACIE)  with  the 
analysis  of  agricultural  data  sensed  by  the  Landsat 
spacecraft.  Its  primary  function  is  the  classification 
of  the  data  on  the  basis  of  statistical  similarity  to 
those  portions  which  have  been  identified  by 
analysts.  Since  its  original  definition  in  1971,  ERIPS 
has  been  used  to  develop  analysis  tools  related  to  that 
process.  This  is  a summary  of  the  development  and 
capabilities  of  the  ERIPS  software  system. 


INTRODUCTION 

The  ERIPS  software  was  developed  by  the  IBM 
Federal  Systems  Division,  Houston,  Texas,  to  sup- 
port NASA  in  its  Earth  resources  activities.  LACIE/ 
ERIPS  executes  on  an  IBM  360/75  mainframe  with 
an attached Goodyear STARAN S-500 special-purpose processor (SPP) at the NASA Johnson Space
Center  (JSC)  Real-Time  Computer  Complex  (JSC 
Building  30)  in  Houston.  It  is  used  to  process  the 
Landsat  data  to  estimate  the  wheat  growing  area  in 
several  countries.  From  these  estimates,  the  analysts 
develop  their  production  predictions. 

The LACIE/ERIPS is a large program (approximately 240 000 lines of code on the 360/75 and
19  470  on  the  SPP)  the  development  of  which  ac- 
tually began  in  1971,  on  the  basis  of  the  algorithms 
used  by  the  Purdue  University  Laboratory  for  Ap- 
plications of  Remote  Sensing  System  (LARSYS) 
(ref. 1); it was not associated with LACIE until 1974.
During  its  development,  ERIPS  has  evolved  from  an 
interactive  system  used  as  a research  tool  into  a 
system that is primarily used noninteractively on a
production mode basis. Many capabilities have been
added to the original ones, and significant lessons
have been learned about the implementation of this
type of system.


SYSTEM  REQUIREMENTS 

Originally,  ERIPS  was  planned  as  a local  version 
of  LARSYS  which  would  use  available  computer  and 
display  hardware  (refs.  2 to  4).  LARSYS  was  chosen 
as  the  base  system  because  it  was  a powerful,  opera- 
tionally proven  system  for  the  analysis  of  remotely 
sensed  multispectral  data.  Among  its  capabilities 
were 

1.  Statistical  computations  of  means,  standard 
deviations,  covariance  matrices,  and  correlation 
matrices  for  data  classes 

2.  Separability  measurements  for  distributed 
classes  and  the  use  of  these  measurements  to  select 
features  (spectral  channels)  for  further  processing 

3.  Classification  of  data  by  a Gaussian  maximum 
likelihood  algorithm 

4.  Performance  evaluations  of  classification 
results 
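
For reference, the Gaussian maximum likelihood rule named in item 3 is commonly written in the form below. This is the standard textbook discriminant, given here as a reminder and not as a transcription of the LARSYS or ERIPS implementation; the symbols are introduced only for this illustration.

```latex
g_i(\mathbf{x}) = \ln P(\omega_i)
  - \tfrac{1}{2}\ln\lvert\Sigma_i\rvert
  - \tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^{\mathsf{T}}
    \Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i),
\qquad
\text{assign } \mathbf{x} \text{ to the class } \omega_i
\text{ with the largest } g_i(\mathbf{x})
```

Here x is the measurement vector of one picture element over the selected channels, and the mean vector and covariance matrix of class i are among the statistics listed in item 1. The term P(omega_i) is the a priori probability assigned to class i; when all classes are given equal a priori probabilities, that term does not affect the decision.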

In  addition  to  requiring  these  capabilities,  the 
ERIPS  definition  called  for  use  of  the  available  Dis- 
play Retrieval  and  Formatting  Technique  (DRAFT) 
digital  television  equipment  (DTE)  terminals  as  im- 
age analysis  stations,  supporting  independent  users 
concurrently.  ERIPS  would  also  provide  an  error 
recovery  capability  to  reduce  the  impact  of  software 
failures (see the paper by Duprey entitled "Man-Machine Interfaces in LACIE/ERIPS"). Finally,
ERIPS  was  required  to  operate  in  a multijobbing  en- 
vironment, since  the  Real-Time  Computer  Complex 
resources  were  also  needed  for  development  of 
manned space-flight programs (Skylab and Apollo-Soyuz). This latter requirement led to a budget of 250
kilobytes for ERIPS, in a 300-kilobyte region to ensure some future flexibility.

Investigation  of  this  system  definition  showed 
that  it  would  not  be  cost  effective  simply  to  modify 
LARSYS. First, the existing LARSYS was programed in FORTRAN. The FORTRAN language is
well suited for implementation of mathematical
algorithms  but  not  for  the  logical  operations  that 
were  bound  to  result  from  interactive  multiple-ter- 
minal use.  The  error  recovery  requirement,  in  partic- 
ular, needed  to  operate  on  an  interrupt-handling 
basis  for  retreat  to  an  earlier  system  environment; 
this  capability  is  not  present  in  FORTRAN. 

Another  difficulty  was  that,  like  most  compiler- 
generated programs,  the  LARSYS  modules  were  ex- 
pensive in  terms  of  core  usage.  This  characteristic 
was  incompatible  with  the  250-kilobyte  budget. 

Finally,  LARSYS  fields  could  be  defined  only  as 
rectangles  with  two  sides  parallel  to  the  aircraft 
(satellite)  flightpath.  Although  this  configuration 
was  acceptable  for  aircraft  imagery  taken  over  large 
agricultural  fields,  it  imposed  unacceptable  restric- 
tions on  imagery  taken  over  smaller  fields  from  the 
much  higher  altitudes  of  the  satellites. 

After  all  these  differences  were  considered,  it  was 
decided  to  develop  ERIPS  as  a new  system,  indepen- 
dent of  LARSYS,  which  could  better  use  available 
resources (fig. 1). This highly interactive system,
written primarily in assembler language, was subsequently adapted to accommodate noninteractive
users  also,  as  LACIE  entered  its  production  phase 
(fig.  2). 

Thus,  some  of  the  design  objectives  that  have 
helped  to  shape  LACIE/ERIPS  are  (1)  use  of  the 
LARSYS  algorithms;  (2)  use  of  existing  hardware, 
preferably  in  such  a manner  as  to  allow  transpor- 
tability to  other  systems;  (3)  support  of  multiple  ter- 
minal users  simultaneously;  (4)  operation  in  a multi- 
jobbing environment;  and  (5)  ability  to  recover  from 
errors  with  minimal  impact. 

Another  major  decision  was  made.  Since  this  was 
to  be  an  experimental  program  used  by  analysts  with 
varying  amounts  of  experience  with  computer  data 
processing,  "menus"  were  chosen  as  the  primary 
man-machine  interface  (see  the  paper  by  Duprey). 
Because  of  their  tutorial  activity-prompting  ap- 
proach, menus  do  not  require  extensive  training 
before  production  use  can  be  made  of  the  system. 

Finally,  the  LACIE  environment  (refs.  8 and  9) 
placed considerable emphasis on the problem of handling the huge volumes of data involved. This led to
the  data  base  construction  which  is  discussed  later 
here  and,  in  more  detail,  in  the  paper  by  Westberry 
entitled "The LACIE Data Bases: Design Considerations."

LACIE/ERIPS  HARDWARE 

The  IBM  360/75  on  which  LACIE/ERIPS  ex- 
ecutes has  a million  bytes  of  main  core  storage  and  4 
million  bytes  of  large  core  storage.  As  functions  have 
been added to the system, ERIPS has outgrown the
original  300-kilobyte  region.  It  now  requires  about 
500  kilobytes  of  main  core  and  1200  kilobytes  of 
large  core  to  support  two  terminals  at  peak  load. 

The  LACIE/ERIPS  configuration  (fig.  3)  includes 
two  pairs  of  DRAFT  II  DTE  terminals,  one  in  JSC 
Building  30  and  one  in  JSC  Building  17.  Only  one 
pair  can  be  active  at  a time.  Each  terminal  (fig.  4)  has 
(1) a black-and-white conversational screen; (2) an
image  screen  that  can  display  16  discrete  shades  of 
gray;  (3)  an  8-color  image  screen  that  can  display  in 
64  colors  if  the  other  color  screen  is  not  being  used; 
(4) a keyboard (fig. 5) with 97 alphabetic, numeric,
and special action keys; and (5) a joystick and toggle
switch  to  control  a cursor  on  the  screens.  (The  origi- 
nal terminals  shared  an  eight-color  screen  and  used  a 
Grafacon  tablet  with  an  associated  pen  and  a footpad 
switch  for  cursor  control.)  Each  pair  of  terminals 
also  has  a conversational  screen  hard-copy  device 
and  another  device,  not  available  originally,  which 
can  produce  photographic  prints  of  the  contents  of 
any  screen  in  64  shades  of  gray. 

Other  equipment  in  the  configuration  includes 
eight  tape  drives  (nine-track,  800  bits/in.);  an  IBM 
1403 printer; an IBM 1443 printer; an IBM 2314-1
disk  storage  facility;  and,  since  1975,  the  ITEL  7330 
disks  that  contain  the  Information  Management 
System  data  bases.  Finally,  there  is  the  SPP  (fig.  6). 
This  special-purpose  parallel  processor  was  added  in 
1976.  It  performs  almost  all  the  computations  re- 
quired by  several  critical  applications  (statistics, 
clustering,  and  classification).  Since  it  operates  in 
parallel, not serially like the 360/75, the SPP has significantly improved system throughput. For example, a classification which took 10 minutes dropped
to  30  seconds  with  the  addition  of  the  SPP.  The 
benchmark case, classifying a four-channel full-frame image of data from Landsat, takes less than 8
minutes. 

Other  components  of  the  system  besides  the  SPP 
have had a significant effect on LACIE/ERIPS
development.  For  instance,  careful  management  of 
the  allocation  of  core  is  required.  First,  the  speed  at 
which  large  core  operates  is  only  about  one-third  that 
of  main  core.  This  feature  can  cause  high-access  ap- 
plications like  class  summary  and  feature  selection  to 
consume  far  too  much  time  unless  their  storage  re- 
quirements are  taken  almost  entirely  from  main 
core. Second, there is always the possibility of fragmenting the available storage into noncontiguous
segments that cannot be used to satisfy a core request. Finally, both terminals can be concurrently
executing  a large  application.  This  condition 
becomes  even  more  significant  if  more  terminals  are 
to  be  supported,  as  is  intended  for  the  future.  It  is 
very  helpful  to  have  a monitoring  function,  like  the 
advanced  statistics  collector  utility  used  with 
LACIE/ERIPS.  This  utility  runs  in  the  background, 
recording  information  about  storage  requests  and  so 
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on  Analysis  of  these  recordings  can  provide  warning 
of  impending  problems  so  that  corrective  action  can 
be  taken 
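
The fragmentation hazard described above can be illustrated with a minimal sketch. The block sizes and the function name `can_satisfy` are hypothetical and are not taken from ERIPS; the point is only that a request fails when no single contiguous segment is large enough, even though the total free storage would suffice.

```python
def can_satisfy(free_segments, request):
    """Return True if one contiguous free segment can hold the request."""
    return any(size >= request for size in free_segments)


# Hypothetical free-storage map: three noncontiguous segments (sizes in K bytes).
free_segments = [40, 25, 35]
request = 60

# Total free storage is sufficient, but no single segment is large enough.
total_is_enough = sum(free_segments) >= request
contiguous_is_enough = can_satisfy(free_segments, request)
```

A background monitor of the kind mentioned above could log exactly these two quantities over time to give early warning of fragmentation.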

The  terminals  have  also  been  important  in 
LACIE/ERIPS development. Since they have very
little local intelligence, the ERIPS software includes
a considerable amount of overhead for display
processing. Each user action (typing a character for a
menu field, for instance) must be interpreted by
ERIPS. The software must maintain all the data
being displayed on the conversational and image
screens, including any which have been temporarily
displaced so the screen could be used for another
purpose (like viewing a report).

SYSTEM  FUNCTIONS 

The LACIE/ERIPS software is composed of a
supervisor and various application programs. This
modular structure enables easy addition of new
applications as the system grows. LACIE/ERIPS
executes under a locally modified OS/MVT (Operating
System/Multiprogramming with a Variable number of Tasks).
Originally, this was the Real-Time Operating System,
which has significant modifications and locally
added features required to support manned
spaceflight activities. To meet its transportability goal,
ERIPS did not use any of the real-time extensions to
the standard operating system; LACIE/ERIPS now
operates under the Extended Operating System,
which has fewer local modifications.

The  supervisor  is  a sophisticated  executive 
routine that controls all system activities. It provides
the interfaces between the user, his equipment, and
the application programs. It ensures minimal impact
when adding applications to LACIE/ERIPS. Basic
system initialization, error recovery, and computer
sharing by application programs are the system
functions supplied by the supervisor. The supervisor
isolates the application routines from the characteristics
of the external hardware devices. This hardware
independence has proved to be efficient in transporting
ERIPS to other locations with different equipment.
Versions of ERIPS have been installed on IBM
370/158 and 370/168 models for the Earth Resources
Laboratory (ERL/ERIPS) and the ERMAN project,
using different configurations of disks and terminals.

Some common services are located in the supervisor
and used by each application that needs them.
These  services  include  menu  control,  image  display 
to  screen,  image  data  access  service,  dynamic  space 
and  device  allocation,  data  logging  and  delogging,  and 
batch  input  control. 

To  give  the  application  programs  the  capability  to 
read image data from disk, the supervisor incorporates
a specialized high-speed image data access method.
This method is a very efficient storage space
manager,  providing  direct  retrieval  of  the  imagery 
data  from  the  disk  (see  ref.  10). 

Application  programs  perform  various  image 
processing  functions.  The  ultimate  function  is  to 
group  statistically  analyzed  imagery  data  according  to 
specified criteria. Before this process, the image is
made  available  and  prepared  for  the  user  as  needed. 

The  load  application  handles  image  data  generated 
from various sensors. It is designed to accept data
from tapes in the LARSYS, Landsat multispectral
scanner (MSS), or universal formats and put those
data into a standard format for use by the other
applications. The image data from these tapes can be
put on the system disk packs. In addition to imagery
loading, load can scroll a tape-resident image directly
onto an image screen, unload a disk-resident image to
tape, and display reports on the contents of the
image tapes.

Image manipulation and display (IMD) is the
application for image manipulations and image
displays. This application can display an image on a
display screen in a maximum of 16 shades of gray or 8
or 64 colors. The user can view a currently loaded
image by supplying information such as image name,
first pixel, initial line, and channel number of the
data to be displayed. IMD places one image line on
the screen at a time beginning at the top, shifting
down one line as a new line appears. This procedure
continues until the screen is full. Since most images
are larger than the amount of data allowed on the
screen, the scroll capability allows the additional
image data to be viewed. Image data lines are added to
the top of the screen line by line when the SCROLL
key is depressed. The IMD application provides
other services such as an Available Image Report,
which displays the names and characteristics of all
images resident on disk, and a Latitude/Longitude
Report, which gives the latitude/longitude
coordinates for the image elements.

The  registration  application  enables  user  control 
of  the  geographic  relationship  of  picture  elements 
(pixels)  within  an  image.  It  provides  two  capabilities. 
First,  registration  is  capable  of  conforming  an  input 
image of a given scene to a reference image of the
same  scene.  Second,  it  maps  an  input  image  onto  a 
predefined latitude/longitude grid. These capabilities
are important for several reasons. Registration can
be  used  to  remove  image  distortion  introduced  by  the 
remote  sensor  and  by  the  curvature  of  the  Earth. 
Moreover,  two  images  of  the  same  scene  produced 
by  two  entirely  different  sensor  devices  can  be 
registered  together.  This  capability  makes  it  possible 
to  correlate  data  from  a satellite  image  with  data 
from  an  aircraft  scanner.  Images  of  the  same  scene 
produced  at  different  times  can  be  registered 
together to permit multitemporal analysis of the
scene. 

For  image-to-image  registration,  the  user  iden- 
tifies one  of  the  two  loaded  images  as  the  reference 
image,  the  other  as  input.  The  input  image  is  mapped 
to  conform  to  the  reference  image.  The  user  then 
selects  a point  on  each  image  which  he  has  deter- 
mined is  common  to  both.  An  identification  (ID) 
number  is  assigned  to  the  point  pair.  This  step  is 
repeated  for  the  desired  number  of  points.  Once  the 
point  set  is  determined,  map  generation  processing 
takes  place. 

In image-to-Universal Transverse Mercator
(UTM) registration, the user defines the geographic
boundary of the output image desired. Each picture
element is assigned a latitude/longitude. The map
generation  process  takes  place  after  all  assignments 
are  made  for  the  desired  number  of  points. 

Once the user has decided to generate a mapping
polynomial, the processing for image-to-image and
image-to-UTM registration is similar. The image
position of each input point is compared to the
corresponding reference point. The least squares
coefficients for a bivariate Nth-order polynomial are
calculated. In image-to-UTM registration, the reference
point is a geographic coordinate. (See the section
entitled “Registration” in the appendix for the
algorithm description.)
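
The least squares step can be sketched as follows. This is a minimal illustration and not the ERIPS implementation: the function names (`poly_terms`, `solve`, `fit`), the choice of monomials with i + j no greater than the order, and the use of normal equations with Gaussian elimination are all assumptions made for the example. One such fit is performed for each of the two input-image coordinates.

```python
def poly_terms(x, y, order):
    """Monomials x**i * y**j with i + j <= order (assumed term set)."""
    return [x**i * y**j for i in range(order + 1) for j in range(order + 1 - i)]


def solve(a, b):
    """Solve a*c = b by Gaussian elimination with partial pivoting.

    Assumes the system is nonsingular (control points not degenerate).
    """
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(ref_points, input_values, order):
    """Least squares coefficients mapping reference (x, y) to one input coordinate."""
    rows = [poly_terms(x, y, order) for x, y in ref_points]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * v for r, v in zip(rows, input_values)) for i in range(k)]
    return solve(ata, atb)


# Hypothetical control points: reference (x, y) and matching input-image column u.
ref = [(0, 0), (10, 0), (0, 10), (10, 10), (5, 3)]
u = [2 + 1.0 * x + 0.5 * y for x, y in ref]
coef_u = fit(ref, u, order=1)   # ordered as [constant, y term, x term]
```

A first-order fit needs at least three control points that are not collinear; a polynomial of order N under this term set needs at least (N + 1)(N + 2)/2 points.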

The  image  creation  application  provides  the  user 
with  three  different  methods  of  creating  an  image: 
image  composition,  image  difference,  and  image 
merge. 

Image  composition  allows  the  user  to  combine 
two  images  containing  the  same  number  of  lines  and 
pixels into one image. The most common use of this
application is to combine two registered images into
one. This combined image can then be used for
experiments in multitemporal analysis.

Image  difference  allows  the  user  to  take  the 
difference  of  two  images  and  form  a third  image.  The 
two  images  must  contain  the  same  number  of  lines 
and  pixels.  For  example,  if  two  four-channel  images 
of  the  same  scene,  taken  at  two  different  times,  were 
differenced,  the  resulting  image  would  be  the  area 
that  changed  during  the  timelag. 
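
A minimal sketch of image differencing for a single channel follows. It assumes images are held as lists of lines of integer pixel values and that the result is a plain signed difference; how ERIPS scales or offsets the result is not stated in the text, and the name `image_difference` is hypothetical.

```python
def image_difference(a, b):
    """Pixel-by-pixel signed difference of two equally sized single-channel images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same number of lines and pixels")
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]


# Two hypothetical 2-line by 3-pixel acquisitions of the same scene.
early = [[10, 12, 11], [9, 10, 10]]
late = [[10, 30, 11], [9, 28, 10]]
change = image_difference(late, early)   # nonzero only where the scene changed
```

For a four-channel image the same operation would be applied channel by channel.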

Image  merge  allows  the  user  to  juxtapose  as  many 
as  four  separate  images  with  different  acquisition 
dates  to  create  one  new  image.  Resident  images  from 
the image data base which are 117 lines and 196 pixels
in size are used in the process. The resulting image
can be as many as 468 lines long with 196 pixels
per line. The channels in this image are numbered
consecutively beginning with 1. This capability was
not  available  until  the  LACIE  phase  of  ERIPS. 
LACIE/ERIPS  also  has  a delog  application  which 
allows  the  user  to  receive  a printed  copy  of  the 
menus  and  reports  generated  during  his  terminal 
run. 
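
The line count quoted for image merge follows from stacking up to four 117-line segments (4 x 117 = 468). The sketch below shows that arithmetic for one channel; the function name `merge_images` and the list-of-lines representation are assumptions for illustration only.

```python
SEGMENT_LINES, SEGMENT_PIXELS = 117, 196


def merge_images(images):
    """Juxtapose one to four segment-sized images into one taller image."""
    if not 1 <= len(images) <= 4:
        raise ValueError("merge accepts between one and four images")
    merged = []
    for image in images:
        if len(image) != SEGMENT_LINES or any(len(row) != SEGMENT_PIXELS for row in image):
            raise ValueError("each image must be 117 lines by 196 pixels")
        merged.extend(list(row) for row in image)   # copy each line
    return merged


# Four hypothetical acquisitions, each filled with a constant value.
acquisitions = [[[n] * SEGMENT_PIXELS for _ in range(SEGMENT_LINES)] for n in range(4)]
merged = merge_images(acquisitions)
```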

Remaining to be discussed is the pattern recognition
application. It utilizes numerous programs to
acquire classified image data, and, because of its size
and its importance to the LACIE/ERIPS system, a
separate section is devoted to it.

THE  PATTERN  RECOGNITION  CONCEPT 

Pattern recognition is the largest application in the
system. Its function is to classify the picture
elements (pixels) or group elements (fields) of an
image into classes. Several processing steps are
performed before the actual classification. First, the user
analyst defines areas of the loaded image to be
processed and identifies the materials belonging to each
area. These materials can be of an agricultural nature
(such as wheat, corn, or soybeans) or any other type
of ground data (trees, roads, water, etc.). After the
fields of an image have been established, statistical
analysis can be done and the selected area of the
image classified. If the analyst finds the results
unsatisfactory, he may return to any point of the
pattern recognition process to redefine and/or
recompute the data. The analyst can also choose an
unsupervised type of classification without having to
train the classifier. The clustering algorithm
examines all data elements in the area to be classified
and assigns those elements that are spectrally similar
to the same class or cluster. The user input
parameters control the processing and specify the
degree of closeness required. The output clusters can
then be used by classification.

The  subapplications  in  pattern  recognition  per- 
form separate  functions  which  together  accomplish 
the  pattern  recognition  task.  Many  of  the  algorithms 
were taken from LARSYS, but other features and
algorithms  have  been  added. 

The  field  selection  subapplication  gives  the  user 
the  option  of  determining  fields  from  an  image  and 
assigning attributes to these fields. A field can be
defined with a minimum of 2 vertex points (a
1-dimensional line field) and a maximum of 10 vertex
points.  Field  vertices  are  entered  on  the  image  screen 
via  cursor  or  are  typed  onto  the  conversational 
screen  as  line/pixel  values.  The  user  can  view  all  cur- 
rent field  definitions  by  displaying  the  Field  Defini- 
tion Report. 

The statistics program computes the means, standard
deviations, and covariances for each class
defined in the system that contains at least one
training field. It also performs other statistical
manipulations, such as the combining of several classes’
statistics into one class, sun angle correction, mean
level adjustments, deletion of class or field statistics,
reassignments of fields from one class to another,
and the changing of a field's status. (See the sections
entitled “Statistics,” “Sun Angle Correction,” and
“Mean Level Adjustment” in the appendix for
algorithm descriptions.)
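
The basic quantities named above can be sketched in a few lines. This is an illustration under stated assumptions, not the ERIPS code: a field is a list of pixel vectors, the covariance uses the N - 1 divisor (the divisor used by ERIPS is not given in the text), and the function names are hypothetical. The pooled mean shows how a field's statistics could be combined into a class by weighting with populations.

```python
from math import sqrt


def field_statistics(pixels):
    """Return (means, covariance matrix, standard deviations) for one field."""
    n = len(pixels)
    channels = range(len(pixels[0]))
    means = [sum(p[i] for p in pixels) / n for i in channels]
    cov = [[sum((p[i] - means[i]) * (p[j] - means[j]) for p in pixels) / (n - 1)
            for j in channels] for i in channels]   # N - 1 divisor is an assumption
    std = [sqrt(cov[i][i]) for i in channels]
    return means, cov, std


def correlation(cov, std):
    """Normalized covariance matrix."""
    size = range(len(std))
    return [[cov[i][j] / (std[i] * std[j]) for j in size] for i in size]


def pooled_mean(class_mean, class_pop, field_mean, field_pop):
    """Class mean after adding a field, weighted by populations."""
    total = class_pop + field_pop
    return [(class_pop * c + field_pop * f) / total
            for c, f in zip(class_mean, field_mean)]


# Hypothetical two-channel training field of three pixels.
means, cov, std = field_statistics([[1, 2], [3, 6], [5, 10]])
corr = correlation(cov, std)
```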

Clustering is a method for grouping data into
homogeneous sets. In LACIE/ERIPS, the clustering
subapplication partitions a collection of pixels into
subsets which have similar spectral signatures. The
primary uses of clustering are to assist in defining the
boundaries of fields, to evaluate fields according to
homogeneity of data, to collect homogeneous data
for fields from nonhomogeneous areas, and to act as
a nonsupervised classifier of multispectral data. Two
algorithms are used to achieve the results, adaptive
and iterative. The user has the option to use the
one-pass adaptive algorithm or the multipass iterative
algorithm or a sequential combination of the two.
(See the sections entitled “Adaptive Clustering,”
“Iterative Clustering,” and “Clustering Report
Functions” in the appendix for algorithm descriptions.)

In  feature  selection,  an  optimal  subset  of  channels 
with which to classify an image can be determined.
This  feature  selection  process  utilizes  a separability 
measure  involving  the  Bhattacharyya  distance. 
Classification time can be greatly lowered with this
reduction of dimensionality of the data, although use
of the SPP for classification has made this
characteristic far less important. The optimal channel
subset retains a significant percentage of the
separability inherent in all channels of the image. After
every  execution  of  feature  selection,  a resultant  best- 
channel  subset  is  made  available  to  classification. 
The  user  has  the  choice  of  several  processing  paths, 
including  the  original  ERIPS  divergence  function. 
(See  the  sections  entitled  “Divergence”  and  “Feature 
Selection”  in  the  appendix  for  algorithm  descrip- 
tions.) 
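
The idea of choosing a channel subset by a Bhattacharyya-based separability can be sketched as below. Several simplifications are assumed and are not from the source: classes are Gaussian with diagonal covariance (per-channel variances only), the textbook Bhattacharyya distance is used rather than the a priori weighted measure of the appendix, and a subset is scored by its smallest pairwise class distance. The names `bhattacharyya` and `best_channels` are hypothetical.

```python
from itertools import combinations
from math import log, sqrt


def bhattacharyya(mean_a, var_a, mean_b, var_b, channels):
    """Bhattacharyya distance between two diagonal-covariance Gaussian classes."""
    d = 0.0
    for c in channels:
        v = 0.5 * (var_a[c] + var_b[c])
        d += 0.125 * (mean_a[c] - mean_b[c]) ** 2 / v
        d += 0.5 * log(v / sqrt(var_a[c] * var_b[c]))
    return d


def best_channels(classes, k):
    """Exhaustively pick the k-channel subset with the largest worst-case distance."""
    n_channels = len(next(iter(classes.values()))[0])

    def score(subset):
        return min(bhattacharyya(ma, va, mb, vb, subset)
                   for (ma, va), (mb, vb) in combinations(classes.values(), 2))

    return max(combinations(range(n_channels), k), key=score)


# Hypothetical class statistics: (mean vector, variance vector) per class.
classes = {
    "wheat": ([30, 40, 50, 20], [4, 4, 4, 4]),
    "other": ([30, 48, 50, 28], [4, 4, 4, 4]),
}
subset = best_channels(classes, 2)   # zero-based channel indices
```

The exhaustive search grows quickly with the number of channels, which is consistent with the remark above that feature selection can cost more time than it saves once classification itself is fast.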

The  classification  processor  in  LACIE/ERIPS 
assigns  each  pixel  of  a given  field  to  the  candidate 
class  the  statistics  of  which  that  pixel  most  nearly 
represents.  Classification  is  now  done  in  a mixed  en- 
vironment of  likelihood  density  functions,  with 
some  summations  performed  to  the  class  level  and 
others  to  the  category  level.  (See  the  sections  entitled 
“Classification  (ERIPS)”  and  “Classification 
(LACIE)” in the appendix for algorithm descriptions.)
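
A maximum likelihood assignment of the kind described can be sketched as follows. The example assumes Gaussian classes with diagonal covariance, so the log-determinant and the quadratic form reduce to sums over channels; the full-covariance form and the mixed class/category summation used in LACIE are not reproduced here. The names `discriminant` and `classify` and the class statistics are hypothetical.

```python
from math import log


def discriminant(pixel, mean, var):
    """ln|covariance| plus the quadratic form, for a diagonal covariance."""
    return sum(log(v) + (x - m) ** 2 / v for x, m, v in zip(pixel, mean, var))


def classify(pixel, classes):
    """Assign the pixel to the class with the smallest discriminant value."""
    return min(classes, key=lambda name: discriminant(pixel, *classes[name]))


# Hypothetical two-channel class statistics: (mean vector, variance vector).
classes = {
    "wheat": ([30, 60], [9, 9]),
    "soil": ([50, 40], [9, 9]),
}
label = classify([32, 58], classes)
```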

Pattern  recognition  produces  outputs  representing 
the results of the various programs. The report outputs
are in the form of imagery data, graphic data, or
digital  data. 

Classification  maps  and  cluster  maps  are  outputs 
of  the  classified  and  clustered  image,  respectively. 
Each  data  element  is  represented  by  a symbol  on  a 
character map. Each element can also be represented
by  a gray  shade  or  a color.  The  gray  level  or  the  color 
is  associated  with  the  class  that  the  element  has  been 
classified  or  clustered  into. 

Other major reports produced by pattern recognition
are Bias Correction, Spectral/Trajectory Plots,
and Green Number. Bias correction computations
are done for classification results and/or clustering
results. (See the section entitled “Bias Correction” in
the appendix for algorithm description.) Plots are
produced for spectral plots, tabulations, and
trajectory plots. The spectral plots provide a
two-dimensional plot of the dots per acquisition with their
classification symbols and user-assigned labels. The
first two Kauth/Thomas coordinates, greenness and
brightness, are the axis values. The two tabulations
show as many as four acquisitions per chart, ordered
by dot number and by the first Kauth/Thomas
coordinate (greenness value). The trajectory plot is a plot
of the Kauth/Thomas coordinates for each dot in
each of the acquisitions. The Green Number Report
displays the Kauth/Thomas green number
(greenness minus average soil greenness) associated with
clusters and dots. It is used by Classification and
Mensuration Subsystem (CAMS) analysts to
monitor wheat emergence and drought conditions. A
greenness and brightness value is displayed for each
cluster and/or dot for all acquisitions that are
processed.
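
The green number definition given above (greenness minus average soil greenness) amounts to a one-line computation. In this sketch the greenness values are taken as already computed, and the way the soil average is obtained (a simple mean over sample values) is an assumption; the function name is hypothetical.

```python
def green_number(greenness, soil_greenness_samples):
    """Greenness minus the average soil greenness."""
    soil_average = sum(soil_greenness_samples) / len(soil_greenness_samples)
    return greenness - soil_average


# Hypothetical values for one dot and three bare-soil samples.
value = green_number(25.0, [4.0, 6.0, 5.0])
```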

The  LACIE/ERIPS  provides  several  independent 
subsystems  that  utilize  imagery  data  and  application 
results.  The  CAMS/Crop  Assessment  Subsystem 
(CAS)  interface  subsystem1  gives  the  user  the 
capability  of  using  pattern  recognition  results  on 
other  systems  that  can  receive  the  tape  inputs.  Input 
data  as  well  as  data  generated  from  composition  and 
indexing,  statistics,  feature  selection,  clustering,  and 
classification  are  collected  onto  disk  and  then  saved 
on tape. These interface tapes are designed to be as
compatible with other systems as possible. The data
on the tapes is in American Standard Code for
Information Interchange (ASCII) format, with record
lengths of 720 bytes.

With  the  various  applications  processing  large 
amounts  of  data,  the  need  for  operating  speed 
became  a primary  concern  in  LACIE/ERIPS.  There 
were  hardware  and  software  constraints  which 
governed the time required to complete execution. The
addition  of  the  Goodyear  special-purpose  processor 
as  a parallel  processor  significantly  reduced  the  com- 
puter time  required  for  LACIE/ERIPS  operations. 
To  support  this  function,  hardware  and  software  in- 
terfaces between  the  computers  had  to  be  satisfied 
(ref. 11). Five software modules were implemented
in  the  SPP  to  interface  with  the  software  in  the  IBM 
360: the statistics processor, the maximum likelihood
classifier, the mixture density classifier, iterative
clustering, and adaptive clustering (fig. 6). Five
distinct logical units of input/output (I/O) data are
specified  by  these  applications  for  communication 
between  the  host  (IBM  360)  and  the  SPP.  Three  logi- 
cal data  units  are  transferred  from  the  host  to  the 
SPP:  the  Interface  Control  Record,  the  input 
parameters,  and  the  input  vectors  (imagery  data). 
The  remaining  two  logical  units,  the  output  vectors 
and  the  output  parameters,  are  transferred  from  the 
SPP  to  the  host. 
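
The five logical I/O units and their transfer directions can be summarized in a small table-like structure. This is only a restatement of the paragraph above in code form; the enum, dictionary, and function names are hypothetical and do not describe the actual interface record layouts.

```python
from enum import Enum


class Direction(Enum):
    HOST_TO_SPP = "host to SPP"
    SPP_TO_HOST = "SPP to host"


# The five logical units named in the text, with their transfer direction.
LOGICAL_UNITS = {
    "interface control record": Direction.HOST_TO_SPP,
    "input parameters": Direction.HOST_TO_SPP,
    "input vectors": Direction.HOST_TO_SPP,
    "output vectors": Direction.SPP_TO_HOST,
    "output parameters": Direction.SPP_TO_HOST,
}


def units_for(direction):
    """List the logical units transferred in the given direction."""
    return [name for name, d in LOGICAL_UNITS.items() if d is direction]
```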

The  result  of  this  system  interaction  was  a sub- 
stantial reduction  in  total  execution  time.  Improve- 
ments in  performance  times  for  classification  and 
clustering  were  most  important  (ref.  12). 

LACIE/ERIPS  DATA  BASES 

The  LACIE/ERIPS  uses  several  data  bases  as 
storage  facilities  for  the  large  volumes  of  information 
needed  to  support  the  system  functions.  The  largest 
system update came during the 1976 crop year. It
contained six functional Information Management
System data bases: history, image, fields, process
control, status tracking, and results. These data bases
eliminated the previous ones developed in support of
crop year 1975. During that time, image tapes from
the NASA Goddard Space Flight Center (GSFC)
were first introduced. GSFC tapes were multifile,
universal format tapes containing preprocessed
imagery data acquired from Landsat-1 scanners. Since
many tapes were needed to support LACIE/ERIPS,
the concept of an image and a field data base was
developed to handle the data on the GSFC tapes. The
system stored and processed data collected for 4193
sample segments with an average of 4.5 acquisitions
per segment. A sample segment corresponds to a
ground area of about 6 by 5 nautical miles, or 117
lines by 196 pixels.
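
The data volume implied by these figures can be estimated with simple arithmetic. The sketch assumes four channels per acquisition and one byte per pixel value per channel; the byte size is an assumption that the text does not state, and header data are ignored.

```python
LINES, PIXELS, CHANNELS = 117, 196, 4
BYTES_PER_SAMPLE = 1   # assumption: one byte per pixel value per channel


def acquisition_bytes(lines=LINES, pixels=PIXELS, channels=CHANNELS,
                      bytes_per_sample=BYTES_PER_SAMPLE):
    """Imagery bytes in one acquisition of a sample segment."""
    return lines * pixels * channels * bytes_per_sample


segments, mean_acquisitions = 4193, 4.5
total_bytes = segments * mean_acquisitions * acquisition_bytes()
```

Under these assumptions one acquisition holds 22 932 pixels, and the imagery for all segments comes to roughly 1.7 billion bytes.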

The history data base contains sample segment
identification, GSFC controlling information, and
acquisition history. The identification and GSFC
controlling information includes the sample segment
ID, type, country, crop type, biological window, film
flags, and color codes. The acquisition history
includes data quality information, a tape index, and an
image data base index. This data base is updated by
the composition and indexing subsystem. It contains
a maximum of 7 million bytes of data.

The image data base contains the data on the
GSFC imagery tapes for an entire growing season.
Both header and imagery are stored so that images
can be reconstructed into the universal format when
unloading to tape. A maximum of 4 four-channel
images is stored for an ordinary sample segment, and a
maximum of 16 four-channel images is stored for
training and intensive study sample segments. The
image data base consists of many physical data bases
because of its size, which is 2.9 billion bytes of data,
maximum. Composition and indexing also
maintains this data base.

The field definitions for the LACIE/ERIPS
sample segments are stored on the field transaction data
base. Information stored on this data base describes
field locations, field types, and the category and class
and/or subclass associated with each field. The data
base also contains default a priori and threshold
values. All data can be retrieved during a LACIE
execution on a per-segment basis to provide data for
statistics computations and classification processing.
The field transaction data base subsystem consists of
the physical data base along with the programs
necessary to create, update, maintain, and report on
this data base.

The process control data base contains
information which defines the processing to be done in batch
mode on each site. These data are written out to the
data base by the process control subsystem and
deleted after the site is processed by the batch
production system. The batch production system reads
the process control data base and generates a series of
simulated menu inputs in the manner specified by
the process control information. These inputs are
then passed to the LACIE/ERIPS supervisor for
input to the pattern recognition application. The
maximum size of this data base is 1.6 million bytes.

The status tracking and mensuration results data
bases are not maintained. After their development, it
was determined that they did not meet the design
goals set for them. Status tracking was developed to
store a history or tracking of the LACIE/ERIPS
production jobs and the products used or produced by
the system. The mensuration results data base was
designed to contain the results obtained each time a
site was processed through the interactive or batch
system.

The most recent data base to be developed, the dot
data base, contains the data for each fixed set of 209
pixels or dots for each sample segment. (Dots
represent every tenth pixel on every tenth line of a
LACIE image.) Information stored on this data base
indicates the location of the dot, the category, and
the usage. Dots are used in pattern recognition as
starting vectors, labeling vectors, or bias correction
vectors.
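
The dot count can be checked by enumerating every tenth pixel on every tenth line of a 117-line by 196-pixel segment. The sketch assumes 1-based line and pixel numbering with the first dot at line 10, pixel 10; the exact grid origin is not stated in the text, and the function name is hypothetical.

```python
def dot_grid(lines=117, pixels=196, step=10):
    """(line, pixel) positions of every tenth pixel on every tenth line."""
    return [(line, pixel)
            for line in range(step, lines + 1, step)
            for pixel in range(step, pixels + 1, step)]


dots = dot_grid()
dot_count = len(dots)   # 11 lines of dots times 19 dots per line
```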

CONCLUSION 

The  ERIPS  has  undergone  many  changes  since  its 
original  implementation  in  1972,  as  can  be  seen  in 
figures  1 and  2.  During  the  process,  much  experience 
has  been  gained  in  handling  large  volumes  of  data 
and  providing  analysis  aids. 

A fundamental design conclusion reached very
early in the process was that an assembler language
with macrocode capabilities, particularly when
combined with a preassembler that recognizes structuring
macrocodes and processes them before assembly,
has almost all the advantages and none of the
drawbacks of the compiler languages. Both
FORTRAN and PL/I were eliminated from
consideration as primary implementation languages,
mainly because of the large module sizes they
produced. Also, in these languages, the interfaces with
the nonstandard I/O packages, error recovery, and
the use of large core storage capacity are
cumbersome. Thus, the use of these languages has been
restricted to certain computational subroutines
(FORTRAN) and formatting-intensive routines
(PL/I); the rest of ERIPS is coded in High Level
Assembler Language (HLAL).

Another basic design decision which has proved
very successful was the development of the
nonstandard I/O packages. The image direct access method
provides efficient access to the multispectral imagery
data on whatever basis it happens to be needed:
whole segments, specific channels, particular image
lines, even line-skipping and pixel-skipping patterns.
The extended access method provides the protocol
required for communication with the terminal
hardware. Both packages have been successfully
transported with ERIPS to other mainframe/
disk/terminal configurations. ERL/ERIPS, for
instance, has been implemented on several different
models of IBM 360 and 370, under various operating
systems, using 3330 and 3330-11 disks, and
communicating with Ramtek terminal hardware (ref.
13).

During  the  life  of  the  system,  not  all  the  applica- 
tion changes  have  been  related  to  implementation  of 
new functions. Several applications have
disappeared, fallen into disuse, or become of less
importance. Bhattacharyya chaining, for instance, which
used a Bhattacharyya distance function to form
“chains” of clusters having similar distance
characteristics, proved useless for analytic purposes. The
divergence  technique  for  reducing  the  number  of 
channels  to  classify  was  largely  replaced  by  feature 
selection,  which  has  itself  been  eliminated  from  the 
common  processing  path,  since  the  time  it  consumes 
outweighs  the  time  saved  in  classification  with  use  of 
the  SPP.  Feature  selection  has  thus  become  primarily 
a reporting  tool  the  analyst  can  use  to  refine  his  in- 
puts to  the  rest  of  the  system.  Mensuration  has  dis- 
appeared from  ERIPS,  as  has  status  tracking.  All  this 
is  further  proof  of  the  need  for  modularity  of  system 
structure,  which  in  ERIPS  is  a natural  consequence 
of  the  menu  concept. 

An  important  shift  of  emphasis  in  the  mode  of 
processing  has  also  occurred.  Originally,  ERIPS  was 
purely  an  interactive  system.  By  1975,  however,  a 
noninteractive  mode  had  become  imperative,  and 
most  of  the  system's  work  is  now  of  this  type. 

Though  the  experimental  phase  of  LACIE  is  now 
over,  with  transition  to  a multicrop  analysis  require- 
ment beginning,  the  history  of  ERIPS  gives  some 
clue  to  the  future  of  large-scale  image  processing 
systems.  There  has  been  a continual  development  of 
ways  in  which  the  computer  can  be  used  to  provide 
analysis  tools,  and  there  is  no  reason  to  believe  that 
the last such tool has been found.

Appendix

LACIE/ERIPS  Algorithm  Descriptions 


The descriptions of algorithms used in LACIE/ERIPS contained in
this appendix are from reference [number illegible in source].


STATISTICS

1. Field means. For a channel i and training field f,

$$M_{f,i} = \frac{1}{N_f} \sum_{k=1}^{N_f} x_{f,k,i}$$

where $M_{f,i}$ is the mean of channel i data in field f,
$x_{f,k,i}$ is the kth pixel value for channel i in field f,
and $N_f$ is the number of elements in f. Means are
computed for each channel in the image.

2. Field covariance matrix element. An element $\Gamma_{f,i,j}$
of the covariance matrix for field f represents the
covariance between a pair (i, j) of channels for data
taken over all elements of f. These elements are computed

[equation not recoverable from source]

3. Standard deviation. The standard deviation $\sigma_{f,i}$
of channel i for field f is given by

$$\sigma_{f,i} = \sqrt{\Gamma_{f,i,i}}$$

4. Correlation matrix. The correlation matrix
(normalized covariance) element is computed for
each covariance element for all fields:

$$R_{f,i,j} = \frac{\Gamma_{f,i,j}}{\sigma_{f,i}\,\sigma_{f,j}}$$

5. Class means.
Adding field statistics to a class

$$M_{r,i} = \frac{P_c M_{c,i} + P_f M_{f,i}}{P_r}$$

where $P_f$ is the population of the field, $P_r$ is the
population of the resulting class, and $P_c$ is the
population of the current class.

Deleting field statistics from a class

$$M_{r,i} = \frac{P_c M_{c,i} - P_f M_{f,i}}{P_r}$$

6. Class covariances.
Adding a field to a class

[equation not recoverable from source]

Deleting a field from a class

[equation not recoverable from source]

DIVERGENCE 

..  . »M, 

|(i  Thc  divergence  calculation  utilizes  a class  distance 

measure  found  in  Kullback's  “Information  Theory 
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and  Statistics."  For  classes  ,v  and  y in  an  //-channel  vector  or  pixel  values,  and  M,  is  the  mean  vector  for 
environment,  class  A. 


^ii\  [>>'»  <>/*] 

i«i  /*i  i 
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FEATURE  SELECTION 


Given  subclasses  / and ./with  mean  vectors  M,and 
M,,  covariance  matrices  I',  and  F,  . and  a priori 
values  (/,  and  u,  . the  Bhattaeharyya  distance  between 
subclasses  / and  j is  given  as 


Bu  • cxr 
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where  M„  - M,  - M,,  and  //  — number  of  channels 
used.  The  associated  separability  between  subclasses 
/ and ./  is 


V W ‘w</ 

CLASSIFICATION (ERIPS)

Maximum likelihood classification evaluates the quadratic form

$$\ln|\Gamma_k| + (X - M_k)^T \Gamma_k^{-1} (X - M_k)$$

where $|\Gamma_k|$ is the determinant of the covariance for class k,
$\Gamma_k^{-1}$ is the inverse covariance matrix, X is the vector of
pixel values, and $M_k$ is the mean vector for class k.

CLASSIFICATION (LACIE)

The density function used in the assignment decision for classifying
pixels to a subclass involves the following contribution from a given
subclass:

$$p(X,k,i,j) = \frac{\alpha_k}{N_k \, Q_{ki} \, (2\pi)^{d/2} \, |\Sigma_{kij}|^{1/2}} \exp\left[-\tfrac{1}{2} (X - M_{kij})^T \Sigma_{kij}^{-1} (X - M_{kij})\right]$$

where X is the vector representation of the pixel to be classified, k is
the category identifier, i is the class identifier, j is the subclass
identifier, $\alpha_k$ is the a priori fraction corresponding to
category k, $N_k$ is the number of classes in category k, $Q_{ki}$ is
the number of subclasses in class i of category k, d is the dimension of
pixel vector X, $M_{kij}$ is the mean vector of subclass (k,i,j), and
$\Sigma_{kij}$ is the covariance matrix of subclass (k,i,j).

In the nominal default classification to the category level, the
following sum is computed for each category k:

$$p(X,k) = \sum_{i=1}^{N_k} \sum_{j=1}^{Q_{ki}} p(X,k,i,j)$$

The category with the largest density sum is selected, and the pixel is
then assigned to the subclass within that category which has the maximal
p(X,k,i,j).

The subclass assignments are saved, together with an associated
likelihood value. [The equation for this value is illegible in the
source; it includes a preset tabular conversion factor.]

For any category k for which classification to the class level has been
specified, $N_k$ sums of the form

$$p(X,k,i) = \sum_{j=1}^{Q_{ki}} p(X,k,i,j)$$

would enter the overall consideration for largest density sum as stated
previously. The specific subclass assignment and associated likelihood
value would be derived in the same manner.
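
The maximum likelihood decision rule can be sketched as follows. This is a minimal illustration under stated assumptions: equal a priori weights for all classes, and hypothetical function names `quadratic_form` and `classify`. It evaluates the log-determinant plus Mahalanobis term for each class and picks the smallest value, which corresponds to the largest Gaussian density.

```python
import numpy as np

def quadratic_form(x, mean, cov):
    """ln|cov| + (x - mean)^T cov^-1 (x - mean) for one class."""
    diff = np.asarray(x, float) - np.asarray(mean, float)
    cov = np.asarray(cov, float)
    _, logdet = np.linalg.slogdet(cov)
    # solve() avoids forming the explicit inverse covariance matrix.
    return logdet + diff @ np.linalg.solve(cov, diff)

def classify(x, means, covs):
    """Return (index of the most likely class, list of per-class scores)."""
    scores = [quadratic_form(x, m, c) for m, c in zip(means, covs)]
    # Assumption: equal priors, so the smallest score is the most likely class.
    return int(np.argmin(scores)), scores
```

A thresholding step (rejecting pixels whose best score is still too large) would be layered on top of this in a full classifier.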


REGISTRATION 

Two bivariate polynomials relate the points in a reference (output)
image (X, Y) to corresponding preimage points in an input image (U, V):

$$U = F(X,Y) = \sum_{i \ge 0} \sum_{j \ge 0} a_{ij} X^i Y^j$$

$$V = G(X,Y) = \sum_{i \ge 0} \sum_{j \ge 0} b_{ij} X^i Y^j$$

(The upper summation limits are illegible in the source.) The least
squares method is used to compute the coefficients $a_{ij}$, $b_{ij}$
from the set of user-designated points of coincidence in the input and
reference images.
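
The least squares fit of the two registration polynomials can be sketched as below. The function names are hypothetical, and the truncation rule (all terms with i + j up to a chosen `degree`) is an assumption, since the summation limits in the source are not legible.

```python
import numpy as np

def poly_terms(x, y, degree):
    """Design matrix with one column per term x**i * y**j, i + j <= degree."""
    x = np.atleast_1d(np.asarray(x, float))
    y = np.atleast_1d(np.asarray(y, float))
    cols = [x**i * y**j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]
    return np.column_stack(cols)

def fit_registration(ref_xy, in_uv, degree=1):
    """Fit coefficients a (for U) and b (for V) from control point pairs."""
    ref_xy = np.asarray(ref_xy, float)
    in_uv = np.asarray(in_uv, float)
    design = poly_terms(ref_xy[:, 0], ref_xy[:, 1], degree)
    a, *_ = np.linalg.lstsq(design, in_uv[:, 0], rcond=None)
    b, *_ = np.linalg.lstsq(design, in_uv[:, 1], rcond=None)
    return a, b

def apply_registration(a, b, x, y, degree=1):
    """Map reference-image points (x, y) to input-image points (u, v)."""
    design = poly_terms(x, y, degree)
    return design @ a, design @ b
```

For `degree=1` the columns are ordered 1, y, x, so the fit reduces to an affine mapping; at least three non-collinear control points are then needed.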


ADAPTIVE  CLUSTERING 

The adaptive clustering algorithm generates clusters by cycling once
through all fields to be clustered, assigning pixels in small
homogeneous strips to cluster centers, which are continually being
modified by statistically merging the pixel strips into the clusters.
After analyzing all pixels, the cluster mean vectors are frozen and the
data are passed again to assign the pixels to these fixed cluster
centers, generating the final statistics.

1. Strip formulation. If $V_j(i)$ equals the ith component of the jth
vector to be assigned, and A is a strip refinement parameter, then the
local group or strip is defined by vectors $V_{j+s}$, s = 0, 1, ..., L,
where L is the last s for which

$$|V_j(i) - V_{j+s}(i)| < A$$

is valid for all values of i. After generating the local subgroup, its
mean is computed.

2. Sequential search. The sequential search computes the distance
between the mean of the local subgroup and each of the cluster means.
The search terminates whenever this distance is less than R3 = M1*R1
(0 < M1 ≤ 1), where M1 is a control parameter and R1 is a refinement
parameter. The cluster means are searched in the order of their
populations. Three outcomes are possible: (a) the subgroup is assigned
to the first cluster for which the distance is less than R3; (b) the
subgroup is assigned to the nearest cluster when the distance to the
nearest cluster is greater than R3 but less than R1; or (c) the subgroup
is used to begin a new cluster; that is, the distance to the nearest
cluster is greater than R1. After assignment of the strip in cases (a)
and (b), the mean and population count are updated.

3. Cluster merging. The cluster merging process operates by computing
the distance between the nearest pair of cluster means. If this distance
is less than a threshold C, then the two means are averaged into one.
The nearest distance between clusters is recomputed, and the merging
process continues until all the clusters are separated by C or more. The
merging operation is performed when the counter NMC of the number of
clustered points since the last merger exceeds the threshold NMT (system
parameter).

4. Deleting clusters. The test for deleting clusters is made when the
counter NEC exceeds the threshold NET (system parameter); NEC is the
number of clustered points since the last deletion process. All clusters
with fewer than NMIN points (system parameter) are deleted.
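
The three outcomes of the sequential search can be sketched as follows. This is an illustration only: the function name `assign_strip` is hypothetical, the city-block (sum of absolute differences) distance is an assumption, and the merging and deletion counters are omitted.

```python
import numpy as np

def assign_strip(strip_mean, n_pixels, means, counts, r1, m1):
    """Assign one strip to a cluster; returns the index of that cluster.

    means:  list of float arrays (cluster means), modified in place.
    counts: list of cluster populations, modified in place.
    """
    strip_mean = np.asarray(strip_mean, dtype=float)
    r3 = m1 * r1                      # early-exit radius, 0 < m1 <= 1
    nearest, nearest_d, chosen = None, np.inf, None
    # Clusters are searched in order of decreasing population.
    for k in sorted(range(len(means)), key=lambda c: -counts[c]):
        d = float(np.abs(means[k] - strip_mean).sum())  # assumed L1 distance
        if d < r3:                    # case (a): first cluster within R3
            chosen = k
            break
        if d < nearest_d:
            nearest, nearest_d = k, d
    if chosen is None and nearest is not None and nearest_d < r1:
        chosen = nearest              # case (b): nearest cluster within R1
    if chosen is None:                # case (c): start a new cluster
        means.append(strip_mean.copy())
        counts.append(n_pixels)
        return len(means) - 1
    # Cases (a) and (b): update the running mean and population count.
    total = counts[chosen] + n_pixels
    means[chosen] = (means[chosen] * counts[chosen] + strip_mean * n_pixels) / total
    counts[chosen] = total
    return chosen
```

Because the means move as strips are absorbed, the result depends on the order in which strips are presented, which is why the algorithm makes a second pass with frozen means to produce final statistics.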
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ITERATIVE  CLUSTERING 

The iterative clustering algorithm generates clusters by cycling a
variable number of times through all fields to be clustered. On each
pass through the data, the cluster centers are fixed and all pixels, now
in small homogeneous strips, are assigned to the fixed clusters on a
closeness basis; the pixels assigned to a certain cluster form new
statistics for the cluster, to be held as fixed for the next pass. Only
between passes may the number of clusters fluctuate. Fewer clusters may
result if "combine" logic is exercised; more if "split" logic is
performed. Either split or combine is done after each pass until the
last pass, which determines final statistics.
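
The "split" logic of this algorithm (item 1 under this heading) can be sketched as follows. The function name `split_cluster` is hypothetical; the parameters `t1` and `sep` stand for the threshold T1 and the separation parameter SEP described in the text.

```python
import numpy as np

def split_cluster(mean, std, t1, sep):
    """Split a cluster along its highest-variance channel.

    Returns (m1, m2), the two subcluster means, or None when the largest
    standard deviation does not exceed the threshold t1.
    """
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    j = int(np.argmax(std))           # channel J with the largest variance
    if std[j] <= t1:
        return None                   # cluster is compact enough; no split
    m1, m2 = mean.copy(), mean.copy()
    m1[j] += sep * std[j]             # first subcluster moves up along J
    m2[j] -= sep * std[j]             # second subcluster moves down along J
    return m1, m2
```

All channels other than J keep the original mean, so the two subclusters differ only along the direction of greatest spread.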

1. Cluster splitting. In splitting a cluster, the channel with the
largest variance ($\sigma_J^2$) is determined. If the standard deviation
$\sigma_J$ exceeds the threshold T1 (system parameter), the cluster is
split along channel J alone into two subclusters. Assuming an N-channel
vector space, let $M_i$, i = 1, ..., N denote the mean vector for the
initial cluster; $M1_i$, i = 1, ..., N denote the mean vector for the
first subcluster; and $M2_i$, i = 1, ..., N denote the mean vector for
the second subcluster. SEP denotes a user-specified system parameter
defining the separation of the new cluster means from that of the
original cluster. Then the splitting process generates the two
subclusters M1 and M2 in a manner such that

$$M1_i = M_i, \quad i \neq J$$

$$M1_i = M_i + SEP \cdot \sigma_J, \quad i = J$$

$$M2_i = M_i, \quad i \neq J$$

$$M2_i = M_i - SEP \cdot \sigma_J, \quad i = J$$

2. Cluster combining. On a combining iteration, each cluster is limited
to combining with at most one other cluster. The process begins with
computing the weighted distance between a cluster and each of the
remaining cluster means. When the weighted distance is less than a
threshold T2, the two respective means are averaged (weighted average)
together. The mean averaging effectively combines two clusters into one
cluster for the next pass of the data. The distance computations and
thresholding continue until all the original clusters are tested.


CLUSTERING REPORT FUNCTIONS

1. Intercluster Distance Report uses the following distance formulas to
calculate the distance between each pair of clusters:

Adaptive

For clusters I and J and channel L,

$$D_{IJ} = \sum_{L=1}^{NC} |M_{IL} - M_{JL}|$$

where NC is the number of channels.

Iterative

The distance is a sum over channels L = 1, ..., NC in which each term is
divided by $\sigma_{IL} \sigma_{JL}$, where M is the mean and $\sigma$
is the standard deviation. [The numerator of the summand is illegible in
the source.]

2. Cluster match routine. The nearest subclass defined in the subclass
statistics table to a cluster is determined by using the match formula

$$D = \sum_{L=1}^{NC} \frac{(M_{IL} - MS_{NL})^2}{\sigma_{IL} \sigma_{NL}}$$

where $M_{IL}$ is cluster I mean for channel L, $MS_{NL}$ is the mean of
the Nth subclass in channel L, and $\sigma$ is the standard deviation.
The distance is calculated over all channels and the nearest subclass is
found. The final distance is the square root of the smallest D. (Note:
The clustering algorithms described here are those originally
implemented on the IBM 360/75. Some changes have been made to take
advantage of the parallel SPP processing, but the effect remains the
same.)

SUN ANGLE CORRECTION

If the user has selected the Sun angle correction option, the subclass
means m and covariance matrices $\Gamma$ will be modified as follows:

$$m' = A_1 m + A_2$$

$$\Gamma' = A_1 \Gamma A_1^T$$

where m' is the modified mean of the subclass, m is the mean of the
subclass, $\Gamma'$ is the modified covariance matrix, $\Gamma$ is the
covariance matrix of the subclass, and $A_1$ and $A_2$ are diagonal
constant matrices the elements of which are functions of the training
segment and recognition segment Sun angles.
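
A minimal sketch of the Sun angle correction applied to one subclass follows. The function name `sun_angle_correct` is hypothetical. As an assumption, the gain A1 is passed as its diagonal entries and the additive term A2 is treated as a per-channel offset vector; how the entries are derived from the two Sun angles is not given in the text and is not modeled here.

```python
import numpy as np

def sun_angle_correct(mean, cov, a1_diag, a2):
    """Apply m' = A1 m + A2 and cov' = A1 cov A1^T to subclass statistics."""
    a1 = np.diag(np.asarray(a1_diag, dtype=float))   # diagonal gain matrix
    new_mean = a1 @ np.asarray(mean, float) + np.asarray(a2, float)
    # The offset does not affect the covariance; only the gain does.
    new_cov = a1 @ np.asarray(cov, float) @ a1.T
    return new_mean, new_cov
```

Because A1 is diagonal, each covariance element (i, j) is simply scaled by the product of the gains for channels i and j.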


MEAN  LEVEL  ADJUSTMENT 

If mean level adjustment has been specified by the user, the following
computations must be performed. Compute the mean vector $\mu_I$ for the
segment I to be processed, and the mean vector $\mu_J$ for the training
segment J. If Sun angle correction for J has not been performed, then
compute

$$\mu_J' = A_1 \mu_J + A_2$$

The mean level adjustment vector is then

$$\Delta M = \mu_I - \mu_J'$$

and the resultant corrected mean vector is

$$M_J' = M_J + \Delta M$$

BIAS  CORRECTION 

The following indicates the calculations used in bias correction.
Entries marked "illegible" could not be recovered from the source.

Quantity: Meaning

p: The total number of bias correction categories.

B: The number of pixels considered. For classification bias correction,
this is the total number of pixels used in classification, minus the
number thresholded-out, minus the number in the category under
consideration ("A"). For cluster bias correction, it is the number of
pixels in clusters not labeled "A."

$C_q$: The label of the qth bias correction category.

$N_q$: The number of pixels classified (for classification bias
correction) or clustered (for cluster bias correction) into category
$C_q$.

$N_{p+1}$: The number of pixels classified (or clustered) into a
category other than "A" or the bias correction set.

$n_q$: The number of bias correction vectors classified (or clustered)
into category $C_q$.

$n_{p+1}$: The number of bias correction vectors classified (or
clustered) into a category other than "A" or the bias correction set.

m: The total number of bias correction vectors.

$m_{q,i}$: The number of bias correction vectors labeled $C_q$ by the
user and classified (or clustered) into the ith bias correction
category.

$m_{q,p+1}$: The number of bias correction vectors labeled $C_q$ by the
user and classified (or clustered) into a category other than "A" or the
bias correction set.

[symbol illegible]: $m_{q,i}$ divided by $n_i$ (0 if $n_i = 0$). Here,
i varies from 1 to p + 1.

$\hat{P}(C_q)$: Bias corrected classified (or cluster) percentage for
category q:

$$\hat{P}(C_q) = 100 \sum_{i=1}^{p+1} \frac{N_i}{B} \cdot \frac{m_{q,i}}{n_i}$$

$\hat{P}(C_{p+1})$: Bias corrected classified (or cluster) percentage
for categories other than "A" or the bias correction categories.
[Equation illegible.]

Beta value for category $C_q$: [Equation illegible.]

The ith component of the variance for category $C_q$ (i varies from 1 to
p + 1): 0 if $n_i < 2$; otherwise, [equation illegible].

Variance for category $C_q$ (unreliable if any $n_i < 2$): [Equation
illegible.]

Uncorrected classified (or cluster) percentage for category $C_q$:
$100 N_q / B$.

Uncorrected classified (or cluster) percentage for categories other than
"A" or the bias correction categories: $100 N_{p+1} / B$.

Bias percentage range: [The expressions for the lower value and the
upper value are illegible.]

%DO: Percentage classified "designated other" (classification bias
correction only):

%DO = 100 × (number of DO pixels) ÷ B

Uncorrected percentage of unidentifiable pixels (classification bias
correction only):

100 × (number of pixels classified IH + number of pixels thresholded-out
+ number of pixels classified or clustered into category "A") ÷ 22 922

Considerations for Design of Future Research and
Development  Interactive  Image  Analysis  Systems 

T.  B.  Wilkinson a 


INTRODUCTION 

The Earth Observations Division (EOD) at the
NASA Johnson Space Center (JSC) will be shifting
emphasis from the quasi-production programs such
as LACIE to a more basic research and development
(R&D)  role  to  provide  the  necessary  technology  for 
the  1980's.  Current  workload  forecasts  for  future 
programs  show  significant  increases  in  the  amount 
of  imagery  data  that  will  have  to  be  processed  and 
analyzed. 

An  interactive  approach  to  image  analysis  pro- 
vides the  responsiveness  and  adaptability  required 
by  the  multiuser,  multidiscipline  programs  of  the 
1980's.  An  interactive  system  puts  the  “man  in  the 
loop"  and  provides  for  almost  immediate  human 
judgment  decisions  relative  to  spatial  data.  A human 
observer  typically  makes  many  decisions  during  an 
interactive  session  based  on  his  visual  perception. 
The  same  decisions,  however,  may  become  cumber- 
some to  make  by  way  of  machine  processing.  When 
cost,  flexibility,  and  throughput  are  considered,  it  is 
apparent  that  the  interactive  image  analysis  approach 
best  meets  future  requirements.  To  meet  these  re- 
quirements, the  Earth  observations  interactive  image 
analysis  capability  must  be  significantly  enhanced. 

The  design  of  future  interactive  image  analysis 
systems  must  consider  the  changing  nature  of  the 
problem.  An  R&D  environment  requires  a highly 
flexible  system  as  opposed  to  the  more  limited  flex- 
ibility of  a production  system. 

Design considerations must include the increased processing
requirements imposed by the addition of a thermal channel to Landsat-3
and the increased number of spectral channels with significantly higher
spatial resolution provided by the Landsat-D thematic mapper. Other
design considerations must include the rapidly changing technology in
memories and special-purpose processors. The analyst-machine interface
and the human factors involved are often overlooked; however, they are
considered to be significantly important for future systems.

a Lockheed Electronics Company, Houston, Texas.

Consideration  of  these  and  other  factors  has 
evolved  to  a basic  conceptual  approach  for  the  design 
of  future  image  analysis  systems  for  the  1980's. 


CURRENT  ENVIRONMENT  IN  EOD 

The  JSC  overall  capability  for  processing  and 
analyzing  remotely  sensed  data  is  currently  based  on 
a number  of  special-purpose  stand-alone  systems. 
Each  of  these  hardware/software  systems  was  ini- 
tially implemented  to  provide  some  aspect  of  the  fast 
growing  technology  for  processing  and  application  of 
remotely  sensed  Earth  resources  data.  The  two  exist- 
ing interactive  image  analysis  systems  at  JSC  are 
categorized  as  special-purpose  stand-alone  systems. 

The two systems are the Earth Resources Interactive Processing System
(ERIPS) and the Image-100 system. ERIPS, a multiuser system, provides
both a batch and an interactive capability. ERIPS provides the analyst
with 2 high-resolution black-and-white displays (16 shades of gray) and
their associated conversational monitors for interactive control via a
command/prompt menu structure. The analyst is also provided with 2 color
displays providing a maximum of 64 colors. Color imagery display and
control are also provided via a conversational monitor. The basic image
analysis hardware is a modified digital television (TV) equipment (DTE)
cluster originally designed for display and control applications in the
JSC Mission Control Center. The computational capability for ERIPS is
provided by an IBM 360/75 computer and a STARAN special-purpose
processor. Large-volume storage required by the imagery and ancillary
data base is provided by as many as 42 high-density disk drives. ERIPS
uses a large number of special-purpose software modules to provide the
clustering, classification, and display manipulation capabilities
required by LACIE. ERIPS is the primary LACIE "production" system and is
designed to achieve a high throughput rate. The software used by ERIPS
for LACIE production is primarily structured for one specific set of
tasks or algorithms and is cumbersome to use as an R&D tool. The basic
hardware capabilities of ERIPS are limited when compared to more current
systems designed specifically for interactive image analysis.

The other major stand-alone image analysis system is the General
Electric Image-100. The Image-100 computational capability is provided
by a Digital Equipment Corporation (DEC) programed data processor (PDP)
11-45 computer. The Image-100 is a 5-channel interactive system that
provides a color display of 512 by 480 picture elements (pixels) with as
many as 256 intensity levels per channel. The normal configuration is to
use four channels for video and the fifth channel for graphics. The
fifth channel consists of eight 1-bit graphics planes and therefore
provides eight graphics or theme tracks. The Image-100 provides some of
the more sophisticated display manipulation capabilities not found in
ERIPS but does not provide the throughput rate required for production.
The Image-100 has had only limited use in LACIE, where it has served
more as an R&D tool. It has been heavily used in the development of new
procedures or techniques, such as an interactive maximum likelihood
classification procedure. The Image-100 is also being used for other
programs such as the Forestry Applications Program (FAP) and the
Regional Applications Project (RAP). FAP and RAP are essentially pilot
projects with heavy emphasis on R&D and have no major production
requirements. Although providing a configuration more conducive to R&D,
the Image-100 is only a single-user system and cannot easily be expanded
to accommodate the increased number of spectral channels that will be
used in Landsat-3 and in the Landsat-D thematic mapper. The Image-100 is
also limited in its computational power; thus, clustering and
classification processes are time-consuming.

In  summary,  the  two  existing  interactive  systems 
are  adequate  for  their  current  tasks,  but  this  will  not 
be  the  case  for  the  1980’s. 


FUTURE  ENVIRONMENT  IN  EOD 

Forecasts  for  the  next  10  years  indicate  that  six 
programs  will  comprise  the  major  workload:  LACIE, 


LACIE  Transition,  FAP,  RAP,  the  Food  Multicrop 
Program,  and  the  Joint  Soil  Moisture  Experiment. 
Starting  with  fiscal  year  1979,  these  programs  will  be 
integrated  into  one  unified  program:  the  Global  Food 
and  Fiber  Information  System. 

At  first  glance,  it  may  not  appear  that  the 
workload  is  increasing  significantly,  but  this  is  far 
from  being  the  case.  The  Landsat-3  will  have  an  addi- 
tional channel  for  thermal  data,  whereas  the  Land- 
sat-D  thematic  mapper  will  also  have  the  thermal 
channel  and  an  additional  spectral  channel  or  possi- 
bly even  two  additional  spectral  channels.  The  sig- 
nificant change  in  the  Landsat-D  thematic  mapper 
will  be  increased  resolution— almost  three  times 
greater  than  the  resolution  of  the  current  Landsat.  If 
the  number  of  Landsat  acquisitions  is  assumed  to  re- 
main constant,  the  increase  in  the  number  of  chan- 
nels and  resolution  represents  a sevenfold  increase  in 
data  alone.  It  should  be  rather  obvious  that  the  batch 
processing  and  manual  photograph  interpretation  of 
film  segments,  such  as  in  the  LACIE  program,  could 
not  accommodate  the  future  environment.  The 
analyst  of  the  1980's  must  be  provided  with  a more 
efficient  means  of  performing  his  work,  and  the  im- 
plementation of  a highly  flexible  interactive  image 
analysis  system  will  provide  the  necessary  means. 

A future interactive image analysis system must provide a multiuser,
multiprogram capability. It must be a totally integrated system with all
users having full capability to acquire common imagery data bases and to
perform clustering and classification procedures quickly. The system
should provide the capability to perform R&D work and to handle
quasi-production work efficiently. The hardware and software must not be
program-dependent; i.e., one analysis console should not be for LACIE,
another for FAP, another for RAP, etc. The system should be capable of
handling the variable size data sets of the various programs and should
allow the analyst to select parametric classification techniques such as
maximum likelihood and mixture density or nonparametric techniques such
as parallelepiped, decision tree, and table look-up. The future
environment will be one in which virtually all image analysis will be
performed interactively with the key considerations being flexibility
and speed.

It is projected that future programs will have a
much higher involvement by the academic community.
The increased involvement will probably include
the use of JSC facilities by members of the
academic community. The future interactive image
analysis  systems  must  be  designed  to  accommodate 
the  divergent  backgrounds  of  the  users.  Future 


systems  must  have  a simplified  analyst-machine  in- 
terface so  that  the  user  may  work  with  the  system 
after  minimal  training.  It  should  be  possible  for  the 
user  who  has  little  computer  experience  to  work 
effectively  on  digital  image  processing  and  analysis 
tasks. 


IMPROVED  TECHNOLOGY 
Microprocessors 

Some of the recent image analysis system designs
are using microprocessors as an integral part of the
system. Microprocessors such as the LSI-11 are being
used to provide interactive control and processing
and thereby to free the host computer for more complex
operations such as classification and clustering.
Microprocessors  are  also  being  used  as  the  nucleus  of 
an  image  array  processor  in  which  entire  images  in- 
stead of  single  pixels  can  be  manipulated  and  dis- 
played arithmetically  at  video  frame  rates.  It  appears 
that  the  relatively  low  cost  of  microprocessors, 
coupled  with  faster  memory  cycle  times,  will  make 
the  microprocessor  extremely  attractive  for  use  in 
future  image  analysis  systems. 

Memory 

Perhaps the most significant improvements have been in the area of
refresh memory devices. Initially, most image analysis systems used
video disks to refresh the display. The electromechanical video disks
normally rotated at 1800 or 3600 rpm, which provided a memory latency of
33 or 16.6 milliseconds (one full rotation). In addition to being slow,
the memory was subject to wear and alinement problems. The video disk
refresh memory typically uses fixed read/write heads which "fly" a few
microinches above the disk surface. The heads have a finite life and
must be replaced periodically.
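
The latency figures quoted for video disks follow directly from the rotation rate, as this small sketch shows. The function name `rotation_period_ms` is hypothetical, and the sketch assumes that the quoted latency is the time for one full rotation.

```python
def rotation_period_ms(rpm):
    """Time for one full disk rotation, in milliseconds."""
    # 60 000 ms per minute divided by rotations per minute.
    return 60_000.0 / rpm
```

For 1800 and 3600 rpm this gives roughly 33.3 and 16.7 milliseconds, which agrees with the rounded values in the text.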

The next major improvement in refresh memory was the use of solid-state
memory in the form of the charge-coupled device (CCD). The CCD memory is
a serial shift register configuration designed to be plug-compatible
with disk refresh memory systems. The CCD refresh memory overcomes the
problems of an electromechanical device but still has the same memory
latency characteristics.

The current state of the art is the random access memory (RAM), which,
like the CCD, is a solid-state device but does not have the latency
constraints. RAM refresh memory is very high speed, with access times
typically less than 500 nanoseconds. The cost of RAM's was initially
quite high; however, costs have continued to decrease with increased
availability to the point that RAM refresh memory is on the order of 0.5
cent per bit and is projected to go to 0.2 cent per bit within the next
year or two. RAM refresh memories are rapidly approaching the point at
which the card, connectors, and switches, rather than the memory chips,
become cost considerations.
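
To give a sense of scale for the per-bit prices quoted above, the sketch below estimates the chip cost of a RAM refresh memory. The function name and the example display dimensions (512 by 512 pixels, 8 bits per pixel, one channel) are illustrative assumptions, not figures from the text.

```python
def refresh_memory_cost_dollars(width, height, bits_per_pixel, channels, cents_per_bit):
    """Approximate memory chip cost for a display refresh memory."""
    bits = width * height * bits_per_pixel * channels
    return bits * cents_per_bit / 100.0   # convert cents to dollars

# Hypothetical single-channel 512 x 512 x 8-bit refresh memory.
cost_now = refresh_memory_cost_dollars(512, 512, 8, 1, 0.5)
cost_projected = refresh_memory_cost_dollars(512, 512, 8, 1, 0.2)
```

Under these assumptions one channel holds 2 097 152 bits, so the chip cost falls from about $10,500 at 0.5 cent per bit to about $4,200 at 0.2 cent per bit.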

Display Devices

For many years, the limiting factor in an image
analysis system has been the color cathode-ray tube
(CRT). The conventional shadow mask color CRT
has  improved  over  the  years  in  terms  of  brightness 
and  more  consistent  colorimetry  because  of  the  new 
phosphors;  however,  spatial  resolution  has  remained 
about  the  same.  The  resolution  limitations  posed  by 
the  shadow  mask  design  have  prompted  other  ap- 
proaches to  high-resolution  color  such  as  the 
multilayer  beam  penetration  tube.  The  beam 
penetration  CRT  involves  the  switching  of  different 
anode  voltages  to  excite  a particular  phosphor  layer. 
This  approach  has  had  only  limited  success  because 
of  the  problems  encountered  when  switching  15-  to 
20-kilovolt  pulses. 

The  biggest  improvement  has  been  the  introduc- 
tion of  a high-resolution  shadow  mask  color  CRT.  A 
gravure  quality  shadow  mask  CRT  with  a 0.3- 
millimeter  pitch  (triad  spacing)  is  currently  being 
produced.  This  shadow  mask  provides  approx- 
imately five  times  the  dot  density  of  conventional 
shadow  mask  CRT's  and  is  being  used  to  provide 
1024-  by  1024-pixel  color  displays.  The  closer  spacing 
of  the  color  triads  also  provides  a display  free  from 
moire  patterns. 

Monitor  circuitry  has  also  been  improved  to  keep 
pace  with  the  improved  CRT's.  This  improvement 
has  included  the  use  of  individual  operational 
amplifiers  in  the  convergence  circuitry  to  minimize 
interaction  and  to  increase  stability.  A major  im- 
provement has  been  in  the  area  of  stabilizing  color 
temperature  by  a beam-controlled  feedback  circuit, 
which  automatically  adjusts  the  monitor  color  tem- 
perature to  a fixed  reference  during  each  vertical 
blanking  interval. 

ANALYST-MACHINE  INTERFACE 

The LACIE has provided only a limited amount of experience with
interactive image analysis systems; however, the knowledge gained is
sufficient to provide guidelines for future systems. The analyst-machine
interface appears to be an area that is frequently overlooked or that is
given a relatively low priority in the overall systems considerations.
This interface must be carefully considered in future system designs
because lack of attention to this aspect will adversely affect both the
productivity and accuracy of the image analysis and the classification
process.


Human Factors

Perhaps the most frequently expressed concern for future interactive
systems is in the area of human factors. The analyst is not only
concerned with the implementation of the imagery display, the ancillary
information display, and the interactive controls but also with the work
environment. Consideration should be given to reducing the fatigue of an
analyst, who may typically work at an interactive display console for
periods of as long as 8 hours. Viewing conditions are of paramount
concern and will be addressed first.

The television broadcasting industry has long been concerned with
viewing conditions in control rooms where, for critical evaluation of
picture quality, all factors affecting the perception of colors and
brightness should be closely controlled. The analysis of multispectral
sensor data usually involves the display of the data in pseudocolor or
false color rather than the true color used in the broadcast industry.
The maintenance of the colors and brightness perceived by the analyst is
equally critical because of the decisions that must be made in the
classification process based on the color perceived. The Canadian
Broadcasting Corporation has prepared several papers dealing with the
control room environment which have been published in the Journal of the
Society of Motion Picture and Television Engineers (ref. 1). The
recommendations made are largely appropriate for consideration in the
design of future interactive image analysis systems. Some of the
pertinent considerations are as follows.

1. Chromaticity of color picture monitor screen at
reference white: The screen chromaticity at
reference white should be D6500 (x = 0.313,
y = 0.329) with a tolerance of ±200 K along the
daylight locus.

2.  Luminance  of  color  picture  monitor  screen  at 
reference  white:  The  screen  luminance  at  reference 
white  should  be  20  ± 2 footlamberts. 


3.  Viewing  distance:  The  viewing  distance  re- 
quirements differ  from  those  of  the  broadcasting  in- 
dustry and  will  vary  depending  on  the  type  of 
analysis  being  performed. 

a.  In  general,  the  analyst  should  be  positioned 
so  as  to  view  the  display  from  a distance  of  not  less 
than  one  nor  greater  than  four  times  the  height  of  the 
monitor  picture. 

b. The analyst should be placed so that his
angle of view is no greater than 30° from a line
normal to the face of the monitor.

4.  Light  surround:  Light  surround  is  defined  as 
the  light,  visible  to  the  analyst,  from  a plane  or  from 
behind  a plane  coincident  with,  and  surrounding  but 
not  including,  the  viewing  screen.  Light  surround  re- 
quirements are  as  follows. 

a.  Light  surround  should  be  provided  outside 
the  monitor  screen  mask  and  over  an  area  at  least 
eight  times  the  area  of  the  monitor  screen. 

b.  Light  surround  should  have  a luminance  of 
3 ± 1 footlamberts. 

c.  Light  surround  should  have  a chromaticity 
matching  the  color  monitor  screen  reference  white, 
thus  providing  a fixed  color  reference  for  the  analyst. 

5.  Monitor  screen  mask:  The  monitor  screen 
should  be  framed  by  a narrow  black  matte  mask. 

6. Analysis console room decor: The viewing
room  in  which  the  analysis  console  is  located  should 
have  a decor  that  gives  a generally  matte  impression 
without  the  use  of  dominant  colors. 

a.  The  ambient  light  on  the  monitor  screen 
should  be  kept  to  the  lowest  possible  level.  Specular 
reflections  must  be  avoided. 

b.  Light  sources  within  the  room  should  be  of 
a similar  color  temperature  to  that  of  the  color  moni- 
tor reference  white. 

c.  Desk  surfaces  used  by  the  analyst  should  be 
illuminated  with  lighting  of  the  cool  white  fluores- 
cent type  and  adjusted  so  that  the  luminance  of  white 
paper  on  the  desk  falls  between  the  limits  of  6 to  10 
footlamberts. 

d.  The  desk  surfaces  should  give  a generally 
neutral  matte  impression  without  the  use  of  domi- 
nant colors. 
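
The numeric recommendations in the list above can be collected into a simple checklist. The following is a minimal sketch only; the class name, field names, and function name are hypothetical, and the limits are the ones quoted in items 2, 3, 4, and 6.

```python
from dataclasses import dataclass


@dataclass
class ConsoleEnvironment:
    """Measured viewing conditions (hypothetical field names)."""
    screen_luminance_fl: float      # monitor reference white, footlamberts
    surround_luminance_fl: float    # light surround, footlamberts
    surround_area_ratio: float      # surround area / monitor screen area
    viewing_distance: float         # same unit as picture_height
    picture_height: float
    viewing_angle_deg: float        # angle from the screen normal
    desk_paper_luminance_fl: float  # white paper on the desk, footlamberts


def check_environment(env):
    """Return a list of the recommendations that env does not meet."""
    problems = []
    if abs(env.screen_luminance_fl - 20.0) > 2.0:
        problems.append("screen luminance outside 20 +/- 2 fL")
    if abs(env.surround_luminance_fl - 3.0) > 1.0:
        problems.append("surround luminance outside 3 +/- 1 fL")
    if env.surround_area_ratio < 8.0:
        problems.append("surround area less than 8 times screen area")
    distance_ratio = env.viewing_distance / env.picture_height
    if not 1.0 <= distance_ratio <= 4.0:
        problems.append("viewing distance outside 1 to 4 picture heights")
    if env.viewing_angle_deg > 30.0:
        problems.append("viewing angle greater than 30 degrees")
    if not 6.0 <= env.desk_paper_luminance_fl <= 10.0:
        problems.append("desk paper luminance outside 6 to 10 fL")
    return problems
```

An empty list means that every numeric limit is met; the chromaticity and decor items are not covered by this sketch.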

Another  problem  often  encountered  by  the 
analyst  is  that  of  noise.  The  efficiency  and  accuracy 
of  an  analyst  is  greatly  diminished  when  in  a noisy 
environment  for  significant  periods  of  time.  Back- 
ground noise  sources  generally  fall  into  two  basic 
categories: machine-induced and human. Machine-induced
noise sources may include air-conditioning
systems, blowers, and computer peripherals such as
line printers and disks. Human noise sources include
conversations of other workers, telephone conversations,
and general office noise.

It is recommended that the analysis console be
placed in a room sufficiently isolated from noise
sources so that a noise criterion of no greater than
NC-35 can be obtained. The NC-35 level is based on
studies performed by Beranek (ref. 2).

The  analyst  should  have  control  over  the  tem- 
perature of  the  room  in  which  the  analysis  console  is 
located  so  that  he  may  adjust  it  to  suit  his  personal 
preference  without  affecting  other  areas. 


Simplified  Operator  Interaction 

A second  major  consideration  in  the  analyst- 
machine  interface  is  the  simplification  of  operator 
interaction. In looking at future R&D interactive image
analysis system users, one sees a more diverse
group  representing  numerous  disciplines  with  heavy 
involvement  from  the  academic  community.  As  the 
multiuser,  multiprogram  environment  projected  for 
the  1980's  approaches,  it  becomes  apparent  that  the 
interactive image analysis system must be a “tool”
which  can  be  easily  used  with  a minimum  amount  of 
training.  The  interactive  image  analysis  system 
should  not  require  the  analyst  to  have  extensive  pro- 
graming skills  or  a detailed  knowledge  of  computer 
architecture.  The  interaction  between  the  analyst  and 
the  system  should  not  require  the  typing  of  each 
operator-initiated  command.  The  use  of  typed  com- 
mands should  only  be  required  in  certain  special 
cases  such  as  program  development. 

A more  simplified  approach  to  operator  interac- 
tion is  the  use  of  an  interactive  processing  monitor  to 
govern all terminal interactive processing. In essence,
the analyst should have a monitor at each display
console that presents a menu from which the
analyst  can  select  options  and  control  the  processing 
sequence.  The  monitor  on  which  the  menu  is  dis- 
played should  be  considered  as  a conversational 
monitor  because  it  will  prompt  the  analyst  and  serve 
as  the  primary  communication  link  with  the  image 
analysis  system.  The  conversational  monitor  should 
be  either  of  the  storage  tube  type  or  a high-resolution 
raster  scan.  The  conversational  monitor  is  primarily 
an alphanumeric (A/N) display, and special attention
must be given to the display if a raster scan
system is employed. Flicker becomes a problem in a
standard system of 525 lines per frame and 30 frames
per  second  because  of  the  high-contrast  display 
(typically  black  and  white)  and  the  rate  at  which  the 
display is refreshed. Flicker can be reduced to an acceptable
level by employing a repeat-field display so
that  a given  pixel  is  repeated  in  both  fields  (odd  and 
even),  thus  providing  a 60-hertz  refresh  rate  rather 
than  a 30-hertz  refresh  rate.  A repeat-field  approach, 
although  attractive  from  the  flicker  reduction  stand- 
point, means  that  vertical  resolution  must  be 
sacrificed.  A repeat-field  display  has  one-half  the 
vertical resolution of a non-repeat-field display so
that  if  there  are  480  active  lines  in  a 525-line  display, 
one  would  have  a vertical  resolution  of  only  240 
lines,  which  is  insufficient  for  the  display  of  small 
alphanumeric  characters.  It  is  therefore  recom- 
mended that  a repeat-field  display  of  1024  lines  per 
frame  be  used  because  it  has  significantly  reduced 
flicker  and  the  necessary  vertical  resolution. 
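
The trade-off between flicker and vertical resolution described above can be written out as a short calculation. This is a sketch under the assumptions stated in the text (an interlaced frame of two fields, with a repeat-field display drawing each pixel in both fields); the function names are hypothetical.

```python
def pixel_refresh_hz(frames_per_second, repeat_field):
    """Rate at which a given pixel position is redrawn."""
    # An interlaced frame consists of two fields (odd and even lines).
    # With repeat-field, the same pixel appears in both fields, so it is
    # redrawn twice per frame.
    return 2 * frames_per_second if repeat_field else frames_per_second


def vertical_resolution(active_lines, repeat_field):
    """Number of distinct picture lines that can be displayed."""
    # Repeating each line in both fields halves the distinct lines.
    return active_lines // 2 if repeat_field else active_lines
```

For the standard system in the text, `pixel_refresh_hz(30, True)` gives the 60-hertz rate and `vertical_resolution(480, True)` gives the 240 lines judged insufficient for small characters. The number of active lines in the recommended 1024-line display is not stated, so no figure is computed for it here.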

The  analyst  should  have  the  capability  to  select 
menu options via such devices as a graphacon tablet,
a trackball,  a lightpen,  a joystick,  or  similar  devices. 
The  menu  should  have  a hierarchical  processing 
structure  that  uses  monitor  programs,  processing 
programs,  and  subprograms.  The  system  should  in- 
clude an  interactive  processing  monitor  that  controls 
menu  generation,  interrogation  and  editing,  and  im- 
age display  and  manipulation. 

The  design  of  the  system  should  allow  the  analyst 
to  override  the  computer  control  of  the  processing 
and image display and manipulation from the console.
All manual controls provided to an analyst
should  provide  a positive  indication  of  their  status 
such  as  pushbutton  indicators  which  illuminate 
when  depressed.  Visual  indication  must  also  be  pro- 
vided at  the  console  to  ensure  that  the  analyst  knows 
whether  he  is  operating  under  computer  control  or  in 
the  manual  mode. 

Any  action  initiated  by  the  analyst  should  result 
in  some  positive  visual  indication  within  15  seconds 
that  the  command  has  been  accepted  and  that  proc- 
essing is  in  progress.  If  the  system  is  “busy”  and  can- 
not process  the  request  entered  by  the  analyst,  there 
should  be  a visual  indication  that  the  command  has 
been  accepted  but  that  processing  will  be  delayed. 


CONCEPTUAL  SYSTEM  DESIGN 

The development of an optimum design configuration
for interactive image analysis would require
extensive modeling of each candidate configuration.
At this point in time, the “hard” requirements
for a future interactive analysis system are not
sufficient  to  allow  any  meaningful  modeling.  The 
main emphasis of this paper is to present certain concepts
that should be addressed rather than a detailed
design of such a system. Two basic system approaches
that illustrate these design concepts are described
in the following subsections: a “centralized
system” and a “distributed system.” Several factors
suggested  their  consideration:  the  projected 
workload,  the  probable  availability  of  a large-scale 
computer  such  as  an  IBM  360-75,  the  number  of  im- 
age analysis  terminals  designed  to  interface  with 
midrange computers such as the PDP 11-70, and the
evaluation  of  existing  image  analysis  systems. 

Studies  performed  by  the  MITRE  Corporation 
(ref.  3)  indicate  that  a maximum  of  12  image  analysis 
terminals  would  be  required  in  the  future.  However, 
an  initial  configuration  of  six  image  analysis  ter- 
minals would  meet  the  workload  requirements  of  the 
early 1980's. The conceptual approaches are
therefore based on a 6-image-analysis-terminal
(IAT) configuration with possible expansion to as
many as 12 image analysis terminals. The conceptual
approaches also assume the availability of the
STARAN parallel processor or equivalent device.


Centralized System

The centralized system concept as shown in figure
1 employs a direct interface between the large-scale
computer (IBM 360-75 class) and the image analysis
terminals. In this configuration, the image analysis
terminals are connected directly to the selector channels
via model 2701 data adapter units. The mass
data storage facility and the STARAN processor are
interfaced to the selector channels with the low-speed
peripherals connected to the multiplexer channels.
Each selector channel (SC) is capable of maximum
data rates ranging from 1.3 to 1.85 million
bytes per second. Each selector channel attaches as
many as 8 input-output control units and can address
as many as 256 input-output devices. Only one input-output
device per selector channel can transmit data
at any given time, and no other input-output device
on the channel can transmit data until all data are
handled for the selected device. The centralized
system could easily interface to as many as 12 image
analysis terminals; however, system throughput
becomes a matter of concern. Although selector
channel bandwidth appears to be adequate, a single
central processing unit (CPU) is used to perform all
computations, to control data movement to and from
the array processor, and to control data movement to
and from the image analysis terminals. Normal data
movement on the selector channels does not appear
to be a problem, but the situation in which two
analysis terminals may simultaneously request maximum
likelihood classification or any other time-consuming
computational process may affect the interactive
capability of the system. It may well be that
this configuration would be adequate to provide the
desired interactive image analysis capability, but this
adequacy can only be determined by development of
an accurate model of the projected workload, which
is beyond the scope of this paper.

FIGURE 1.— Centralized system.

The  centralized  configuration  offers  the  following 
attractive  features. 

1.  It  is  straightforward. 

2.  It  is  the  least  expensive  to  implement. 

3.  All  software  is  in  one  computer. 

4.  Lower  maintenance  and  operational  costs  are 
involved. 

Some  of  the  disadvantages  of  the  configuration  in- 
clude the  following. 

1.  No  stand-alone  capability  exists;  it  is  com- 
pletely dependent  on  main  CPU  and  associated  com- 
munications channels. 

2.  Software  modifications  are  required  as  image 
analysis  terminals  are  added. 

3.  Loading  may  slow  system  throughput  to  the 
point  of  not  being  truly  interactive. 


Distributed  System 

The  distributed  system  concept  as  shown  in  figure 
2 employs  the  use  of  a “control  computer”  between 
the  large-scale  computer  (IBM  360-75  class)  and  the 
image  analysis  terminals.  Interface  of  the  mass 
storage facility and the array processor to the large-scale
computer is similar to that in the centralized
system. The control computer provides the interface
to the image analysis terminals and to its own temporary
local mass storage and peripherals. The functioning
of the distributed system can best be understood
by a more detailed analysis of the control computer.

FIGURE 2.— Distributed system.


Control  Computer 

The  control  computer  is  envisioned  as  a midrange, 
highly  interactive  machine  such  as  a PDP  11-70.  Its 
primary  functions  are  interacting  with  the  analysis 
consoles,  moving  data  between  the  other  system  ele- 
ments, storing  and  retrieving  bulk  data,  and  monitor- 
ing the  overall  functioning  of  the  system.  The  con- 
trol computer  also  provides  minimal  computational 
support  for  the  image  analysis  system,  but  it  is 
assumed  that  most  bulk  data  processing  will  be  ac- 
complished in  the  special-purpose  processor  and  the 
large-scale  computer. 

In  concept,  most  of  the  actions  by  the  control 
computer  will  be  in  response  to  inputs  from  an 
analysis  console.  The  inputs  from  the  analysis  con- 
soles may  be  made  via  alphanumeric  terminals  asso- 
ciated with  the  analysis  consoles.  The  emphasis  in  all 
cases  will  be  to  provide  a near-real-time  response  to 
inputs.  In  practicality,  this  means  that  a response 
must  be  available  to  the  analyst  within  a few  seconds 
after  initiation.  To  be  a truly  interactive  system,  the 
control  computer  should  execute  most  tasks  within 
30 seconds. Complex tasks that may require more
than 3 to 5 minutes for completion should be
scheduled  for  deferred  or  background  execution 
while  the  console  is  made  available  for  further  opera- 
tions. 
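
The rule for deferring long tasks can be sketched as follows. This is an illustration only: the class name, the method name, and the use of an estimated run time supplied with each request are assumptions, and the 180-second threshold is simply the lower end of the "3 to 5 minutes" quoted above.

```python
from collections import deque

# Assumption: tasks estimated to run longer than this are deferred.
DEFER_THRESHOLD_SECONDS = 180


class ConsoleScheduler:
    """Routes console requests to interactive or background execution."""

    def __init__(self):
        # Deferred tasks wait here, in the order they were submitted.
        self.background = deque()

    def submit(self, task_name, estimated_seconds):
        """Return 'interactive' or 'deferred' for the submitted task."""
        if estimated_seconds > DEFER_THRESHOLD_SECONDS:
            # The console is freed immediately; the task runs later.
            self.background.append(task_name)
            return "deferred"
        return "interactive"
```

A request such as a display update would be handled interactively, while a long clustering run would be placed on the background queue and the console released for further operations.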

The  control  computer  will  provide  the  capability 
to  input  data  directly  from  normal  computer  tape  or 
high-density  digital  tape  (HDDT).  The  primary  data 
source, however, will be the data base and data base
management  system  resident  in  the  CPU's  of  the 
main  computational  facility.  Through  the  data  base 
management  system,  the  control  computer  will  have 
access  to  imagery  data  bases.  Transmission  band- 
width limitations,  however,  make  real-time  access  to 
imagery  data  impractical.  In  fact,  the  size  of  the  im- 
agery data  base  is  so  large  as  to  make  on-line  storage 
impractical.  The  EOD  systems  and  facilities 
workload  requirements  forecast  prepared  in  April 
1977  shows  a LACIE  Phase  II  imagery  data  base  of 
8.5  x 10s  bytes  and  a LACIE  Phase  III  imagery  data 
base  of  15.5  x 10*  bytes  (ref.  4).  The  numbers 
should  not  be  considered  absolute;  they  are  given 
only  to  indicate  the  magnitude  of  the  imagery  data 
base. 

To  operate  in  a truly  interactive  mode,  the  analyst 
will  know  in  advance  imagery  data  requirements  in 
an  interactive  session  and  will  submit  a request  for 
the  required  data  to  be  available  for  the  session.  The 
control  computer  will  acquire  the  appropriate  data 
bases  and  download  the  data  into  local  mass  storage 
on  a non-real-time  schedule.  During  the  interactive 
session,  data  will  be  acquired  from  the  temporary 
local  mass  storage  and  transferred  in  real  time  to  the 
displays  and  the  special-purpose  processor.  The  siz- 
ing of  temporary  local  mass  storage  is  expected  to  be 
sufficient to support 1-day (24-hour) transactions.
The  availability  of  temporary  local  mass  storage  also 
makes  the  system  semi-independent  and  much  less 
sensitive  to  communications  failures.  Additionally, 
most  of  the  interactive  image  analysis  system  de- 
mands on  other  elements  of  the  data  processing 
system  can  be  scheduled  at  periods  of  low  activity 
and  more  available  communications  channels. 

No current system configuration exactly duplicates
the proposed distributed system. However, the
Atmospheric and Oceanographic Information Processing
System (AOIPS) at the NASA Goddard Space
Flight  Center  (GSFC)  has  some  similarities  (ref.  5), 
and  the  performance  achieved  at  GSFC  may  be  used 
as  a baseline.  The  GSFC  AOIPS  uses  an  IBM  360-91 
for  large-scale  computational  capability  and  a DEC 
PDP 11-70 interfaced to two image analysis terminals.
Each IAT contains five channels of RAM
refresh memory and has a dual interface to the PDP
11-70  in  which  high-volume  data  are  transferred  on 
the  high-speed  (H/S)  bus  and  low-volume  data  and 
control  signals  are  transferred  on  the  unibus.  GSFC 
has determined that the current AOIPS configuration
as depicted in figure 3 will provide from 50 to 70
percent  of  the  total  theoretical  system  input-output 
bandwidth  of  5.8  megabytes  per  second.  If  it  is 
assumed  that  each  refresh  memory  will  be  loaded 
with  a 512-line  by  512-pixel  image,  there  will  be  a 
data transfer of 262 144 bytes per channel since a pixel
is 1 byte. If it is further assumed that the data are
formatted and stored on a disk such as an RP06
(which has a data transfer speed of 1.2 microseconds
per byte), a single channel (512 by 512) requires
a bandwidth of approximately 0.325 megabyte
per  second.  A transfer  of  four  channels  into  refresh 
memory  requires  a bandwidth  of  approximately  1.3 
megabytes  per  second.  The  10-channel  configuration 
of  AOIPS  requires  a bandwidth  of  3.25  megabytes 
per  second  and  nominally  provides  for  the  transfer 
of  approximately  2 full  television  images  per  second. 
An  extrapolation  of  this  capability  can  be  extended 
to  future  systems.  Assume  a system  having  12  image 
analysis  terminals,  each  with  5 refresh  memories 
(512  by  512).  The  theoretical  bandwidth  excluding 
overhead  would  be  12  consoles  by  1.625  megabytes 
(5  channels),  or  19.5  megabytes.  If  a nominal  band- 
width of  3.5  megabytes  is  assumed,  the  system 
throughput  would  be  reduced  by  a factor  of  6.  The 
throughput figures assume that one console requests
at  a given  time  and  that  all  data  transfers  to  the  first 
requesting  terminal  are  completed  before  servicing  a 
second  requesting  terminal.  It  should  be  made  clear 
that  the  throughput  addressed  thus  far  is  only  for 
movement  of  data  and  does  not  address  the  time  re- 
quired for  computational  processes  such  as  classifica- 
tion and  clustering. 
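
The extrapolation in the preceding paragraph can be reproduced with a few lines of arithmetic. The sketch below takes the per-channel figure of 0.325 megabyte per second and the nominal 3.5 megabytes per second directly from the text; the function and constant names are hypothetical.

```python
# Figures quoted in the text.
LINES_PER_IMAGE = 512
PIXELS_PER_LINE = 512
BYTES_PER_PIXEL = 1
PER_CHANNEL_MB_PER_S = 0.325   # bandwidth quoted for one 512-by-512 channel
NOMINAL_MB_PER_S = 3.5         # nominal system bandwidth assumed in the text


def bytes_per_channel():
    """Size of one refresh memory load in bytes."""
    return LINES_PER_IMAGE * PIXELS_PER_LINE * BYTES_PER_PIXEL


def required_bandwidth(consoles, channels_per_console):
    """Theoretical bandwidth, excluding overhead, in megabytes per second."""
    return consoles * channels_per_console * PER_CHANNEL_MB_PER_S


def throughput_reduction(required_mb_per_s, nominal_mb_per_s=NOMINAL_MB_PER_S):
    """Factor by which throughput falls when demand exceeds the nominal rate."""
    return required_mb_per_s / nominal_mb_per_s
```

With 12 consoles of 5 channels each, `required_bandwidth(12, 5)` is 19.5 megabytes per second, and `throughput_reduction(19.5)` is about 5.6, which the text rounds to a factor of 6.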

The  distributed  system  is  similar  to  the 
centralized  system  in  that  the  repetitive  operations 
required  for  classification  and  clustering  will  be  ac- 
complished by  an  array  processor  such  as  the 
STARAN. A detailed analysis of array processor performance
in image analysis applications is not considered
necessary because several papers addressing
this  topic  have  been  presented  in  the  past  3 years. 

Based  on  performance  achieved  with  the  AOIPS, 
it  would  appear  that  a single  control  computer  such 
as a PDP 11-70 could support six image analysis terminals
in an interactive mode. The system
throughput rate would be approximately six times as
slow but would still provide acceptable response
times. It is questionable whether a PDP 11-70 could
support  12  image  analysis  terminals,  and  the  use  of  a 
second  control  computer  to  share  the  load  would 
have  to  be  considered.  As  with  the  centralized 
system,  thorough  modeling  of  the  proposed  con- 
figuration is  required. 

This  configuration  offers  several  attractive 
features. 

1.  Analysis  terminals  can  be  implemented  in 
phases  without  affecting  existing  interfaces. 

2.  Analysis  terminals  can  operate  in  an  off-line 
mode;  they  are  not  completely  dependent  on  large- 
scale  CPU  and  associated  communications  networks. 

3.  Most  control  functions  are  offloaded  to  the 
control computer, thus freeing the large-scale computer
for computational tasks and reducing bus
traffic. 


Some  of  the  disadvantages  of  such  a configuration 
include  the  additional  capital  expenditure  for  control 
computer  and  increased  maintenance  and  operations 
costs. 


Special-Purpose  Processor 

In  general,  the  highly  repetitive  computational 
tasks  such  as  classification,  clustering,  and  other  im- 
age processing  algorithms  will  be  performed  by  a 
special-purpose  processor.  The  special-purpose  proc- 
essor is  envisioned  to  be  a high-speed,  parallel  array 
processor  such  as  the  STARAN.  This  type  of  device 
is  very  efficient  when  a given  algorithm  must  be 
repetitively  applied  to  bulk  data.  Typically,  the 
special-purpose processor will be loaded with data
from the main CPU, set into operation, and left to
run  independently  until  the  results  are  available.  The 
results  are  downloaded  into  the  control  computer 
and  then  routed  to  the  appropriate  display  console. 
When  two  or  more  consoles  request  operations  re- 
quiring the  special-purpose  processor  simul- 
taneously, the  requests  will  be  queued  on  a priority 
basis.  In  all  other  cases,  requests  will  be  processed  in 
the  order  received.  It  is  unlikely  that  a sufficiently 
large  number  of  analysts  simultaneously  initiating 
contending  requests  will  cause  an  unacceptable  wait- 
ing time. 
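
One way to picture the queueing rule is a priority queue that falls back to arrival order. This is a hedged sketch, not the design of the actual system: the class and method names are hypothetical, and the convention that a lower number means a higher priority is an assumption.

```python
import heapq
import itertools


class ProcessorQueue:
    """Queue of console requests for the special-purpose processor."""

    def __init__(self):
        self._heap = []
        # Monotonic counter records arrival order and breaks priority ties.
        self._arrival = itertools.count()

    def submit(self, console, priority=0):
        """Add a request; lower priority numbers are served first (assumed)."""
        heapq.heappush(self._heap, (priority, next(self._arrival), console))

    def next_request(self):
        """Return the console to serve next, or None if the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

Requests of equal priority are served in the order received, which matches the behavior described for the noncontending case.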


Image  Analysis  Terminals 

The  image  analysis  terminals  will  consist  of  two 
basic  components:  a display  generator  and  display 
and  analysis  consoles.  The  image  analysis  terminal 
concept  is  identical  for  both  the  centralized  and  the 
distributed  systems  and  differs  only  in  the  display 
generator/computer  interface. 


Display  Generator 

The  centralized  display  generator  that  services  all 
the  consoles  is  the  core  of  the  interactive  image 
analysis  system.  The  display  generator  will  contain 
the  refresh  memories  for  all  consoles  and  the  cursor 
generation,  hardware  classifier,  character  generation, 
and  digital-to-analog  (D/A)  conversion  circuitry. 
The  logical  and  physical  architecture  of  the  display 
generator  will  be  highly  modular  and  will  use  stan- 
dardized building  blocks.  As  currently  envisioned, 
the  standard  building  block  will  be  a module  that 
contains  between  1 and  8 RAM  refresh  memories 
(512  by  512  by  8),  computer  interface,  hardware 
classifier,  and  video  processing  circuitry.  The  exact 
configuration  of  the  display  generator  module  will  be 
determined by the configuration of the interfacing
console.  The  centralized  display  generator  configura- 
tion is  illustrated  in  figure  4. 

The  centralized  approach  has  several  advantages. 

1. Each individual terminal need not be implemented
with a refresh memory for the maximum
number  of  channels  it  is  ever  expected  to  handle. 
The  centralized  system  can  instead  be  implemented 
with  the  most  likely  number  of  channels  that  can  be 
expected  to  be  in  use  at  any  given  time.  Channels  can 
then be added to or deleted from a given terminal, based
on  the  requirements  of  the  task  to  be  performed. 


2. The centralized system is also easier to maintain
and service. If any individual channel should
fail, modules from another channel can be “borrowed”
while diagnosis and repair are performed on
the faulty unit.

3.  The  centralized  concept  also  simplifies  the  data 
distribution  problem  to  the  console  displays.  The  dis- 
play generator  will  output  video  signals  that  will  be 
routed  to  the  console  displays  via  a coaxial  cable.  The 
consoles  can  be  located  as  far  as  several  hundred  feet 
from  the  display  generator  without  any  degradation 
to  the  signal.  For  cable  runs  of  less  than  1000  feet, 
simple  unbalanced  cable  equalizers  can  be  used  at  the 
console  to  restore  a flat  frequency  response  and  re- 
ject any  low-frequency  noise  that  may  be  induced  in 
the  cable  run.  For  longer  cable  runs,  balanced 
transmission  techniques  can  be  used. 

4.  All  other  display  generator/console  interfaces 
are  low-speed  control  and  status  indicator  circuits 
and  are  not  constrained  by  critical  cable  lengths.  The 
longer  cable  runs  may  require  amplification  and 
redrive  circuitry,  but  this  is  not  considered  a major 
problem. 
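
The first advantage, sharing a common stock of refresh channels among terminals, can be sketched as a simple pool. The class and method names below are hypothetical, and the pool size used in any example is illustrative; only the channel dimensions (512 by 512 by 8 bits) come from the text.

```python
# One refresh memory of 512 by 512 pixels at 8 bits per pixel, in bytes.
CHANNEL_BYTES = 512 * 512 * 8 // 8


class RefreshChannelPool:
    """Shared refresh channels held by a centralized display generator."""

    def __init__(self, total_channels):
        self.free = total_channels
        self.assigned = {}  # terminal name -> number of channels held

    def allocate(self, terminal, count):
        """Assign channels to a terminal; return False if too few are free."""
        if count > self.free:
            return False
        self.free -= count
        self.assigned[terminal] = self.assigned.get(terminal, 0) + count
        return True

    def release(self, terminal, count):
        """Return up to count channels from a terminal to the pool."""
        held = self.assigned.get(terminal, 0)
        count = min(count, held)
        self.assigned[terminal] = held - count
        self.free += count
```

A terminal that needs extra channels for one task can obtain them only while another terminal is not using them, which is why the pool can be sized for the most likely demand rather than the maximum.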


Display  and  Analysis  Consoles 

The  display  and  analysis  consoles  will  employ  a 
modular  design  consisting  of  standard  modules.  The 
two  basic  categories  of  standard  modules  are  display 
and  control. 

The display modules include a color monitor (512
by 512 pixels), a conversational monitor (black-and-white
repeat field), a high-resolution monitor (1024
by 1024), and a light table using an illuminant having
the same color temperature as the color monitor. The
control  modules  include  a keyboard,  target  or  cursor 
control,  a display  control  and  status  module,  func- 
tion buttons,  and  a communications  panel. 

The  display  and  control  modules  are  used  in  a 
building  block  approach  to  provide  the  required  con- 
sole configuration  to  perform  a specific  task  or  func- 
tion. As  programmatic  requirements  change,  console 
modules  may  be  added  or  deleted  as  required  with 
minimal  impact  on  the  overall  system. 

Preliminary studies indicate that the consoles may
fall into four basic categories: analysis console;
screening and editing console; registration, mensuration,
and correlation console; and systems manager
console. The analysis console is a full-capability console
with two color displays, a conversational monitor,
a light table, a keyboard, and appropriate control
modules, whereas a screening and editing console
may use only a single color display. The registration,
mensuration, and correlation console capability is
primarily centered around the high-resolution (1024
by 1024 pixels) black-and-white display. The systems
manager console provides the capability to monitor
the activity of all consoles. The systems manager is
capable of calling up any display from any console to
ensure quality control or to aid in maintenance. The
systems manager console is the means by which
system resources are allocated and tracked.

FIGURE 4.— Centralized display generator.


SUMMARY 

Interactive image analysis systems currently at
JSC have been used to perform a limited number of
specific  tasks;  however,  future  requirements  indicate 
that  a more  general  image  analysis  capability  must  be 
provided.  The  design  of  such  a system  will  involve  a 
thorough  analysis  of  the  environment  projected  for 
the  1980’s.  The  design  must  address  the  analyst- 
machine  interface  in  detail  because  this  is  often  a 
limiting  factor  in  interactive  image  analysis  systems. 
The  design  of  future  systems  must  involve  careful 
evaluation  of  state-of-the-art  technology  as  well  as 
improved  analysis  techniques. 

In  summary,  it  is  not  possible  to  address  all  the 
considerations  affecting  the  design  of  future  image 
analysis  systems;  however,  this  paper  has  attempted 
to  identify  those  key  considerations  for  a future  JSC 
image analysis system.


Very High Speed Processing: Applicability of
Peripheral  Devices  to  Pixel-Dependent  Tasks 

J. C. Lyon


ABSTRACT 

The  LACIE  was  representative  of  applications 
users  of  Landsat  data;  it  was  distinguished  in  the 
context  of  the  present  paper  primarily  by  the  quan- 
tity of  data  processed  in  the  project.  This  data 
volume  was  anticipated,  before  project  inception,  to 
exceed  the  processing  capacity  of  existing  support 
systems,  particularly  in  the  performance  of  LACIE 
implementations  of  classical  pattern-recognition 
functions;  viz,  iterative  clustering  and  maximum- 
likelihood  classification.  This  paper  describes  the 
early  options  studied  in  the  satisfaction  of  LACIE 
computational  demands  and  the  ultimate  selection 
and  development  of  an  array  processing  solution  to 
the  problem.  The  economic  justification,  as  a func- 
tion of  required  multitemporal  Landsat  analysis,  is 
provided  for  this  approach;  the  suitability  of  such 
processors  for  LACIE  and  other  applications  is  dis- 
cussed. 


THE  IMAGE  PROCESSING  ENVIRONMENT: 
PIXEL  PROCESSING  IN  LACIE 

Digital  image  processing  typically  involves  use  of 
computationally  intensive  techniques  in  both  data 
preparation  and  data  analysis  applications.  Examples 
of  such  techniques  are  geometric  and  radiometric 
corrections,  other  filtering  applications,  some  data 
clustering  procedures,  and  various  statistically  based 
classifiers.  These  procedures  are  characterized  by  a 
relatively  large  number  of  arithmetic  operations  to 
be  executed  for  each  picture  element  in  an  image  or 
image subset. Since images tend to be composed of
many pixels, the performance of any of these procedures
can demand an astronomical number of instruction
executions. Even in relatively limited applications,
conventional serial processing devices
have  sometimes  proved  to  be  inadequate  image 
analysis  vehicles,  either  yielding  unacceptable  in- 
teractive response  times  or  monopolizing  system 
resources.  The  result  has  been  a proliferation  of 
special-purpose  equipment  designed  to  assume  the 
processing  load  associated  with  computationally  bur- 
densome algorithms.  The  implementation  of  a pro- 
gramable  parallel  processor,  peripheral  to  a large 
mainframe,  as  the  solution  of  the  “pixel  processing” 
problem  posed  by  LACIE  is  discussed  herein. 
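
As a rough illustration of why per-pixel procedures become compute-bound, the sketch below counts multiplications for a Gaussian maximum-likelihood classifier. The cost model (n squared plus n multiplications per class to evaluate the quadratic form for n channels) and the example image size, channel count, and class count are assumptions chosen for illustration; they are not figures from the LACIE system.

```python
def ml_classifier_multiplies(pixels, channels, classes):
    """Approximate multiplications needed to classify an image.

    For each pixel and class, the quadratic form d^T * S_inv * d is
    evaluated, where d is the channel vector minus the class mean:
    channels * channels multiplies for S_inv * d, plus channels
    multiplies for the final dot product.
    """
    per_class = channels * channels + channels
    return pixels * classes * per_class


# Hypothetical example: one 512-by-512 image, 4 channels, 10 classes.
example_total = ml_classifier_multiplies(512 * 512, 4, 10)
```

Under these assumptions a single modest image already requires about 52 million multiplications, before any additions, comparisons, or data movement are counted.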

To  justify  this  implementation,  the  first  major 
portion  of  the  paper  will  be  a review  of  the  condi- 
tions driving  the  decision  to  modify  the  system.  This 
review will be effected through (1) a detailed examination
of the behavior of the pre-LACIE system
when  performing  a representative  computationally 
bound  (compute-bound)  application  and  (2)  a 
qualitative  survey  of  considered  processing  alterna- 
tives to  the  existing  unsatisfactory  mainframe  per- 
formance. In  the  second  major  section,  the  perform- 
ance and  economic  justification  of  the  selected 
special-purpose  device  as  applied  to  LACIE  will  be 
discussed  and  the  general  applicability  of  special 
devices  to  the  general  image  processing  problem  will 
be  noted. 

Finally, it should be observed that the basic intent of the
implementation, to improve system throughput in the face of very large
data volumes, was achieved. Essentially, in most applications, the
system performance was converted from a central processing unit
(CPU)-bound state to an input-output (I/O)-bound state. This
conversion, however, exposed the I/O and data management/traffic
control of increasingly large and diverse data sets as a critical
problem for future large-scale remote-sensing applications.

PROBLEM STATEMENT: PRE-LACIE AND
EARLY LACIE CAPABILITIES AND
LIMITATIONS


System Description

In late 1973, the Earth Resources Interactive Processing System
(ERIPS), an existing NASA Johnson Space Center (JSC) image analysis
facility, was identified as the baseline from which the LACIE
production classification system was to be developed. This proposed
use of ERIPS, which had been conceived only as an interactive research
tool, was based primarily on the existence within the system of much
suitable applications software, an expandable hardware/software
configuration, and schedule constraints precluding total design and
development of a new support system. The choice of ERIPS was
satisfactory, even if not ideal, but extensive enhancements¹ were
required in a number of areas. A block diagram of the system as it
existed in 1973 is shown in figure 1, with later additions for LACIE
indicated.

Of prime concern to the use of this system within LACIE was its
throughput, constrained both by factors resolved as described
elsewhere¹ and by the more fundamental CPU limitation exposed by
computationally bound routines invoked frequently in analysis. Within
the principal ERIPS subsystem (Pattern Recognition) to be used by
LACIE, the modules exhibiting this characteristic were iterative
clustering (a derivative of the ISOCLS algorithm), feature selection
(a weighted divergence routine), and maximum-likelihood
classification. These processors are described in the paper by
Johnson; here, the example of maximum-likelihood classification is
adequate to quantify the extent of the CPU-driven problem and will be
used subsequently for continuity.

FIGURE 1.— LACIE/ERIPS facility.

¹For a description of these enhancements, see the following LACIE
Symposium papers: C. L. Johnson, "LACIE/ERIPS Software System
Summary"; L. E. Westberry, "The LACIE Data Bases: Design
Considerations"; and Barbara B. Duprey, "Man-Machine Interfaces in
LACIE/ERIPS."

The  ERIPS  Maximum-Likelihood  Classifier: 
LACIE  Implications 

The objective of maximum-likelihood classification is to minimize the
function

    H_C(x) = S_C + \tfrac{1}{2} (x - \mu_C)^T \Lambda_C^{-1} (x - \mu_C)    (1)

where S_C is a class constant associated with each class C,
\Lambda_C^{-1} is the N-channel inverse covariance matrix of class C,
\mu_C is the N-channel mean vector for class C, x is the N-channel
observation vector (pixel) under test, and H_C is the likelihood
measure that the pixel x belongs to class C. If H_C < H_k for all
k ≠ C, then x is assigned to class C. The essential limitations within
ERIPS were typical of many classification schemes; viz, N ≤ 30 and
C ≤ 20 (in LACIE, N ≤ 16 and C ≤ 60).
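The decision rule of equation (1) can be illustrated with a minimal Python sketch. It assumes that the quadratic term carries a factor of one-half (the coefficient is partly illegible in the source) and that each class is supplied as a hypothetical tuple of class constant, mean vector, and inverse covariance matrix; the function names are illustrative and do not come from ERIPS.

```python
def quadratic_form(x, mean, inv_cov):
    """Return (x - mean)^T inv_cov (x - mean) for plain Python sequences."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    n = len(d)
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))


def classify(x, classes):
    """Return the index of the class that minimizes H_C(x).

    classes is a list of (s_c, mean, inv_cov) tuples; this layout is an
    assumption made for the sketch, not the ERIPS data structure.
    """
    best_index, best_h = None, float("inf")
    for index, (s_c, mean, inv_cov) in enumerate(classes):
        h = s_c + 0.5 * quadratic_form(x, mean, inv_cov)
        if h < best_h:  # smaller H_C means a better match
            best_index, best_h = index, h
    return best_index


identity = [[1.0, 0.0], [0.0, 1.0]]
example_classes = [(0.0, (0.0, 0.0), identity), (0.0, (10.0, 10.0), identity)]
```

Every pixel is evaluated against every class, which is why the cost grows with the product of the pixel count and the class count.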

Various implementations of equation (1) have been adopted at image
processing installations; most follow rational considerations of
maximization of processing efficiency given a specific equipment and
software architecture. ERIPS was no exception; considerable attention
was paid to code optimization in the IBM 360/75 computer. The
following steps were adopted.

1. Reformulate equation (1) into a computationally efficient
expression:

    H_C(x) = S_C + \sum_{i=1}^{N} \sigma_i \sum_{j=1}^{i} [summand illegible in source]    (2)

where \sigma_i is the standard deviation in channel i of class C. The
covariance matrix is, of course, lower triangular. This restatement is
typical.

2. Replace original FORTRAN code with assembly language.

3. Improve disk access methodology. A local (ERIPS) disk access method
was developed to exploit image organization characteristics. This
procedure, the Image Data Access Method (ref. 1), permitted retrievals
from the image storage medium (IBM 2314 disks) at essentially full
disk rates and thus reduced I/O waits at any point to a minimum.

4. Serialize representation of equation (2) in core to avoid CPU
cycles in resolution of branch addresses.

5. Exploit register-to-register arithmetic and eliminate
register-to-storage arithmetic (slower) where possible.

Given the preceding considerations, the classification process
(eq. (2)) was reduced essentially to its arithmetic components with
minimal system overhead or time-consuming addressing and logical
operations. Consequently, the arithmetic required for the solution of
equation (2) was given by

Adds: (N^2 + 2N + 1)C per vector
Multiplies: (1/2)(N^2 + 3N)C per vector

constituting the bulk of CPU cycles (0.75 second in the ERIPS CPU)
used in the classification process.
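As a check on the operation counts, the following Python sketch evaluates them for given numbers of channels N and classes C. The add count is written as (N^2 + 2N + 1)C because that form reproduces every entry of table I; the function names and the 40-class segment example are illustrative.

```python
def adds_per_pixel(n_channels, n_classes):
    """Adds per pixel in the quadratic form: (N^2 + 2N + 1) * C."""
    return (n_channels ** 2 + 2 * n_channels + 1) * n_classes


def multiplies_per_pixel(n_channels, n_classes):
    """Multiplies per pixel: (N^2 + 3N) * C / 2 (N^2 + 3N is always even)."""
    return (n_channels ** 2 + 3 * n_channels) * n_classes // 2


def operations_per_segment(n_channels, n_classes, n_pixels=22932):
    """Total inner-loop arithmetic operations for one sample segment."""
    per_pixel = (adds_per_pixel(n_channels, n_classes)
                 + multiplies_per_pixel(n_channels, n_classes))
    return per_pixel * n_pixels
```

For a 16-channel segment with 40 classes, `operations_per_segment(16, 40)` comes to roughly 4 x 10^8 operations, the same order of magnitude as the figure quoted in the text.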

The number of operations required on a single pixel under this
breakdown for representative ERIPS test cases is shown in table I.
Using a LACIE sample segment (for comparison with later sections)
consisting of 22 932 pixels, it can be seen that the number of
arithmetic operations for this "inner loop" of classification can
approach 10^9, given the LACIE experience of approximately 40 classes
defined per sample segment.


Table I.— Maximum-Likelihood Classification
Instruction Executions Per Pixel in the Quadratic Form

Parameter              Values
No. of channels           4       4      16      16       16
No. of classes           10      20      10      20       60
No. of adds             250     500    2890    5780   17 340
No. of multiplies       140     280    1520    3040     9120

Table II affords a more global look at the classification processes
other than the inner loop and shows the significance of the quadratic
form to the complete process. It is easily seen that the classifier is
heavily compute-bound and that necessary I/O operations, represented
only in lines 1, 2, 5, and 6, are negligible contributors to the
process. The "Gen stats" entry is associated with the preparation of
statistics for the classifier from the original training set, and is,
with the quadratic form evaluation, almost totally compute-bound.

To exemplify the limiting time resource, consider the problem of
classifying into 10 classes an image of 7.5 million pixels in
4 channels (a Landsat-2 frame). This problem would require nearly
4 hours of CPU time under the indicated breakdown of the table.

The timing figures developed here were applied to the LACIE
anticipated workload; similar values were obtained for the other
processes which would contribute significantly to a typical LACIE
scenario. The conclusion, based on a project peak of 4800 sample
segments, was that some 60 hours of IBM 360/75 time would be required
daily to perform the assignable ERIPS functions. Such equipment time,
although theoretically available locally within the five 360/75 CPU's
of the installation, was clearly impractical (ref. 2). An alternative
was indicated.


Table II.— IBM 360/75 Maximum-Likelihood Performance, June 1973ᵃ

[Computed timings for LACIE segments, sec]

Process            Time, µsec                    C=10;     C=20;     C=10;     C=20;
                                                 N=4       N=4       N=16      N=16
System overhead    3000L                          0.351     0.351     0.351     0.351
Data movement      (6.49N + 8.82)LP                .797      .797     2.583     2.583
Gen stats          5.03C^2 LP                    11.535    46.140    11.535    46.140
Quadratic form     [7.46 + N(4N + 16.21)]CLP     31.256    62.512   296.010   592.020
Store best Q       3.05CLP                         .700     1.400      .700     1.400
Store best C       5.9LP                           .135      .135      .135      .135
Total                                            44.780   111.335   311.314   642.629
Percent in quadratic form (pure compute)             70        56        95        92

ᵃImage containing L lines, P pixels/line, N channels with C classes.
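The table II timing model can be exercised with a short Python sketch. The coefficients are taken from the table's formula column; the Gen stats and Store best Q expressions are inferred from the tabulated values because the printed formulas are partly illegible, and the 117-line by 196-pixel segment layout is an assumption consistent with the 22 932-pixel segment size and the 0.351-second overhead entry.

```python
SEGMENT_LINES = 117            # assumed: 3000 usec/line * 117 lines = 0.351 sec
SEGMENT_PIXELS_PER_LINE = 196  # assumed: 117 * 196 = 22 932 pixels


def timings_sec(n, c, lines=SEGMENT_LINES, ppl=SEGMENT_PIXELS_PER_LINE):
    """Per-process CPU time in seconds for N channels and C classes."""
    lp = lines * ppl
    usec = {
        "system overhead": 3000 * lines,
        "data movement": (6.49 * n + 8.82) * lp,
        "gen stats": 5.03 * c ** 2 * lp,           # inferred coefficient
        "quadratic form": (7.46 + n * (4 * n + 16.21)) * c * lp,
        "store best Q": 3.05 * c * lp,             # inferred coefficient
        "store best C": 5.9 * lp,
    }
    return {name: value * 1e-6 for name, value in usec.items()}


def total_hours(n, c, lines, ppl):
    """Total CPU time in hours for an image of lines * ppl pixels."""
    return sum(timings_sec(n, c, lines, ppl).values()) / 3600.0
```

Applied to a hypothetical 2500-line by 3000-pixel layout (7.5 million pixels) with 4 channels and 10 classes, `total_hours(4, 10, 2500, 3000)` gives about 4 hours, in line with the Landsat frame estimate in the text.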


Processing  Alternatives 

The problem identified previously, common in some degree to most image
processing installations, can be addressed in several ways. Some of
these techniques were unsuitable for LACIE application but are
included here for completeness. Given a constrained CPU, the following
alternatives may merit consideration under given assumptions for some
applications.

1. Technique improvements/alternatives (software)

a. A variety of improvements to the classification process time can be
achieved without loss of accuracy; the measure of improvement is
strictly data-dependent but does not generally exceed a factor of 4 or
5 for the most successful of these, vector classification. This
procedure involves the retention of classification assignment for
pixels through maximum likelihood in an ordered table. Subsequent
pixels (vectors) are interrogated for presence in the table. If
present, the assignment is made on the table class; if not,
classification proceeds normally. This procedure works well on
single-acquisition Landsat images in agricultural areas in which data
spread is small and duplication of pixels is a frequent occurrence.
Unfortunately, in multitemporal applications (LACIE), the procedure is
essentially unusable; the likelihood of, for example, duplication of
16-channel vectors is small in typical images.
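The table-lookup idea can be sketched in a few lines of Python. The sketch substitutes a hash table (a Python dict) for the ordered table described in the text, and `classify` stands for any full classifier supplied by the caller; both choices are assumptions made for illustration.

```python
def classify_with_table(pixels, classify):
    """Classify pixels, reusing stored assignments for repeated vectors.

    Returns the list of class labels and the number of full classifier
    evaluations that were actually needed.
    """
    table = {}             # pixel vector -> class assignment
    full_evaluations = 0
    labels = []
    for pixel in pixels:
        key = tuple(pixel)
        if key not in table:
            table[key] = classify(key)   # classification proceeds normally
            full_evaluations += 1
        labels.append(table[key])        # assignment taken from the table
    return labels, full_evaluations


def toy_classifier(vector):
    """Stand-in classifier used only to demonstrate the lookup."""
    return 0 if sum(vector) < 10 else 1
```

The gain is the ratio of pixels to distinct vectors, which explains why the method helps on 4-channel single-acquisition data and fails when 16-channel vectors rarely repeat.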

b. Another alternative is the treatment of the classification
expression (eq. (2)) as a running sum of the calculated components for
a pixel (x) against a class (C). Periodically in this summing
procedure, a test of the sum against the best H_C to date is made; if
the running sum exceeds this best value, processing is terminated on
the class and the pixel is advanced to examination against the next
class. Typically, the performance gain in this approach may be
50 percent. The total system improvement afforded by this technique,
however, was inadequate to meet LACIE requirements.
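A minimal Python sketch of the running-sum test follows. It assumes that the quadratic form is accumulated as a sum of squared terms built from a lower-triangular factor W of the inverse covariance matrix, so that each partial sum can only grow; that factorization, the one-half factor, and the choice to test after every row rather than periodically are assumptions of the sketch, not details given in the source.

```python
def classify_early_exit(x, classes):
    """Return (best class index, number of rows skipped by early exit).

    classes is a list of (s_c, mean, w) tuples, where w is a
    lower-triangular matrix with inverse covariance = w^T w (assumed).
    """
    best_index, best_h = None, float("inf")
    skipped_rows = 0
    for index, (s_c, mean, w) in enumerate(classes):
        d = [xi - mi for xi, mi in zip(x, mean)]
        h = s_c
        for i, row in enumerate(w):
            t = sum(row[j] * d[j] for j in range(i + 1))
            h += 0.5 * t * t             # nonnegative term, so h never decreases
            if h > best_h:               # this class can no longer win
                skipped_rows += len(w) - (i + 1)
                break
        else:
            if h < best_h:
                best_index, best_h = index, h
    return best_index, skipped_rows


unit = [[1.0, 0.0], [0.0, 1.0]]
demo_classes = [(0.0, (0.0, 0.0), unit), (0.0, (10.0, 10.0), unit)]
```

The saving depends on the order in which classes are tried: the sooner a good candidate is found, the more rows are skipped for the remaining classes.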


c. Other techniques which offer varying degrees of promise to specific
users as computational reduction devices include linear (fixed-point)
classifiers and the parallelepiped classifier implemented in the
General Electric Image-100 system. These techniques cause some
degradation in the statistical basis of the assignment process (or
information loss) and in their existing forms were unsuitable for
LACIE. It is not impossible that better understanding of the data may
lead to increased use of similar procedures in the future. The
performance advantage of such algorithms is a function of their
computational simplicity when compared to maximum likelihood.

d. In the data clustering application (see the paper by Johnson) used
in ERIPS, a derivative of ISOCLS, a large number of iterations could
be required before the desired stabilization of clusters was achieved.
In the aggregate, these passes required CPU resources nearing those
for classification. An adaptive clustering procedure was adopted; in
this procedure, cluster centers were dynamically adjusted and clusters
created during a pass, and the results of one such pass were input to
the iterative clustering process. If the adaptive control parameters
are properly chosen, the entry vectors to iterative clustering can be
close approximations to the results of several iterative passes under
the same initial conditions. Such intelligent preprocessing has the
potential, never successfully adapted to LACIE, of reducing clustering
processing requirements. Similar methods continue to imply attractive
computational reductions.

e. Feature (or channel) selection is often employed to identify
channels containing maximum useful information on the separability of
defined classes. If the number of channels entering classification can
be reduced, the number of arithmetic operations is quadratically
reduced according to equation (2). Unfortunately, the computation time
associated with most viable feature reduction techniques is great. In
anticipation of later results, it is observed that both the divergence
and the Bhattacharyya distance processors implemented in ERIPS and
LACIE are burdensome and, as well, computationally intractable for
meaningful development on special-purpose equipment.

f. Finally, it is noted that the most significant prospects for
computational gains are likely to come from data compression and from
demonstrations that the information loss from application of some
compression procedure is minimal. No such procedure, however, has yet
been shown applicable (workable) in the LACIE type of problem,
although some progress has been made.

2.  Equipment  modification  or  expansion 

a.  An  obvious  route  to  increased  capability  is  a 
more  powerful  mainframe;  however,  it  will  be 
shown  subsequently  that  the  devices  of  paragraph  2c 
generally  obviate  this  consideration  in  the  relevant 
signal  processing  environment. 

b. Special (nonprogramable) hardware has been implemented in numerous
installations which enables performance of certain image analysis
processes essentially at I/O rates. The singular disadvantage of such
equipment is its inflexibility. The characteristic trend within LACIE,
recognized at the outset, was continuing algorithmic modification and
development of new processing procedures as the project evolved.
"Boxes" capable of supporting maximum likelihood, for example, at
extremely high rates would shortly have outlived their usefulness in
the project, although this type of equipment can be cost-effective in
facilities with definable charters implying long-term stability. The
performance of such hardware classifiers is, nevertheless,
exceptional. Several existing devices can perform the full Landsat
frame classification exercise described earlier in less than
60 seconds, given appropriate I/O ports and drivers.

c. The final option to be considered, and the one selected for ERIPS
augmentation, is division of labor. This concept can be realized in
the general sense by any distributed or networking approach to a
configuration; in the present context, however, the limitations of the
conventional serial processor precluded inexpensive solutions to the
LACIE problem by use of several serial devices. The availability of
programable high-speed computers of exotic architectures offered
another alternative. Traditionally, these devices had their origins in
signal processing applications such as radar, sonar, and geophysical
data reduction. Normal applications included the Cooley-Tukey fast
Fourier transforms, simultaneous solution of many differential
equations, or various image enhancement or filtering operations.
Several architectural concepts have been exploited for dramatic
performance improvements in these compute-bound applications; in all
such architectures, the conventional serial CPU is replaced by several
(or many) arithmetic/logical units, arranged either in parallel or in
series. More complex architectures generally derive from these two
functional types; however, several implemented systems can be viewed
only functionally, and not necessarily electronically, as parallel or
serial. The focus of the second portion of this paper will be the use
of one such architecture in LACIE, with some discussion of different
approaches for other applications.


PERIPHERAL  PROCESSOR 
IMPLEMENTATION 

Projected LACIE workload models and algorithm characteristics were
jointly examined during 1974 to define features suitable for
implementation in a peripheral device. The utility criterion was use
by the investigated algorithm of the mainframe CPU, a direct product
of its CPU requirement per use and its frequency of use. It was
concluded that the bulk of algorithms or calculations which dealt with
manipulations only of statistics or data sets in the aggregate were
best left in the mainframe and that all processes requiring direct
pixel operations in quantity would be transferred to the
special-purpose device. These processes included the following
specific modules.


Statistics 

The statistics module was invoked both for direct computation of
training field means and covariances and for establishing these same
measures on the results of clustering runs (as described
subsequently). It was not anticipated that the LACIE statistics
processor would be an excessive user of system CPU resources, but
rather that the in-line nature of its use within clustering would tend
to complicate and reduce efficiency of the clustering procedure if
statistics were left in the host mainframe. The computations of
interest were the mean vector and covariance matrix of each class
(field), where M is the population of the class (field) under
investigation, and the other terms are as defined earlier (for
eq. (1)). It was intended that many (≤60) such classes be processed
simultaneously; i.e., accumulations to be made would be performed on
fields in tandem as the processor advanced through the image only
once.
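The single-pass accumulation described above can be sketched in Python as follows. The defining equations did not survive in the source, so the sketch assumes the usual sample mean and an unbiased (M - 1) covariance normalization; the function name and the use of field identifiers as dictionary keys are also assumptions.

```python
def field_statistics(pixels, field_ids):
    """Accumulate sums for all fields in one pass, then form mean/covariance.

    Returns {field: (M, mean, covariance)}. Each field needs M >= 2
    because of the assumed (M - 1) normalization.
    """
    accum = {}
    for x, field in zip(pixels, field_ids):
        n = len(x)
        if field not in accum:
            accum[field] = [0, [0.0] * n, [[0.0] * n for _ in range(n)]]
        entry = accum[field]
        entry[0] += 1
        for i in range(n):
            entry[1][i] += x[i]
            for j in range(i + 1):       # lower triangle only
                entry[2][i][j] += x[i] * x[j]

    stats = {}
    for field, (m, sums, products) in accum.items():
        n = len(sums)
        mean = [s / m for s in sums]
        cov = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(i + 1):
                value = (products[i][j] - m * mean[i] * mean[j]) / (m - 1)
                cov[i][j] = cov[j][i] = value
        stats[field] = (m, mean, cov)
    return stats
```

Because only running sums are kept, any number of fields can be updated as the image is read once, which is the property the text relies on.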

Adaptive Clustering

The adaptive clustering procedure, briefly mentioned earlier, provided
a means of grouping similar vectors from measurement space (where
similarity is determined through a selected N-space metric). The
technique was never exploited in the LACIE production (batch) system,
although it was frequently invoked for continuing research into data
behavior. The adaptive clustering algorithm is relatively complex and,
because of its low significance to the project, will not be described
here. However, it is germane to note that the algorithm, as ultimately
implemented on the parallel processor chosen, was modified
conceptually to exploit parallel features in the equipment
architecture. Essentially, an algorithm which had been a serial,
spectral clustering technique was converted into one which implicitly
incorporated spatial data characteristics.

Iterative Clustering

The iterative clustering algorithm provides a means both for assigning
measurement vectors to clusters and for evolving the statistical
description of the reference clusters. The algorithm determines the
"distance" of each measurement vector (or a set of such vectors) from
the mean vector of each cluster and assigns each measurement vector to
the "nearest" cluster. The statistics of all measurement vectors
assigned to a particular class are determined and are used to modify
the original clusters and cluster statistics. When the tasks described
previously are accomplished, the algorithm is considered to have
undergone one "pass." Usually, several passes are executed before the
iterative clustering process is terminated.

The implementation of this algorithm using the peripheral processor
required interpass operations by the host; the prime expression to be
resolved by the peripheral processor was the distance D_C(x) of the
measurement vector x from each cluster center C. If D_C(x) < D_k(x)
for all k ≠ C, then x was assigned to C.
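One pass of this procedure can be sketched in Python. The distance expression is missing from the source, so squared Euclidean distance is used here purely as an assumption; the function name and the rule of leaving an empty cluster's center unchanged are likewise illustrative.

```python
def clustering_pass(vectors, centers):
    """Assign each vector to its nearest center, then recompute the centers.

    Returns (labels, new_centers). Distance is squared Euclidean (assumed).
    """
    n = len(centers[0])
    sums = [[0.0] * n for _ in centers]
    counts = [0] * len(centers)
    labels = []
    for x in vectors:
        distances = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        nearest = distances.index(min(distances))
        labels.append(nearest)
        counts[nearest] += 1
        for i, xi in enumerate(x):
            sums[nearest][i] += xi
    new_centers = [
        [s / counts[k] for s in sums[k]] if counts[k] else list(centers[k])
        for k in range(len(centers))
    ]
    return labels, new_centers
```

Repeated calls with the returned centers play the role of successive passes; the interpass bookkeeping is the part the text assigns to the host.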


Maximum-Likelihood Classification

The maximum-likelihood algorithm was described earlier. The
computational kernel and the retention and generation of a
classification map under defined thresholding conditions were viewed
as appropriate for conversion to the special-purpose device.


Mixture Density Classification

The mixture density (sum of likelihoods) classification algorithm is
similar to the maximum-likelihood algorithm. The distinction is a
derivative of the class statistics definition made in each case. The
maximum-likelihood classification algorithm uses a set of class
statistics (mean and covariances) obtained for the population of the
class as a whole; the mixture density function is formulated to treat
a class as a union of independent subclasses, each of which is
described as a population having a complete set of class (subclass)
statistics. This representation tends, under careful preprocessing and
definition of subclasses, to separate a population consisting of a
multimodal distribution into several unimodal distributions and to
improve the performance of the classification algorithm. This
algorithm supplanted maximum likelihood as the principal LACIE
classifier. The computation of the likelihood expression was
complicated by the requirement to compute an exponential. In that
expression, Q_k is the quadratic form for subclass k, M is the number
of subclasses in the class C, and other terms are defined previously.
The pixel is assigned to the class C if H_C > H_k for all k ≠ C.
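The sum-of-likelihoods rule can be illustrated with a Python sketch. The likelihood expression itself is missing from the source, so the form used here, a weighted sum of exp(-Q_k / 2) over the subclasses of a class, is an assumption, as are the per-subclass weight and the function names.

```python
import math


def mixture_density(x, subclasses):
    """Sum of subclass likelihood terms for one class (assumed form).

    subclasses is a list of (weight, mean, inv_cov) tuples.
    """
    total = 0.0
    for weight, mean, inv_cov in subclasses:
        d = [xi - mi for xi, mi in zip(x, mean)]
        n = len(d)
        q = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
        total += weight * math.exp(-0.5 * q)   # the costly exponential
    return total


def classify_mixture(x, classes):
    """Return the index of the class with the largest mixture density."""
    scores = [mixture_density(x, subclasses) for subclasses in classes]
    return scores.index(max(scores))


eye = [[1.0, 0.0], [0.0, 1.0]]
bimodal = [(1.0, (0.0, 0.0), eye), (1.0, (20.0, 0.0), eye)]   # two subclasses
unimodal = [(1.0, (10.0, 0.0), eye)]
```

In the example, the first class has two well-separated modes; a single Gaussian fitted to it would be centered near the second class, which is the failure the subclass representation avoids.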


The algorithms defined previously and performance benchmarks
consistent with LACIE throughput requirements were specified in a
competitive request for proposal in September 1974. In March 1975,
award was made to the Goodyear Aerospace Corporation, which had
proposed a STARAN S-500 system manufactured by Goodyear as the LACIE
special-purpose processor (SPP). The SPP was installed in
December 1975 and made available for LACIE production use in
March 1976.

The SPP (STARAN) system is based on a computer organization in which
many identical operations are executed simultaneously; that is, it is
a "single instruction stream, multiple data stream" processor. For
example, in the SPP, an "add" operation can be executed simultaneously
for 512 pairs of numbers. The parallel execution of an operation for
many data pairs is made possible by employing many processing elements
(512).
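The lockstep behavior can be mimicked in ordinary Python to show how work is counted on such a machine. This is a serial simulation only: each pass of the outer loop stands for one "add" instruction applied to up to 512 pairs at once, and the function names are illustrative.

```python
N_PE = 512  # number of processing elements in the SPP, from the text


def simd_steps(n_items, n_pe=N_PE):
    """Number of lockstep instructions needed to cover n_items operands."""
    return -(-n_items // n_pe)   # ceiling division


def simd_add(a, b, n_pe=N_PE):
    """Add two equal-length lists in batches of n_pe lanes."""
    out = []
    for start in range(0, len(a), n_pe):
        # One simulated instruction: every lane in this batch adds its pair.
        lane_a = a[start:start + n_pe]
        lane_b = b[start:start + n_pe]
        out.extend(x + y for x, y in zip(lane_a, lane_b))
    return out
```

Under this counting, one operation over all 22 932 pixels of a LACIE segment takes `simd_steps(22932)` instruction issues instead of 22 932 serial ones.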

A top-level diagram of the SPP mainframe is shown in figure 2. It
consists of a conventionally addressed control memory for program
storage and data buffering, a control logic unit for sequencing and
decoding instructions from control memory, and two associative array
modules.

The high processing and throughput speeds achieved by the SPP resulted
from the capabilities of the associative array (fig. 3). Each SPP
array contains 256 simple processing elements. All processing elements
(PE's) perform the same operation at the same time, but each
processing element acts on independent data. Thus, in each SPP array,
256 independent data streams can be processed.

Array memory used to support the PE's is composed of 256 words having
256 bits. Multiple access paths between the PE's and the bit memory
locations provide ready access to 256 different bit patterns in the
array. Two access "stencils" are shown in figure 3.

To further enhance the data routing capability of an array module, an
alignment, or permutation, network in the machine provides a flexible
interconnection between processing elements. The multiple processing
elements, the multidimensional access memory, and the permutation
network give the SPP the flexibility to be useful for a wide set of
problems.

The LACIE algorithms were all well suited to the SPP architecture
because they had an inherent parallelism resulting from a given
computation being performed on all picture elements (pixels) in an
image. Since computation associated with each pixel was the same for a
given algorithm, it could be implemented in a single-instruction
stream. The LACIE algorithms thus fit the single-instruction,
multiple-data-stream concept that is part of the SPP architecture. It
should be observed that, although the SPP is a full array processor,
the LACIE applications exploited the architecture primarily as a
strictly parallel device. The details of the SPP architecture and
arithmetic operations are described elsewhere (ref. 2); it will
suffice here to describe briefly the LACIE installation.

In the configuration of figure 1, an SPP interface unit (IU) is the
communications link between the SPP and the IBM 360/75, which shall
subsequently be identified as the host. The host port is a
conventional high-speed (to 1.2 Mbytes/sec) IBM 2860 selector channel.
The IU contains, on the SPP side, an I/O controller (IOC) interfacing
to (1) a buffered I/O channel to the SPP control memory used in LACIE
for large data transfers (imagery) and (2) an external function
channel, which generally carries control signals governing processing
sequences. The IU interface to the host is through an interface module
(IFM), which is a functionally self-contained module that communicates
with and transfers data to and from the host channel. The module
handles all control line sequences required for selection and command
execution on the host channel. There are two IFM's of identical
design, one for each host channel. Each can operate independently of
the other; however, only one can be logically connected to the IOC at
a time. Again, additional details are specified in reference 2.


Performance  Summary 

The LACIE performance advantages of the SPP over the 360/75 are
functionally dependent on (1) algorithm organization (the ability to
exploit parallelism), (2) number of data channels, (3) number of
signatures (classes/clusters), (4) number of pixels (vectors) per
quantum of system workload (job), (5) SPP setup time (formatting of
vector transfers to and from the SPP), and (6) data base retrieval
rates. The effects of these drivers are mutually dependent and
difficult in many cases to distinguish. The sampling of results
provided subsequently will be generally treated in terms of these
driving functions, with only a few specific comments in order as they
relate to computational idiosyncrasies of the individual algorithms.
Some preliminary remarks follow.

1. To repeat earlier comments, a LACIE image consists of 22 932 data
vectors or as many as 4 sets of such (4-channel) vectors. A maximum of
60 signatures for classification or clustering may be defined;
practically, these values remain between 30 and 40. Other system or
algorithm delimiters are generally exercised across their entire
range. Extensive testing of the SPP software in the production
environment confirmed both logical and performance timing behavior of
the system throughout the range of software specifications.

2. The historical driver of the 360/75-based LACIE/ERIPS performance
was, as stated, the CPU. In the SPP configuration, principal
limitations on throughput are, in practice, the retrieval functions
from the imagery storage medium, the IBM 2314 disks. Only on jobs of
significant complexity, specifically classification exercises on
12 channels or greater with discrimination of more than 20 classes,
does the system perform in an SPP CPU-bound state. Development of an
imagery data retrieval technique (ref. 1) has ensured optimal
exploitation of the disks for the peculiarities of the LACIE
application, but the disks generally remain the system driver. Direct
access to the imagery on the ITEL 7330 data base would permit
significant throughput improvements for most LACIE jobs; such
implementation may be made at a later date, as necessary, but current
performance (although suboptimal because of I/O) satisfied existing
constraints on resources.

3. SPP arithmetic is field-length dependent in performance
characteristics. The LACIE applications specifications dictated
effective equivalence with 360/75 floating-point arithmetic results
for purposes of continuity; this stringent requirement on the SPP,
which was achieved, is not statistically justified on the basis of
measurement vector variance, and legitimate results of processing can
be obtained by way of shorter fields than those employed with
significant performance advantage.

4. In a comparison of pre-SPP and post-SPP timing, the control base
was modified to some extent in software that could have affected
360/75 applications performance; that is, certain 360/75 system
software routines were optimized at the time of SPP implementation.
These changes could, to some extent, be reflected in the timing
figures given subsequently for pre-SPP algorithms, but the figures
shown display pre-SPP results without such system changes. Further, as
cited earlier, the adaptive clustering algorithm was extensively and
theoretically modified when incorporated into the SPP; the objective
was to maximize the benefits of parallelism and to use spatial as well
as spectral data characteristics. The result has been a technique of
improved convergence and stability, but no direct (timing) performance
comparisons can be made with pre-SPP results. Comments and conclusions
pertinent to each implemented algorithm are given in the following
paragraphs.


Statistics 

Statistical processing ordinarily occurs fairly
rapidly in the LACIE system and was included in the
SPP development for consistency with the notion
that all pattern-recognition processors of a
pixel-dependent type would be SPP-resident. Also, the
STATS routine is invoked in the body of ITCLUS;
SPP implementation reduced organizational
complexities. LACIE characteristics, however, include
occasional and numerous small (<20 pixel) fields on
which processing must be performed; SPP
performance is severely compromised by system overhead
on such jobs. Occasionally, SPP STATS is slightly
slower even than the 360/75 STATS, but the SPP rate
has never been less than 90 percent of 360 rates (on
tasks of 4 to 5 seconds). On larger fields and on large
channel set jobs, the SPP performance advantage
reaches about 3 to 1; but, because the process rarely
requires more than 20 seconds on the 360 in the most
complex LACIE cases, 360/75 execution would not
be deleterious to the system.


Clustering 

An adaptive/iterative clustering exercise was
defined for a benchmark as follows: 500 x 200 (10^5)
vectors, 16 channels, to be distinguished into 10
clusters in an artificial data set; results: non-SPP
required 35.1 minutes, SPP required 37 seconds, a
performance gain of 57 to 1. Figure 4 shows typical
LACIE results for 22 932 vectors, under various
channel set sizes and with implicitly discriminated
clusters. Performance gains, less than for the
benchmark, reflect system overhead penalties for
smaller data sets but demonstrate the I/O constraints
driving the SPP on complex applications and
significant performance improvement (as great as 15 to 1)
normally experienced.

FIGURE 4.— Iterative clustering timings.


Classification

A classification benchmark was defined as
follows: MAXLIK, 4 channels, 10 classes, 2340 x
3240 vectors (7.58 million pixels); results: pre-SPP,
105 minutes; SPP, 8.15 minutes; a performance gain
of 13 to 1. Figure 5 shows MIXDEN results on
LACIE images of 22 932 vectors under various
channel set sizes and defined signatures. As in clustering,
system overhead diminishes performance factors on
smaller segments of data, although the trends are
clearly I/O driven. MAXLIK, essentially identical to
MIXDEN organizationally, produces timings
approximately 20 percent less for both SPP and
non-SPP.
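
The performance gains quoted for the two benchmarks can be checked with simple arithmetic. The following is a minimal sketch; the function name `speedup` is illustrative and not part of the LACIE software.

```python
def speedup(baseline_seconds: float, spp_seconds: float) -> float:
    """Return the ratio of baseline (non-SPP) time to SPP time."""
    if spp_seconds <= 0:
        raise ValueError("SPP time must be positive")
    return baseline_seconds / spp_seconds


# Benchmark timings quoted in the text, converted to seconds.
clustering_gain = speedup(35.1 * 60, 37.0)            # 35.1 min vs 37 s
classification_gain = speedup(105 * 60, 8.15 * 60)    # 105 min vs 8.15 min

print(f"clustering gain:     {clustering_gain:.1f} to 1")
print(f"classification gain: {classification_gain:.1f} to 1")
```

Rounded to whole numbers, the ratios agree with the 57 to 1 and 13 to 1 figures stated above.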


FIGURE  5. — LACIE  classification  timings  with  SPP  (mixture 
density). 
CONCLUSIONS

The SPP has satisfied and exceeded performance
specifications originally defined. The system
performance can be significantly improved, when
necessary, by modifications in the host data retrieval
technology without impact to the SPP software or
addition of arrays. Within the LACIE context, the
most tangible improvements have been in processes
(clustering, classification) that were previously
prohibitively expensive users of host resources.
Because of host I/O constraints, the statistics
function on the SPP, as anticipated, offered relatively
little improvement except in exotic test cases involving
large data sets.

Table III is a summary of the performance gains
associated with the SPP, both as a function of SPP
execution only and in terms of total system
throughput, for representative LACIE jobs. It is to
be recalled that some LACIE jobs are neither
suitable for nor implemented on the SPP but are still
considerable users of system resources. Such tasks
obviously should be examined in detail for
performance improvements in production environments.

As anticipated before the SPP procurement,
additional requirements, both modifying existing
algorithms and proposing entirely new analytic
techniques, were developed for LACIE support. Because
of serial device limitations, such schemata
previously have been useful only on limited amounts of
data.

In summary, the LACIE environment, including
high throughput requirements in a quasi-production
system and requirements flux in a technologically
and theoretically developing discipline,
demonstrated the cost-effectiveness and utility of a
programmable SPP. It has been shown that the image
processing tasks conventionally considered
unmanageable (in quantity) are tractable with such
devices as described here or available from other
sources. It is believed that the primary foci of
attention in image processing in the long term should not
be, in general, the computational tasks, but rather
data management, storage, and traffic control for
large numbers of large data sets.

Table III.— LACIE Throughput Improvements
ABSTRACT

The  question  of  computer  system  selection  cri- 
teria  is  growing  more  complex  as  the  cost  of 
centralized  systems  decreases  and  the  performance 
of  distributed  systems  increases.  In  many  cases,  the 
discussions  become  emotional  and  evaluations  are 
made  on  criteria  which  do  not  address  the  technical 
merits  of  a system  solution  to  a specific  problem. 

Identifying the criteria involved in the selection
process is not difficult. The complexity arises in
objectively evaluating various candidate configurations
against these criteria based on the user's specific
requirements. This paper describes a process (model)
which can be used to formalize the selection process.
The process consists of two major steps.

1.  Verification  that  the  candidate  configuration  is 
adequate  to  meet  the  user's  processing  requirements 

2.  Determination  of  an  overall  system  evaluation 
rating  based  on  cost,  usability,  adaptability, 
availability,  etc. 

An Earth resources data system of the future has
been used as an example in the application of the
weighting factors to a set of user requirements, and
the LACIE/Earth Resources Interactive Processing
System (ERIPS) package at the NASA Johnson
Space Center (JSC) provides an example of the
applied procedure. This approach does not eliminate all
judgment from the procedure and therefore is still
subject to some discussion. It does force the
discussion away from emotional arguments into an orderly
set of decisions which provides a specific solution to
the problem being addressed.
INTRODUCTION


The  Problem  Statement 

Discussion of computer system selection criteria
is complex today and is growing more complex as
technological advances affect price and performance
of computing systems. The problem is further
complicated by the fact that there are few system
designers or analysts who can speak as experts on both
centralized and distributed system architectures.
Discussions on selection criteria tend to be composed of
some fact and of some emotion and are of limited
value. The burden of selecting a computer system
falls to the buyer. His ability to judge the technical
merits of a particular hardware architecture for an
application will be tested by many discussions which
require the use of every available tool to establish the
relevance of a solution to the problem being
addressed. In today's bag of tools, high value is placed
on historical precedence and on rules of thumb
derived before the advent of current technology.
Extrapolating partly applicable models to produce
evaluation data can magnify the problem by
comparing unlike quantities as though they were alike
instead of realizing that the real data point from this
activity is qualitative.

Identifying the parameters involved in the
selection process is not difficult since most system
designers will agree on what they are. The complexity
arises in developing the model that will generate the
weighting factors needed to evaluate candidate
configurations for solving a particular problem. Since the
basic question will become more and more common,
a model which can be used in the selection process
has been developed.

The parameters involved in the evaluation
process and the impact of the technical approach selected
on their relative importance to the solution will be
reviewed. The decisionmaking process is structured
into an orderly review which provides a specific
solution to the problem being addressed and reduces the
emotional aspects of this activity.

System  Selection  Process  Definition 

The system selection process is a two-step activity
in this approach to the evaluation of candidate
configurations for the solution of a stated problem (fig.
1). Step I is the testing of the candidate configuration
for technical adequacy. This step involves significant
requirements analysis for effectiveness. The problem
to be solved must be divided into functional
elements which are described in terms of data in and
out, instructions executed, and overall size. The
functions are mapped into processes and the
probability of a function executing in a process is
addressed. The processes are then incorporated into
workloads which reflect user activities, and workload
baselines are established.

FIGURE 1.— System selection process flow diagram. (a) Step I.
(b) Step II.

Candidate configurations are then defined and
system loading for the various workload baselines is
evaluated against them. Acceptable configurations
are retained, and others may be redefined and
evaluated again. Only candidate configurations which
support system definitions will proceed beyond this step
in the process.

The establishment of the technical adequacy of a
candidate configuration in this manner provides a
system-level solution set, the elements of which can
be compared using other selection criteria.
Configurations which do not meet the performance
requirements or the system definition are eliminated
from further consideration.

Step II of the system selection process is the
derivation of an evaluation rating for each
technically adequate candidate configuration. The most
important element of this step is the definition of a
set of well-described, detailed selection criteria which
can be viewed objectively and which specifically
address the user requirements to be supported.

The  user  of  a system  who  has  a requirement  for 
real-time  processing  is  concerned  with  a different  set 
of  capabilities  from  that  of  the  interactive  researcher 
or  the  time-sharing  inventory  control  administrator. 
Regardless  of  the  user,  the  following  generic  criteria 
set  will  support  the  derivation  of  a user-specific 
system  selection  criteria  set. 

1.  Adequacy — Does  this  configuration  fulfill  the 
requirements? 

2.  Cost — Over  the  life  of  the  system,  is  the  cost 
reasonable? 

3.  Adaptability — Is  the  system  capable  of  growth 
to  new  applications? 

4.  Availability — Is  the  system  maintainable  and 
are  partial  support  options  possible? 

5.  Transportability — Is  the  system  composed  of 
standard  hardware  and  software  which  is  generally 
available? 

6.  Usability — Can  the  users,  developers,  and 
operators  perform  their  activities  on  the  system  with 
ease? 

The user-specific system selection criteria set is
categorized to assist in the weighting process. Each
category is assigned a weight, and subfunctions
within each category are also assigned weighting factors.

Numerical evaluation ratings are applied to the
various elements of the selection criteria for the
acceptable candidate configurations. Weighted
evaluation ratings are summed for a category, and the
results are weighted appropriately. The resultant
ratings are then summed to produce an overall
evaluation rating for a configuration which can be
evaluated against ratings of other candidate
configurations. Then, based on system availability on a
schedule consistent with project needs, a selection
can be made.

A summary of the methodology is presented in
the following section. Detailed descriptions of Steps I
and II are contained in appendix A and appendix B,
respectively. A comprehensive example
demonstrating the configuration adequacy step is contained in
appendix C. This example is an analysis of the
LACIE/ERIPS currently operational at JSC.


THE EVALUATION PROCESS

When attempting to select a data processing
system to address a particular problem, the buyer
quickly discovers that the process is very
complicated. Many factors must be analyzed and evaluated
to make an intelligent decision. Examples are the
following.

1. Architecture— There are usually several
architectural approaches available to solve a
problem. A number of questions must be answered.
Should the system processing be performed by a
centralized computer only or should some processing
be distributed throughout the system? Should the
system support on-line and batch processing
concurrently? Should remote terminals be supported? If so,
what kind of communication capabilities are
required? How much intelligence should be distributed
to communications processors? Should the data base
be centralized or distributed? What kind of data base
management capabilities are required?

2. Hardware manufacturers— There are many
data processing equipment manufacturers, each
offering a wide range of hardware capabilities and
options. The buyer must decide whether his job can
be done with the proposed hardware. For example,
does each processor have the necessary processing
speed? Are the main storage and the auxiliary
storage of adequate size and speed? Are the on-line
storage devices of adequate size and speed? Can the
job be handled with standard interface devices? If
not, what are the requirements for any
special-purpose devices?

3. System software— The vendor-supplied
software, such as operating systems, utilities, data base
managers, and communications controllers, varies
widely in functions and capabilities. Some vendors
offer limited or no software support for their
hardware. The buyer must determine whether the
appropriate functions are provided and the extent to
which they meet his requirements.

4. Application software— There are many
application software packages available for sale or lease
which may be suitable for performing part or all of
the buyer's data processing. Each package must be
reviewed to determine the functions available, the
extent to which they meet the requirements,
performance and usability characteristics, etc.

5. Intangible attributes— There are many
intangible attributes of a system that must be evaluated; for
example, maintainability, reliability, availability,
flexibility, operability, usability, and ease of
technology transfer.

6. Cost— There is a wide range of costs for both
hardware and software products. The buyer must
evaluate cost against the functions provided, the
functions required, and the functions potentially
desirable in the future. He must also evaluate cost
against the intangible attributes mentioned
previously.

To further complicate the process, evaluation of
various capabilities frequently produces an answer
which is not clearly black or white but rather a shade
of gray. Evaluations and opinions of several experts
in different areas are usually required. Even after a
thorough evaluation is completed, it is often difficult
to summarize conclusions and evaluation ratings in a
manner that can be easily understood by the
decisionmaking levels of management.

It  is  obvious  that  there  is  a need  for  a systematic 
and  simplified  approach  to  the  problem  of  computer 
system  selection.  This  paper  describes  a technique 
for  formalizing  the  process  by  creating  a step-by-step 
sequence  of  activities.  The  reader  should  not  mistake 
this  technique  (model)  for  an  “automatic  selection 
program.”  Human  judgment  is  still  required.  The 
model  simply  requires  that  judgment  be  made  in  an 
orderly  fashion  and  reduces  evaluation  of  various 
elements  to  a common  unit  of  measure.  The  end 
result  is  a single  numerical  evaluation  rating  for  the 
system. 

The model actually consists of two separate but
related models, each having a different objective.
The models are referred to as Step I and Step II and
are defined as follows.

1. Step I: Configuration Adequacy— This model
predicts system performance at a fairly gross level.
Based on a predefined workload, loading on each
system component is calculated. Configurations
incapable of handling the workload are either
eliminated from further consideration or modified until
they are adequate. All candidate configurations
passing this test are then carried to Step II.

2. Step II: Evaluation Rating— This model
calculates an evaluation rating for the system based on the
selection criteria defined by the user. The selection
criteria are carefully structured in a hierarchical
fashion and a relative importance (weight) assigned
to each. The user then evaluates each item in the
lowest level of the criteria hierarchy on a scale from 1
to 10, thus converting all evaluations to a common
unit of measurement. The model then calculates the
weighted evaluation rating for each element of the
selection criteria and sums them in a hierarchical
manner. The final result is a single numerical
evaluation rating.
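
The hierarchical weighted sum of Step II can be sketched in a few lines. This is an illustrative sketch only: the `Criterion` class, the example criteria, and the weights are hypothetical, and dividing by the sum of sibling weights (so that the result stays on the 1-to-10 scale) is an assumption not stated in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Criterion:
    name: str
    weight: float                      # relative importance among its siblings
    score: Optional[float] = None      # 1-to-10 rating, lowest-level items only
    children: List["Criterion"] = field(default_factory=list)


def rating(node: Criterion) -> float:
    """Return the weighted evaluation rating of a criterion subtree."""
    if not node.children:
        if node.score is None or not 1 <= node.score <= 10:
            raise ValueError(f"{node.name}: leaf needs a score from 1 to 10")
        return node.score
    total_weight = sum(child.weight for child in node.children)
    # Assumption: normalize by total weight so each level stays on 1 to 10.
    return sum(c.weight * rating(c) for c in node.children) / total_weight


# Hypothetical two-level criteria hierarchy for one candidate configuration.
system = Criterion("system", 1.0, children=[
    Criterion("cost", 0.4, children=[
        Criterion("purchase", 0.5, 6),
        Criterion("maintenance", 0.5, 8),
    ]),
    Criterion("usability", 0.6, 9),
])

overall = rating(system)
print(f"overall evaluation rating: {overall:.2f}")
```

Running the same function over each technically adequate configuration yields one number per configuration, which is the basis for the comparison the text describes.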

Each  of  these  steps  is  discussed  in  more  detail  in 
the  following  sections. 

The reader should understand that the models
discussed here are neither discrete simulation models
nor analytical models. Thus, they are not intended
for detailed system design evaluation and
performance prediction. Instead, they are at a higher level,
intended to help the user answer those questions he
encounters when trying to select a configuration. The
models should help eliminate those configurations
that cannot handle the job and quantitatively
evaluate the relative merits of those that can.


Step I: Configuration Adequacy

The first step in evaluating a configuration is to
determine whether it is adequate to handle the
processing load. The objective is to eliminate from
consideration those configurations that do not meet
performance requirements. Each component of the
system will be analyzed to determine the adequacy of
such parameters as processor speed, memory size,
data transfer rates, and on-line storage capacity.
There are three major activities in this step.

1.  Define  workload — Determine  the  amount  of 
processing  that  must  be  performed  during  a specific 
time  period. 


2. Define candidate configurations— Diagram the
configuration, identifying each hardware/software
component and the characteristics of each.

3. Calculate system loading— Compute
consumption of system resources, elapsed times, and
component utilizations and then analyze total system
performance.

Each of these activities is described in greater
detail in the following sections. To understand the
text, however, it is important that the reader first
understand the definition of terms used throughout the
discussion. Table I contains a list of terms, the
definition of each, and examples. A step-by-step detailed
description of all three major activities is contained
in appendix A.

Table I.— Definition of Terms

Term              Definition                            Examples

Resource          An element of the computer system     Central processing unit,
                  required to perform a function        direct access storage
                                                        device, memory

Resource usage    Parameters on which the amount of     Number of fields, number
variables         resource consumed is dependent        of classes

Function          A logical user action for which       Clustering, classification,
                  resource usage can be defined in a    dot summary
                  manner independent of the
                  configuration

Process           A typical sequence of functions       Production analyst,
                  that represents the activity of a     researcher, software
                  particular user                       developer

Benchmark         A combination of processes (users)    2 production analysts,
                  that is a typical representation      1 researcher, 3 software
                  of the total system workload          developers, etc.
                  during a specific period

Workload definition.— The first activity in
determining configuration adequacy is to define the
system workload for typical time periods to be
analyzed. This workload will act as the yardstick
against which each candidate configuration is
measured. This activity is performed only once and
the results saved for later use when system loading is
calculated. The workload is defined in terms of
system resources consumed. This definition process
is performed in a hierarchy of three levels: functions,
processes, and benchmarks. A flow diagram
depicting these actions is shown in figure 2.

Function definition: The configuration evaluator
first looks at all work to be performed by the system
and identifies each logical user action, called a
function. The amount of system resources used by each
function must then be defined in terms which are
independent of a particular configuration. For
example, the processor execution time is defined as the
number of machine instructions that must be
executed to perform that function rather than a fixed
process time. The model converts this value to
process time based on the instruction execution rate of
the processor being analyzed. Similarly, the amount
of input data and output data is expressed in total
bytes rather than in some configuration-dependent
measurement such as number of records or number
of tracks.
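
The conversion from configuration-independent quantities to times on a particular configuration is a simple division. The sketch below uses hypothetical function names and illustrative numbers; none of the rates are taken from the paper.

```python
def process_time_seconds(instructions: float, instructions_per_second: float) -> float:
    """Convert an instruction count to processor time on a given processor."""
    if instructions_per_second <= 0:
        raise ValueError("instruction execution rate must be positive")
    return instructions / instructions_per_second


def transfer_time_seconds(total_bytes: float, bytes_per_second: float) -> float:
    """Convert a byte count to transfer time on a given channel or device."""
    if bytes_per_second <= 0:
        raise ValueError("transfer rate must be positive")
    return total_bytes / bytes_per_second


# Illustrative values only: one function, two hypothetical processors.
function_instructions = 6_000_000
slow_cpu = process_time_seconds(function_instructions, 1_000_000)   # 1 MIPS
fast_cpu = process_time_seconds(function_instructions, 2_000_000)   # 2 MIPS
io_time = transfer_time_seconds(800_000, 400_000)                   # 400 kB/s

print(slow_cpu, fast_cpu, io_time)
```

Because the function is described once in instructions and bytes, the same workload definition can be reused unchanged for every candidate configuration.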

In most cases, the amount of system resources
consumed by a function is not a constant but rather a
variable dependent on certain input parameters. For
example, the amount of processing performed for
the "classification" function may depend on the
number of fields, subclasses, and channels (fig. 3).
The amount of main storage necessary to execute
this function may depend on the number of fields.
Therefore, resource usage for these functions must
be defined by resource usage variables like those
shown in the example in figure 3.
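
A resource usage variable is, in effect, a small function of the input parameters. The sketch below encodes the memory and I/O expressions for the classification function as they appear in figure 3; the Python function names are illustrative, and the units are not stated in the figure (bytes is an assumption).

```python
def classification_memory(fields: int) -> int:
    """Main storage for classification, per figure 3: 16 700 + 80 x fields."""
    return 16_700 + 80 * fields


def classification_io(fields: int, subclasses: int, channels: int) -> int:
    """I/O volume for classification, per figure 3.

    46 260 + 22 932 x channels + 240 x subclasses + fields x (20 x subclasses)
    """
    return (46_260
            + 22_932 * channels
            + 240 * subclasses
            + fields * (20 * subclasses))


# Hypothetical parameter values for one user session.
memory = classification_memory(fields=10)
io_volume = classification_io(fields=10, subclasses=10, channels=4)
print(memory, io_volume)
```

Setting the parameters once per process, as the text describes, turns each expression into a single number that can be summed with the other functions in the session.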

Process definition: The next level in the hierarchy
is the "process." The process is a typical sequence of
functions which represents the activities of a
particular user session. The objective is to identify a set of
"typical" users which can be used to construct the
total workload. The system evaluator must identify
all potential users of the system and the
characteristics of each. For this discussion, a "user" may be
thought of as a work session by an individual. If the
same person uses the system in a significantly
different manner at various times, then each session
might be represented by a different process
definition. For example, one user may be a research
analyst evaluating various classification techniques.
One work session may be a batch job run consisting
of a compile-load-and-go sequence. At another time,
the same person may perform some actual
classification exercises in an interactive mode at a display.
These two sessions would be represented by
different process descriptions.


FIGURE 2.— Workload definition flow diagram.

Function resource definition variables (classification):

CPU:     608 000 + (NO. FIELDS) x (45 000
           + (NO. SUBCLASSES) x (42 600
           + (NO. CHANNELS) x 2600))

MEMORY:  16 700 + 80 x (NO. FIELDS)

I/O:     46 260 + 22 932 x (NO. CHANNELS)
           + 240 x (NO. SUBCLASSES)
           + (NO. FIELDS) x (20 x (NO. SUBCLASSES))

Process 1 (clustering algorithm research):
2P1 + 8P3 + 21P4 + 8P8 + ...

Benchmark B (average load):
P1 + 3P2 + 6P3 + 6P8 + ...

FIGURE 3.— Workload definition example.
After identifying the processes, the evaluator
must determine the characteristics of each. The
following information must be specified.

1. Functions— Identify the functions executed by
this process and the order in which they are
executed. This can be done by a functional flow
diagram as shown in figure 3.

2. Probabilities— Since a work session does not
always consist of a fixed sequence of functions, the
evaluator must determine the probability of moving
from one function to the next. This information is
used by the model to determine the number of times
each function is executed.

3. Parameters— The value of each parameter in
the resource usage variables must be set. For
example, the number of fields, subclasses, channels, etc.,
must be determined for this user session.
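
The text does not say how the model turns transition probabilities into execution counts. One plausible reading, offered here only as an assumption, is to treat the session as a Markov chain and compute the expected number of visits to each function. The function names and probabilities below are hypothetical.

```python
def expected_executions(transitions, start, tol=1e-12, max_steps=10_000):
    """Expected number of executions of each function in one work session.

    transitions maps a function name to {next_function: probability}.
    Any probability mass not assigned to a next function ends the session.
    """
    counts = {}
    arriving = {start: 1.0}            # probability of reaching each function now
    for _ in range(max_steps):
        if sum(arriving.values()) < tol:
            break                      # remaining sessions are negligible
        following = {}
        for function, mass in arriving.items():
            counts[function] = counts.get(function, 0.0) + mass
            for nxt, prob in transitions.get(function, {}).items():
                following[nxt] = following.get(nxt, 0.0) + mass * prob
        arriving = following
    return counts


# Hypothetical session: load once, repeat clustering with probability 0.5,
# then classify and stop.
session = {
    "load": {"cluster": 1.0},
    "cluster": {"cluster": 0.5, "classify": 0.5},
    "classify": {},
}
executions = expected_executions(session, start="load")
print({name: round(count, 3) for name, count in executions.items()})
```

The resulting counts play the role of the multipliers in a process expression: each function's resource usage is scaled by how often it is expected to run.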

Benchmark definition: The highest level in the
hierarchy of workload definition is the total system
workload for a specific time period. This is called a
benchmark. In the model, it is represented as a
combination of processes (users). The evaluator must
first determine which time periods he wishes to
analyze. He would normally select a period that
represents the average workload and another that
represents the peak workload. If so, he will define a
separate benchmark for each period.

Since all the detailed information is defined at the
function and process levels, the benchmark
definition is simply a specification of the number of times
each process is executed. It can be specified as a
mathematical expression as illustrated in figure 3.

Candidate system definition.— The second activity
in determining configuration adequacy is the
definition of candidate configurations. It consists of
diagraming the configuration, identifying the
hardware components, determining hardware
characteristics, assigning maximum utilization values, and
assigning functions to the appropriate hardware
components. Each of these actions is depicted in
figure 4.

The candidate configurations to be evaluated
would normally consist of the solutions proposed by
the various responding vendors. The buyer may,
however, choose to determine the system
architecture himself and specify the desired configuration in
the request for proposal. In this case, each vendor
would merely specify the equipment to use for each
system component. This approach has several
disadvantages of which the buyer should be aware.

1. It discourages individual vendors from seeking
the best architecture to address the problem.

2. It may not allow vendors to propose the
hardware/software systems containing recent advances in
the state of the art.

3. It usually results in more expensive
configurations submitted by vendors.

FIGURE 4.— Configuration definition flow diagram.

Configuration diagram: The first action is to
create a detailed diagram of the configuration
showing each system component and its interface with
other components. Each component must be
identified by its name, model number, etc. For each
processor, the main storage size and secondary storage
size, if applicable, must be indicated. The diagram
must also identify the number and type of each data
channel, input/output (I/O) device, terminal, etc.
Any special-purpose equipment and/or interface
should be clearly identified.

Hardware characteristics: The characteristics of
each hardware component identified previously
must now be determined. The type of information
needed includes the following.

1. Instruction execution rate for each processor

2. Memory speed

3. Data channel transfer rates

4. Disk characteristics such as rotation time,
average seek time, data transfer rate

5. Display characteristics such as screen size, data
transfer rate

These data are used by the model in calculating
elapsed execution times and component utilizations.

Maximum utilization: To avoid overloading any
hardware component, the user may specify a
maximum allowable utilization percentage for each
component. The model will not allow that component
utilization to exceed the input value. The model first
computes elapsed execution times and component
utilizations for the specified workload without regard
to this maximum value. A check is then made to
determine whether the calculated value exceeds the
input maximum. If so, the elapsed times are adjusted
to bring the component utilization down to this
maximum value.
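
The check-and-adjust rule can be sketched as follows. The sketch assumes that utilization is busy time divided by elapsed time and that the adjustment stretches the elapsed time until the cap is met; the text does not give the exact formula, and the function name is illustrative.

```python
def apply_utilization_cap(busy_seconds: float,
                          elapsed_seconds: float,
                          max_utilization: float):
    """Return (elapsed_seconds, utilization) after enforcing the cap.

    Assumption: utilization = busy time / elapsed time, and an overloaded
    component is relieved by lengthening the elapsed time.
    """
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    utilization = busy_seconds / elapsed_seconds
    if utilization <= max_utilization:
        return elapsed_seconds, utilization
    # Stretch elapsed time so that busy / elapsed equals the cap exactly.
    return busy_seconds / max_utilization, max_utilization


# Hypothetical disk: busy 90 s out of 100 s, capped at 75 percent.
elapsed, utilization = apply_utilization_cap(90.0, 100.0, 0.75)
print(elapsed, utilization)
```

In this example the benchmark's elapsed time grows from 100 to 120 seconds, which is how an overloaded component shows up as degraded total system performance.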

Function assignment: By analyzing the
configuration and the functions to be performed, the evaluator
must now assign each function to the appropriate
hardware component. This is a trial and error process
and represents only the initial best estimate of
function assignment. If the configuration is shown to be
inadequate, some functions may have to be
reassigned.

System loading calculation.—The last activity in
determining configuration adequacy is the calculation
of system loading and analysis of the results. This
procedure includes the calculation of resource
consumption for each process, calculation of total
resource utilizations and elapsed execution times,
adjustment of elapsed times if any resources are
overused, and analysis of total system performance.
These actions are shown as a flow diagram in figure 5.
Each is discussed in detail in appendix A and in
summary form in the following paragraphs.

Resource  consumption  calculation:  The  objective 
of  resource  consumption  calculation  is  to  determine 
the  amount  of  each  system  resource  consumed  by 
the  benchmark  being  analyzed.  Resource  consump- 
tion is  first  calculated  by  function,  then  by  process, 
and  finally  for  the  total  benchmark;  the  value  of  each 
resource  usage  variable  is  calculated  using  the 
specified  input  parameters  for  each  process.  System 
overhead  is  estimated  and  added  to  the  totals.  The 
result is the total active or “busy” time of each
resource. 

Elapsed time/utilization calculation: The evaluator
is now ready to determine the elapsed execution time
and resource utilizations. To obtain the total elapsed
execution time for the entire benchmark, each function
must first be evaluated individually to determine
stand-alone execution times. This procedure requires
estimation of all I/O inactive or “wait” time based on
the I/O device characteristics, data base design, etc.
It also involves consideration of simultaneous use of
multiple resources that may be used by a function.

The  elapsed  time  for  each  process  can  now  be  ob- 
tained by  simply  summing  the  elapsed  times  for  each 
function  in  that  process.  Similarly,  elapsed  execution 
time  for  the  total  benchmark  is  calculated  by  sum- 
ming the  times  for  the  appropriate  processes.  The 
percentage  utilization  can  now  be  calculated  for  each 
system  resource  by  dividing  the  total  busy  time  (for  a 
resource)  by  the  total  elapsed  time. 
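
A minimal sketch of the summation and division just described, assuming busy times and function elapsed times are already known in seconds. The function name and the dictionary layout are hypothetical, chosen only for illustration.

```python
def utilization_by_resource(busy_seconds, function_elapsed_seconds):
    """Divide each resource's total busy time by the total elapsed time.

    busy_seconds: dict mapping a resource name to its total busy time.
    function_elapsed_seconds: elapsed times of the functions, which are
        summed to give the elapsed time of the process (or benchmark).
    """
    elapsed = sum(function_elapsed_seconds)
    if elapsed <= 0:
        raise ValueError("total elapsed time must be positive")
    return {name: busy / elapsed for name, busy in busy_seconds.items()}


if __name__ == "__main__":
    # Hypothetical figures: three functions and two resources.
    result = utilization_by_resource({"cpu": 30.0, "disk": 45.0},
                                     [20.0, 40.0, 40.0])
    print(result)
```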

Elapsed time adjustment: If any resource utilization
exceeds the maximum value (percentage) specified by the
evaluator, then elapsed times for all processes using
the overused resources must be increased until each
resource utilization value drops below its maximum. A
“stretchout factor” is computed to determine the amount
of the increase for each process. This calculation is
described in detail in step 31 of appendix A. After
elapsed times have been adjusted, elapsed times for the
benchmark and resource utilizations must be recalculated.
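
The stretchout factor can be sketched as follows, assuming the step 31 rule reads S_r = (sum of P_r) / (V_r - O_r) when the summed process utilization plus system overhead exceeds the threshold, and S_r = 1 otherwise. That reading of the appendix formula, the function name, and the sample figures are assumptions made for illustration.

```python
def stretchout_factor(process_utilizations, threshold, overhead):
    """Stretchout factor for one resource.

    process_utilizations: utilization of the resource by each process
        in the workload (fractions, e.g. 0.4 for 40 percent).
    threshold: maximum permitted utilization of the resource.
    overhead: estimated system overhead utilization of the resource.
    """
    if threshold <= overhead:
        raise ValueError("threshold must exceed system overhead")
    total = sum(process_utilizations)
    if total + overhead > threshold:
        return total / (threshold - overhead)
    return 1.0


if __name__ == "__main__":
    # Hypothetical resource: two processes, 70 percent threshold,
    # 10 percent system overhead.
    s = stretchout_factor([0.5, 0.4], threshold=0.7, overhead=0.1)
    print(s)
    # Dividing the application utilization by s brings it back to
    # threshold minus overhead.
    print((0.5 + 0.4) / s)
```

Stretching the elapsed times by this factor lowers the application utilization to the permitted level, which is why utilizations are recalculated afterward.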

System performance analyses: The evaluator can
now make a judgment about the adequacy of the candidate
configuration by first making a reasonableness check on
the resource utilizations. Low utilizations across all
workloads might suggest that the configuration is
overpowered. Maximum utilizations on a majority of
resources may mean the hardware is not fast enough.

If resource utilizations appear reasonable, the
evaluator can compare the total benchmark elapsed
execution time to the predetermined time required.
Elapsed times of each process and function should
also be analyzed. If the configuration is judged to be
adequate to handle the workload, it is saved for
further analysis in Step II. If not, the evaluator has
three alternatives.

1.  Augment  the  configuration  in  those  places 
where  resource  utilization  is  very  high. 

2.  Modify  assignment  of  functions  to  system 
components. 

3.  Eliminate  the  configuration  from  further  con- 
sideration. 


Step II: Evaluation Rating

The second step in evaluating a system is to evaluate
it against a set of detailed selection criteria to
obtain an overall system rating. The objective of this
model is to reduce the evaluations of all the various
system elements to a common unit of measurement so that
numerical methods can be used to combine them. The
evaluation is based on a set of user-defined selection
criteria representing the requirements for this system.
Evaluation is performed on such items as cost,
maintainability, usability, flexibility, and
operability. Only those configurations judged to be
technically adequate in Step I are evaluated in Step II.
The entire process and the relationship of Steps I and
II are shown in figure 1.

Three  major  activities  are  involved  in  this  model. 

1.  Develop  selection  criteria— Develop  a detailed 
set  of  criteria  against  which  each  candidate  con- 
figuration will  be  evaluated  and  determine  the  rela- 
tive importance  of  each  item. 

2.  Determine  numerical  ratings — Evaluate  the 
system  against  each  element  of  the  selection  criteria 
and  assign  numerical  ratings  for  each  element. 

3.  Calculate  evaluation  rating— Calculate  the 
weighted  evaluation  rating  for  each  level  in  the  selec- 
tion criteria  hierarchy  and  also  for  the  total  system. 


A detailed  description  of  these  activities  is  con- 
tained in  appendix  B.  The  following  sections  contain 
a summary-level  discussion  of  each. 

Selection criteria development.—The first activity
in Step II is to develop a detailed set of selection
criteria against which each configuration will be
evaluated. This activity consists of three actions.

1.  Develop the user-specific selection criteria list.

2.  Separate criteria into categories.

3.  Determine the relative importance of each item
and assign a weighting factor.

This process is depicted in flow-chart form in
figure 6.

User-specific  selection  criteria:  By  analyzing 
system  requirements,  the  evaluator  must  develop  a 
detailed  set  of  selection  criteria  specific  to  his 
system.  These  criteria  are  developed  by  reviewing 
the  generic  set  of  criteria,  selecting  the  applicable 
groups,  and  then  expanding  them  to  the  appropriate 
detail.  The  detail  is  acquired  by  using  the  hierarchical 
decomposition  technique.  Each  major  area  is 
analyzed  individually  and  broken  down  into  its  ele- 
ments, each  of  which  is  in  turn  broken  down  into  its 
subelements.  This  process  is  repeated  until  each 
selection  category  is  decomposed  into  a set  of  subele- 
ments which  can  be  clearly  evaluated  for  each  can- 
didate configuration.  There  is  no  restriction  on  the 
number  of  levels  of  decomposition.  This  hierarchy 
forms  the  logical  levels  for  later  combination  of 
weighted  evaluation  ratings. 

Figure  7 is  an  example  for  an  Earth  resources  data 
system.  Shown  are  five  major  areas  of  selection  cri- 
teria and  several  elements  of  each.  Each  element 
must  be  analyzed  by  subelements  before  ratings  can 
be  generated.  Figure  8 is  an  example  of  criteria  rat- 
ings and  also  of  the  level  of  the  criteria  subelements 
that  might  be  evaluated. 

Criteria  categorization:  After  definition  of  his 
selection  criteria,  the  user  may  find  it  helpful  to 
categorize  them  in  a different  manner.  This  step  is 
optional  and  simply  reorders  the  criteria  set  in  a way 
which  better  relates  selection  criteria  to  system  re- 
quirements or  simplifies  the  weighting  process  de- 
scribed subsequently.  Figure  9 is  an  example  of  cri- 
teria categories  for  an  Earth  resources  data  system. 

Criteria  weighting:  The  user  is  now  ready  to  deter- 
mine the  relative  importance  of  each  criterion  by 
assigning  a weighting  factor  (a  percentage  of  the 
total)  to  each  category,  element,  subelement,  etc.,  in 
the  selection  criteria  set,  starting  at  the  highest  level 
in the hierarchy and working downward. The weighting
factors are assigned such that the total for each
level in the hierarchy is 100 percent. That is, the sum
of all categories (top level) must be 100. For each
category, the element weights total 100. Within each
element, the subelement weights total 100, etc. An
example of criteria weighting is shown in figure 10.

[Flow chart, three boxes in sequence: DEVELOP SELECTION
CRITERIA; DETERMINE NUMERICAL RATINGS; CALCULATE
EVALUATION RATING]

FIGURE 6.—Evaluation rating model flow chart.
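
The rule that weights total 100 at every level of the hierarchy lends itself to a mechanical check. The sketch below assumes a hypothetical nested-dictionary layout for the criteria set, with each entry holding a "weight" and optional "children"; neither the layout nor the function name comes from the paper.

```python
def levels_not_totalling_100(children, path="top"):
    """Return (path, total) for every level whose weights do not sum to 100."""
    problems = []
    if not children:
        return problems
    total = sum(child["weight"] for child in children.values())
    if abs(total - 100) > 1e-9:
        problems.append((path, total))
    for name, child in children.items():
        problems += levels_not_totalling_100(child.get("children", {}),
                                             f"{path}/{name}")
    return problems


if __name__ == "__main__":
    # Hypothetical criteria set; element weights under "B" total 110.
    criteria = {
        "A": {"weight": 70, "children": {
            "A1": {"weight": 40}, "A2": {"weight": 60}}},
        "B": {"weight": 30, "children": {
            "B1": {"weight": 80}, "B2": {"weight": 30}}},
    }
    print(levels_not_totalling_100(criteria))
```

Running a check of this kind before ratings are assigned catches weighting slips that would otherwise distort every rating above the faulty level.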

Numerical  rating  determination. — The  second  ac- 
tivity in  evaluating  a configuration  is  to  determine  its 
relative  merit  or  value  for  each  criterion.  This  pro- 
cedure requires  analysis  and  judgment  by  the  evalua- 
tor. The  relative  value  of  each  item  must  be  deter- 
mined on  a scale  from  1 to  10.  These  data  are  then 
used  in  calculating  the  weighted  values  for  each  cri- 
terion. This  activity  requires  that  the  evaluator  (1) 
eliminate  any  configuration  that  does  not  contain  the 
mandatory  support  items  and  (2)  assign  a numerical 
rating  value  for  each  criterion.  Figure  6 shows  these 
actions  in  flow-chart  form. 


FOR EACH TECHNICALLY ADEQUATE CONFIG, EVALUATE:

• COST
    • INITIAL DEV COSTS (NONRECURRING)
    • OPS COSTS (RECURRING, 10-YR LIFE CYCLE)
• CONFIG FLEXIBILITY
    • EXPERIMENTAL FLEXIBILITY
    • NEW TECHNIQUE/TECHNOLOGY EVAL FLEXIBILITY
    • EASE OF TECHNOLOGY ACCESS
• TECHNOLOGY TRANSFER
    • RELEVANCE TO USER REQS
    • TRANSITION AND OPS COSTS
    • SUPPORT TECH TO AID NEW USER
• CONFIGURATION USABILITY
    • USER PRODUCTIVITY
    • EVOLUTIONARY DEV CAPABILITY
    • OPS ACCEPTABILITY
• SCHEDULES
    • TIMELY DEV SUPPORT
    • PLANNABLE CONFIG EXPANSION

FIGURE 7.—Example of user-specific selection criteria for an
Earth resources data system.
[Table with columns CRITERIA; WEIGHT, PERCENT; RATING.
Numeric entries are not legible. Legible row labels:]

GENERAL SUPPORT CAPABILITIES
• OPERATIONAL MANAGEMENT AND FLEXIBILITY
• EXPERIMENTAL FLEXIBILITY
    - SUPPORT [illegible] LANGUAGES
    - EASY ADDITION OF NEW PROGRAMS
• TRANSFER PROGRAMS/DATA TO/FROM REMOTE LOCATIONS
    - EASY CREATION OF NEW DATA SETS
    - REMOTE SOURCE CODE EDITING
    - REMOTE JOB INITIATION
• SUPPORT MULTIPLE USERS SIMULTANEOUSLY
• TERMINAL CONTROL PROGRAM

FIGURE 8.—Example of selection criteria ratings.


CRITERIA CATEGORY                              WEIGHT, PERCENT

1.  COST (10-YR LIFE CYCLE)                    [illegible]
2.  INTERACTIVE SUPPORT CAPABILITIES           20
    • OVERALL SYSTEM ARCHITECTURE              40
    • EXPANDABILITY                            20
    • AVAILABILITY/MAINTAINABILITY             40
3.  GENERAL SUPPORT CAPABILITIES               [illegible]
    • OPERATIONAL MANAGEMENT AND FLEXIBILITY   [illegible]
    • EXPANDABILITY                            [illegible]
    • TRANSPORTABILITY OF TECHNOLOGY           20

FIGURE 9.—Example of selection criteria categories for an
Earth resources data system.


Configuration elimination: There are certain functions
and support items in every data processing system which
are mandatory. Any system not containing these items
should be considered nonresponsive. The objective of
this action is to eliminate from further consideration
any configuration which does not contain the mandatory
items. The evaluator may, of course, allow the vendor an
opportunity to supply the missing capabilities.

Rating value assignment: The evaluator must now
judge the configuration's merit for each selection
criterion. The rating is done on a scale of 1 to 10,
where 10 is the best or highest rating. Three methods of
arriving at the value are described in appendix B.

1.  Direct method—To be used if a rating can be
easily assigned with no ambiguity. These ratings are
generally, but not necessarily, linear.

2.  Sample variance method—To be used if ratings
cannot be directly assigned to the values being rated.
Basically, the method presumes that all points being
rated are valid points but probably not extreme
points. This presumption does not preclude the
assignment of a 1 or a 10 rating to a point, but it
makes it more difficult.

3.  Not-quantified method—To be used if the
value being rated is subjective rather than objective;
that is, a direct numerical value cannot be assigned.
Typically, no more than three ratings are used.


Value               Rating

Above average         8
Average               5
Below average         2


An example of criteria weighting and rating is
contained in figure 10.

Weighted rating calculation.—The final activity in
evaluating the system is to calculate weighted rating
values for the selection criteria. This activity uses the
criteria weights and rating values determined
previously. The output is a weighted rating for each
criterion and each hierarchical level.

At this point, the evaluation is a matter of simple
multiplication and addition. Starting at the lowest
(subelement) level in the hierarchy, the weighted
rating is calculated for each criterion by multiplying
the weight (converted from a percentage to a decimal)
by the rating. The results are then summed to the
next highest level, which is in turn multiplied by that
weighting factor and accumulated into the next highest
level. The process is repeated until the top of the
hierarchy is reached and a single system rating is
obtained. An example demonstrating this process is
shown in figure 10.
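
The roll-up described above can be sketched in a few lines of Python using the weights and ratings of figure 10. The nested-tuple layout and the function name are illustrative assumptions. The weight of subelement E is taken as 50 so that the subelement weights of element IB total 100, which also reproduces that element's weighted rating of 9.0.

```python
def rating(content):
    """Weighted rating of a criterion.

    content is either a numeric rating (lowest level) or a list of
    (name, weight_percent, content) tuples for the next level down.
    """
    if isinstance(content, (int, float)):
        return float(content)
    return sum(weight / 100.0 * rating(child)
               for _name, weight, child in content)


# Weights and ratings as read from figure 10.
FIGURE_10 = [
    ("Category I", 70, [
        ("Element IA", 40, [("A", 30, 8), ("B", 40, 2), ("C", 30, 7)]),
        ("Element IB", 60, [("D", 50, 10), ("E", 50, 8)]),
    ]),
    ("Category II", 30, [
        ("Element IIA", 25, [("F", 80, 4), ("G", 10, 2), ("H", 10, 1)]),
        ("Element IIB", 50, [("I", 40, 9), ("J", 40, 3), ("L", 20, 5)]),
        ("Element IIC", 25, [("M", 90, 6), ("N", 10, 5)]),
    ]),
]

if __name__ == "__main__":
    for name, _weight, children in FIGURE_10:
        print(name, rating(children))
    print("Total system", rating(FIGURE_10))
```

Carrying unrounded values up the hierarchy gives a total system rating of about 6.84, in line with the 6.8 shown in the figure.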


SUMMARY 

The  complex  world  of  computer  systems  selection 
will  not  suddenly  become  simple.  Technological  ad- 
vances will  accelerate  the  trend  to  distribute  func- 
tions among  nodes  of  a widely  scattered  network.  As 
communication  devices  become  more  sophisticated, 
reliable,  and  generally  available,  the  intensity  of  argu- 
ments for  centralized  versus  distributed  solutions  to 
a problem  will  begin  to  fade. 

These  changes  in  environment  will  only  serve  to 
CRITERIA                   WEIGHT    RATING    WEIGHTED RATING

I.  CATEGORY I               70                     7.5
    A. ELEMENT IA            40                     5.3
       1. SUBELEMENT A       30         8
       2. SUBELEMENT B       40         2
       3. SUBELEMENT C       30         7
    B. ELEMENT IB            60                     9.0
       1. SUBELEMENT D       50        10
       2. SUBELEMENT E       50         8
II. CATEGORY II              30                     5.3
    A. ELEMENT IIA           25                     3.5
       1. SUBELEMENT F       80         4
       2. SUBELEMENT G       10         2
       3. SUBELEMENT H       10         1
    B. ELEMENT IIB           50                     5.8
       1. SUBELEMENT I       40         9
       2. SUBELEMENT J       40         3
       3. SUBELEMENT L       20         5
    C. ELEMENT IIC           25                     5.9
       1. SUBELEMENT M       90         6
       2. SUBELEMENT N       10         5
TOTAL SYSTEM                100                     6.8

FIGURE 10.—Example of weighted rating calculation.


complicate the problem of the decisionmaker who is
evaluating a set of proposed solutions to his computer
system problem. Since the number of available options
is growing and the relative importance of each of them
to the user involved is different, the most objective
approach possible to the selection process is demanded.

The procedure described is not trivial in that each
step requires significant time and effort to accomplish.
The process provides a focus on the elements of the
selection process which are required to make an
objective evaluation; in particular, the system loading
analysis provides a pass/fail evaluation of candidate
configurations which will eliminate exposure to
inadequate system solutions and their attendant loss of
flexibility and cost overruns. It also highlights
underuse which might add unwarranted cost. The
evaluation rating requires that project-level
decisionmakers be identified and involved in the
establishment of the weighting algorithms as they apply
to the user community which the computer system is
being selected to support. Parochial considerations
must be identifiable and capable of being evaluated.
Ill-defined criteria will be eliminated from
consideration.

This procedure ensures an equitable evaluation of
technically adequate candidate configurations for a
computer system application. It lends itself to use by
buyers for evaluation and vendors for system
determination. The system loading model and a model
which supports sensitivity analysis of evaluation
criteria weighting are relatively simple computer
implementations in a programing language.

The need for a formalized computer selection process
based on quantifiable criteria will grow as the number
of alternative solutions continues to grow. The
approach presented addresses this need. It does not
eliminate all judgment from the procedure and therefore
is still subject to some discussion. It does force the
discussion away from emotional arguments into an
orderly set of decisions which provides a specific
solution to the problem being addressed.

Appendix A

Computer  Sizing  Evaluation  Process 


This appendix provides a 36-step instruction set to
the computer sizing evaluation process. It includes
sample worksheets and the definitions required to
apply it to a specific problem. This process provides
the technical adequacy evaluation of a candidate
configuration described as Step I of the computer system
selection process.


PART  I— THE  PROCESS 

A.  Define  Computer  Functions  and  Interfaces 

Step  Description

1     Describe coded functions in terms of data input
      (data base, bytes), instructions executed (loop
      sizes, loop control variables, base cost), data
      output (buffers, data bases, bytes), and size
      (bytes, variables controlling size).

2     Represent with hierarchical input/processing/output
      diagram and identify all variables affecting
      computer resources.


      [Example input/processing/output diagram. Box labels:
      DATA BASE 1 (BYTES); DATA BASE 2 (MATRIX) (BYTES);
      DATA BASE 3 (BYTES); BUFFER (BYTES); DEVICE (AS
      SPECIAL-PURPOSE PROCESSOR). Annotations:
      INST = C_i + Σ(Σ(Σ + Σ)), where Σ represents loops
      and C_i is the base cost (instructions);
      SIZE = C + B(x), where B(x) is variable and C is the
      base size.]


3 Repeat  steps  1 and  2 for  all  functions  to 
be  included  in  the  following  processes. 


NOTE:  Step  1 should  include  application 
use  of  operating  system  services,  ac- 
cess methods,  and  data  managers.  Any 
operating  systems  or  control  program 
overhead  not  specifically  invoked  by 
an  application  function  can  be  ac- 
counted for  in  a later  step. 

B.  Define  Processes  and  Process  Resource  Usage 

Step  Description 

4     Establish flow-chart functional relationships for a
      process (m) (functions = nodes in flow).

      NOTE: Refer to Part III for assistance in
      calculating steps 5 and 6.

5     Add probability decisions for the desired optional
      paths based on interactive analyst decisions.


      [Node diagram for process m]

      where node n represents function n, node n + 1
      terminates the process m, and p and q are
      probabilities of branching.

6     Define a transition matrix using the probabilities
      of moving from any present node (i) to any other
      node (j) on the very next step, and find the
      utilization of all nodes using the theory of
      Markov chains (example in Part III) or another
      suitable method.

7     Repeat the steps in category B for all processes to
      be defined in category C workloads.
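
As one illustration of step 6, the sketch below treats node utilization as the expected number of times each function node is executed in one run of a process. That interpretation is an assumption, since the Part III example is not reproduced here, and the function name and matrix layout are hypothetical. The transition matrix covers the function nodes only; whatever probability is missing from a row is the probability of moving to the terminating node.

```python
def expected_visits(transitions, start=0, tol=1e-12, max_iter=100_000):
    """Expected number of visits to each function node in one process run.

    transitions[i][j] is the probability of moving from node i to node j
    on the very next step. Each row may sum to less than 1; the remainder
    is the probability of terminating the process from node i.
    """
    n = len(transitions)
    visits = [0.0] * n
    occupancy = [0.0] * n
    occupancy[start] = 1.0  # the process always begins at the start node
    for _ in range(max_iter):
        for i in range(n):
            visits[i] += occupancy[i]
        following = [sum(occupancy[i] * transitions[i][j] for i in range(n))
                     for j in range(n)]
        if sum(following) < tol:  # almost all probability has terminated
            break
        occupancy = following
    return visits


if __name__ == "__main__":
    # Hypothetical two-function process: node 0 always goes to node 1;
    # node 1 loops back to node 0 with p = 0.5 or terminates with q = 0.5.
    print(expected_visits([[0.0, 1.0], [0.5, 0.0]]))
```

In this small example each function is expected to run about twice per process execution, so a single-execution busy time would be multiplied by roughly 2 in step 22.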

C.  Define  Workload  for  Computer  System 
Step  Description 

8 Identify  computer  users  (including  batch) 

that  would  simultaneously  be  active  on 
the  computer  system. 

9     Assign process descriptions (from Part I-B) that
      might best describe each user's activity while
      active on the computer system. This step assumes
      that a reasonable test for a computer
      configuration is the successful execution of
      several benchmark workloads.

10  Specify  workload  parameters  that  dictate 

how  each  process  within  a workload 
will  exercise  the  functions  within  the 
processes;  for  example,  segments  of 
data  processed,  multispectral  bands  per 
segment. 

11  Repeat  the  preceding  steps  for  all  pro- 

cesses within  a workload  and  select 
(specify)  response  times  for  each  pro- 
cess. 

12  Repeat  the  preceding  steps  for  all 

benchmark  workloads  that  are  judged 
reasonable  tests  of  computer  con- 
figuration suitability. 

D.  Define  Candidate  Configurations 

Step  Description 

13    By inspection of resources used by procedure and by
      judging the total workload from multiple users
      (including batch), structure a configuration that
      appears to offer the capacity required for the
      benchmark workloads (consider bus loading for
      distributed systems).

14  Describe  each  hardware  component  and 

data  path  in  terms  of  the  parameters 
used  to  represent  the  functions  (i.e., 
bytes  transferred  per  second,  instruc- 
tions executed  per  second,  etc.). 

15    Assign a maximum utilization threshold to each
      hardware component such that there will be spare
      capacity to handle estimating errors and peak
      loading conditions. (Application + system
      overhead + estimating contingency + spare
      capacity for peak loading = 100-percent
      utilization.)

16    Assign each function within each process to
      appropriate hardware components (central
      processing unit (CPU), direct access storage
      device (DASD), etc.) and assign data flow to/from
      all functions to a data path. (Use the work from
      steps in categories A and B.)

17    Repeat this process for all reasonable
      configuration alternatives.

E.  Calculate  System  Loading 
Step  Description 

18  Select  a configuration  (from  steps  in  cat- 

egory D)  and  a specific  benchmark 
(from  steps  in  category  C).  For  exam- 
ple, select  a centralized  architecture 
and  a benchmark  representative  of  a 
development  and  test  environment. 

19  Compute  the  consumption  of  resources 

for  each  function  in  terms  of  bytes  of 
data  transferred  plus  number  of  in- 
put/output (I/O)  accesses  for  data 
bases,  CPU  instructions  executed,  etc. 
Category  C defines  the  workload 
parameters,  whereas  category  A 
defines  the  effect  those  parameters 
have  on  utilization  of  resources. 

NOTE:  Refer  to  tables  and  discussion  in 
Part  II  to  aid  with  the  following  steps. 
20    Calculate resource busy time (save in table A-I)
      required to support a single execution of each
      function n of process m within the selected
      workload. (Use hardware configuration
      characteristics from category D.)

21  Using  judgment,  measured  data,  or  other 

sources,  evaluate  each  function  (in- 
dividually) to  determine  stand-alone 
elapsed  execution  time.  This  step  in- 
volves consideration  of  simultaneous 
execution  of  multiple  resources  that 
may  be  used  by  the  function.  (Use  ta- 
ble A-I  as  an  aid.) 

22    Calculate the use of computer resources by each
      function n within process m by multiplying the
      individual resource use times by the function
      node utilization factors calculated in step 6,
      category B. Update table A-I.

23    Sum the use of individual computer resources by
      each function to obtain resources used by
      process m (table A-II).

24  Determine  stand-alone  process  elapsed 

time  by  summing  the  serial  elapsed 
time  component  from  each  function. 

25  Calculate  the  average  utilization  of  com- 

puter resources  for  the  process  m ex- 
ecuting in  a stand-alone  environment. 
Use  table  A-II  as  an  aid. 

26  Repeat  steps  19  through  25  for  each  pro- 

cess within  the  selected  workload. 

27  Calculate  the  instantaneous  use  (average 

utilization)  of  each  computer  resource 
in  support  of  the  selected  workload. 
Use  table  A-III  as  an  aid. 


Table A-I.—Function Resource Utilization and Elapsed Time(a)

[Function n of process m]

Column groups:
  Resource (R_1, R_2, R_3, ...)
  Stand-alone single function execution: resource busy
    time (diagonal); serial resource elapsed time
  Contribution to process: node utilization (step 6);
    adjusted busy time; adjusted function elapsed time

[Worksheet layout: one row per resource. The busy times
t_1, t_2, t_3 lie on the diagonal, with overlap entries such
as t_{3,1} and t_{2,3} off the diagonal; for example, the
serial resource elapsed time for R_3 is
t_3 - t_{3,1} - t_{2,3}. The node utilization U_nm is applied
to obtain the adjusted busy time T_i. The final row,
"Average function elapsed time," gives S_fn, U_nm, and
U_nm × S_fn = F_nm.]

(a) Where R_i = a hardware resource i used by function n
    t_i = the time R_i is in use for a single execution of f_n
    t_{i,j} = units of time that resource i overlapped
      execution with another resource j (where j was not
      listed prior to i)
    U_nm = utilization of function n by process m (step 6)
    S_fn = stand-alone elapsed time for a single execution
      of f_n
    T_i = resource busy time for R_i for f_n within process m
    F_nm = elapsed time component of f_n for process m
Table A-II.—Process Resource Usage and Elapsed Time(a)

[Resource utilization for process m]

Process m     Adjusted resource busy time (table A-I)    Adjusted function
functions     R_1     R_2     R_3     ...    R_r         elapsed time

f_1           T_11    T_12    T_13    ...    T_1r        F_1m
f_2           T_21    T_22    T_23    ...    T_2r        F_2m
f_3           T_31    T_32    T_33    ...    T_3r        F_3m
f_n           T_n1    T_n2    T_n3    ...    T_nr        F_nm
Total         ΣT_1    ΣT_2    ΣT_3    ...    ΣT_r        ΣF_m
Utilization   P_m1    P_m2    P_m3    ...    P_mr

(a) Where R_i = resources in process m
    f_n = a function in process m
    T_ni = R_i busy time for function n within process m
    ΣT_i = total busy time of R_i for process m (sum over
      the n functions)
    P_mi = ΣT_i/ΣF_m = utilization of R_i, or the probability
      that R_i is busy during the execution of process m
    ΣF_m = stand-alone elapsed execution time for process m


Table A-III.—System Resource Usage(a)

Processes in      Resources used by processes
the workload      R_1     R_2     R_3     ...    R_r

Process 1         P_11    P_12    P_13    ...    P_1r
Process 2         P_21    P_22    P_23    ...    P_2r
Process 3         P_31    P_32    P_33    ...    P_3r
Process m         P_m1    P_m2    P_m3    ...    P_mr
Total             ΣP_1    ΣP_2    ΣP_3    ...    ΣP_r
System overhead   O_1     O_2     O_3     ...    O_r
Threshold value   V_1     V_2     V_3     ...    V_r
Stretchout factor S_1     S_2     S_3     ...    S_r

(a) Where process m = processes in the selected workload
    R_r = all computer resources used by m processes
    P_mr = the utilization or probability of use of any
      resource r by any process m
    ΣP_r = the instantaneous utilization of computer
      resources by the workload (sum over the m processes)
    V_r = the "threshold value" or the maximum permitted
      instantaneous utilization of a resource
    S_r = the stretchout factor for resource r (used to
      approximate process response time)
    O_r = the fraction of utilization of resource r to be
      expected (an estimate) for the selected workload
      being evaluated


28    Estimate (by means of experience with similar
      systems or a calculation) the system overhead use
      of computer resources to support such system
      functions as workload and configuration
      management. Add this information plus the process
      response criteria (step 11, category C) to
      table A-III.

29    Compare resource utilizations with the maximum
      permitted utilization of resources by application
      processes. (Maximum utilization for all processes
      equals threshold utilization less system overhead
      utilization, or

          \sum_{1}^{m} P_r \le (V_r - O_r)

      for any resource r in the workload of m
      processes.)


30    Calculate a “stretchout factor” (S_r) for all
      overused resources in order to extend the process
      elapsed times of all processes using the
      resource. See calculation in table A-III.

31    If any resources are overused, approximate the
      average elapsed execution time for all processes
      in the selected workload by recalculating
      (extending) each resource time using a stretchout
      factor (S_r) for the involved resources (r).

          S_r = \frac{\sum_{1}^{m} P_r}{V_r - O_r}
                    if \sum_{1}^{m} P_r + O_r > V_r

          S_r = 1   if \sum_{1}^{m} P_r + O_r \le V_r

t 
Table A-IV.—Function Elapsed Time for Selected Workload(a)

[Function n for process m]

Column groups:
  Resource
  In a workload single function execution: resource busy
    time (diagonal) (from tables); serial resource
    elapsed time
  Contribution to process: node utilization (step 6);
    recalculated function time

[Worksheet body not recoverable; the final row gives U_nm
and U_nm × S_fn = F_nm.]

(a) See table A-I for definition of symbols. By proceeding
along the diagonal, calculate the "serial resource elapsed
time" by stretching out the time T_i that the resource R_i
is busy for a single function execution by its "stretchout
factor" S_i and subtracting any portion of that time t_{i,j}
which overlapped the execution of other resources (same row
or same column) listed to the left and above the diagonal at
point where i = 1 to r. Scale each amount of overlapped
resource time t_{i,j} (where j is the resource that initiated
execution first) by selecting a stretchout factor S (where
S = S_j if t_{i,j} is in the same row, and S = S_i if t_{i,j}
is in the same column as T_i). In the example shown in this
table, the unoverlapped component of
T_3 = t_3(S_3) - t_{3,1}(S_1) - t_{2,3}(S_2).


Use  tables  A-IV  and  A-V  as  aids. 

32  Repeat  for  all  combinations  of  configura- 
tions and  workloads. 

NOTE: Step 31 uses a stretchout factor to force
reasonable utilization of resources; i.e., equal to
or less than the specified threshold value. However,
some resource queueing should be expected and the
average process response times will (most likely) be
greater than the final value calculated (step 31).
Therefore, based on reasonable resource utilization
(70 percent or less) and guided by queueing theory
calculations for mean waiting time in a queue,

    T_w = \frac{\rho}{2(1 - \rho)} T_s

where T_w is time spent waiting for resource, \rho is
the resource utilization after step 31 has stretched
out process time, and T_s is a constant resource
service time or average time per use, the average
process time should be bounded by the time calculated
in step 31 to twice (2 times) that number. Individual
process times may well vary beyond those bounds.

F.  Readjust  Configurations 

Step  Description 

33  Compare utilizations to reasonableness (low utilizations across all proposed workloads suggest lower performance, less expensive hardware).

34  Compare final, recalculated average process elapsed times (table A-V) for all workloads to the expected response specified by end-users. For process times that exceed expectations, either (1) change the proposed configuration by increasing resource performance or (2) separate multifunction use of resources to avoid contention and process elapsed-time stretchout. (Configuration resources with the highest stretchout factors should be considered first.)

35  Repeat steps in categories D and E based on adjustments from steps 33 and 34.

36  Retain configurations that will handle the benchmark workloads satisfactorily as candidate configurations that can be further evaluated on the basis of other selection criteria, such as cost and operations flexibility.


Table A-V.— Recalculated Process Elapsed Time(a)

[Process m for selected workload]

Function number              Processes in selected workload
by process            Process 1      Process 2      Process 3      . . .   Process m

$f_1$                 $F_{1,1}'$     $F_{1,2}'$     $F_{1,3}'$     . . .   $F_{1,m}'$
$f_2$                 $F_{2,1}'$     $F_{2,2}'$     $F_{2,3}'$     . . .   $F_{2,m}'$
$f_3$                 $F_{3,1}'$     $F_{3,2}'$     $F_{3,3}'$     . . .   $F_{3,m}'$
. . .
$f_n$                 $F_{n,1}'$     $F_{n,2}'$     $F_{n,3}'$     . . .   $F_{n,m}'$

Recalculated
process time          $\Sigma F_1'$      $\Sigma F_2'$      $\Sigma F_3'$      . . .   $\Sigma F_m'$

Twice the
process time          $2(\Sigma F_1')$   $2(\Sigma F_2')$   $2(\Sigma F_3')$   . . .   $2(\Sigma F_m')$

Expected response     $E_1$          $E_2$          $E_3$          . . .   $E_m$

Delta response        $\Delta E_1$   $\Delta E_2$   $\Delta E_3$   . . .   $\Delta E_m$

(a) Where
$\Sigma F_m'$ is the sum of all n function elapsed times for process m;
$2(\Sigma F_m')$ is the maximum of the range in which the true average process response times should fall;
$E_m$ is the expected response time or the required response time that should be met if the system meets end-user processing needs;
$\Delta E_m = 2 \Sigma F_m' - E_m$ is a difference between actual and expected response times for each process m.


PART II — SAMPLE WORKSHEETS
FOR PART I-E

Based on the parameters specified in the benchmark workload, the computer resources used by any function may vary. For example, the number of instructions to handle a record of data may be relatively constant, but the total cost of the record-processing function will be a multiple of the number of input records. Steps 19 and 20 use workload parameters plus configuration rate (speed) characteristics to derive the common unit of time for expressing use of computer resources (i.e., resource busy time). Record calculations on the diagonal of table A-I.
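
As a minimal sketch of the kind of calculation steps 19 and 20 call for, the following assumes a record-processing function whose cost on each resource is a fixed amount of work per record, divided by the rate of that resource. The function name, the parameter names, and all numbers are hypothetical and chosen only for illustration; they are not taken from the worksheets.

```python
def resource_busy_time(units_per_record, n_records, rate_per_second):
    """Busy time in seconds: total work divided by the resource rate."""
    if rate_per_second <= 0:
        raise ValueError("rate must be positive")
    return units_per_record * n_records / rate_per_second


# Hypothetical workload: 10 000 records, 500 instructions and 200 bytes each.
cpu_busy = resource_busy_time(500, 10_000, 1_000_000)   # CPU at 1e6 instructions/s
disk_busy = resource_busy_time(200, 10_000, 500_000)    # disk at 5e5 bytes/s
print(cpu_busy, disk_busy)
```

Each result would be entered on the diagonal of the table for the corresponding resource.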

Step 21 considers the overlap of resource usage (if any) and uses this information to approximate the average function elapsed time if executed in a stand-alone environment (no other interfering activity in the computer). Table A-I provides a way to express total time $T_r$ by resource (CPU, I/O device, etc.) on the diagonal and the overlapped units of time ($t_{ij}$) within the row or column for the particular resource R. Record units of time that resource R overlapped execution time with previously considered resources by recording the amount of the overlap to the left of the diagonal if the resource overlapped was first to initiate or by recording the amount of the overlap above the diagonal if resource R began executing first. If there are n functions in m processes, there will be nm function resource utilization tables (table A-I).

By proceeding along the diagonal, calculate the “serial resource elapsed time” by subtracting any time $t_{ij}$ that resource r overlapped execution with another resource (same row to left of the diagonal or same column above the diagonal). Table A-III is an example for $T_3$. Calculate all rows, record results in the column labeled “elapsed time,” and sum the “elapsed time” column.
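
The diagonal walk described above can be sketched as follows. This is an illustrative reading, not the authors' worksheet: the table is assumed to be a square list of lists with busy times on the diagonal and overlaps off the diagonal, and the three resources and their times are hypothetical.

```python
def serial_elapsed_times(table):
    """Serial resource elapsed time for each resource r.

    table[r][r] is the busy time T_r; table[i][j] (i != j) is an overlap t_ij.
    For resource r, subtract overlaps recorded in the same row to the left of
    the diagonal and in the same column above the diagonal.
    """
    n = len(table)
    serial = []
    for r in range(n):
        left = sum(table[r][j] for j in range(r))    # same row, left of diagonal
        above = sum(table[i][r] for i in range(r))   # same column, above diagonal
        serial.append(table[r][r] - left - above)
    return serial


# Hypothetical function using three resources (times in seconds).
table_a1 = [
    [4.0, 0.0, 0.0],
    [1.0, 3.0, 0.5],
    [0.5, 0.0, 2.0],
]
serial = serial_elapsed_times(table_a1)
function_elapsed = sum(serial)   # stand-alone elapsed time for the function
print(serial, function_elapsed)
```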

Since a specific process m may not use function n just once and only once, the function node utilization was calculated in step 6, category B. Record the utilization factor $U_{nm}$ in the proper column and multiply the factor $U_{nm}$ by the resource busy time (diagonal) for each row (resource).

To  find  an  adjusted  elapsed-time  component  for 
the  process  m,  use  the  factor  Unm  to  multiply  the 
stand-alone  average  elapsed  time  for  function  n. 

Steps 23 to 25 require the calculation of process elapsed time and the average utilization of each resource used by the process if the process was executing in a stand-alone environment. Table A-II aids this calculation process. If there are m processes in the workload, there will be m process resource usage tables (table A-II).

The next steps assume that, given the selected workload, any mixing of resources is possible and could occur without ordering. Also, over any extended period of time, a sustained utilization of resources would occur and could be calculated for the combined processes in the workload by summing $\rho_{rm}$ for all r's given infinite resource capacity. Obviously, the data provide only utilization in a stand-alone environment. However, by summing all average resource utilizations across all processes, one can determine whether or not the workload (the combined processes) overloads the computer hardware resources. If not, the summed utilizations offer a reasonable approximation of hardware loading (excluding operating system configuration management or error management functions). If overloads do occur (i.e., application processes plus operating system usage exceeds the threshold utilization), then there will be conflicts for resources that will surely stretch out process average response time. Steps 29 to 31 provide an approximate way of evaluating resource loading and the resulting impact on process response time.
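
A minimal sketch of the overload check described above: stand-alone utilizations are summed by resource across all processes, overhead is added, and the result is compared with the threshold. The dictionary layout, the function names, and the numbers are assumptions made for illustration only.

```python
def summed_utilization(per_process):
    """Sum the stand-alone utilization of each resource over all processes."""
    totals = {}
    for utilizations in per_process:
        for resource, rho in utilizations.items():
            totals[resource] = totals.get(resource, 0.0) + rho
    return totals


def overloaded(totals, overhead, threshold):
    """Resources whose application plus overhead use exceeds the threshold."""
    return sorted(
        resource
        for resource, rho in totals.items()
        if rho + overhead.get(resource, 0.0) > threshold
    )


# Hypothetical two-process workload with a 70-percent threshold.
workload = [{"cpu": 0.30, "disk": 0.20}, {"cpu": 0.35, "disk": 0.10}]
totals = summed_utilization(workload)
flagged = overloaded(totals, {"cpu": 0.10, "disk": 0.05}, threshold=0.70)
print(totals, flagged)
```

Resources returned by `overloaded` are the ones for which a stretchout factor would then be calculated.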

Use table A-III to record resource utilization by process, to calculate system resource utilization, and to calculate a “stretchout factor” ($S_r$) affecting process response time for all overused resources. Next, use the “threshold value” $V_r$ (step 30) for calculating a multiplier to adjust average process response time for all resources exceeding the threshold.

The maximum permitted utilization of a resource could be reached: first, because the application processing required the resource and, second, because of system overhead utilization $O_r$ resulting from shared use of the resource to manage the workload and the hardware configuration (step 31). Therefore, the stretchout factor for resource r is determined as follows:


E', 


where $O_r$ is the overhead use of resource r required to support configuration and workload management. Record either 1 or the calculated value for the “stretchout factor” $S_r$ in table A-III.

The following is a method for recalculating approximate average process response time. The recalculation process requires that both the function elapsed times (a table A-I calculation) and the process elapsed times be reevaluated. Table A-IV provides a means of structuring the recalculation of function elapsed time. Similar to table A-I, there may be nm uses of table A-IV to represent n functions of m processes. If any function of any process does not use an overutilized resource (i.e., $S_r = 1$ for all r's), then table A-IV is unnecessary and the adjusted function time $U_{nm} \times f_n$ may be immediately recorded in table A-V as the recalculated function elapsed time $F_{nm}'$.

Sum the serial resource elapsed time to obtain an elapsed time for the particular function n. This elapsed-time figure represents a single execution in the selected workload environment. Factor the result by $U_{nm}$, the utilization of function n for process m, and record the recalculated function time $F_{nm}'$ in table A-V. Repeat for all functions using overused resources for each process in the selected workload.
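
The recalculation can be sketched as below. The sketch assumes one reading of the table A-IV footnote, which is only partly legible: each busy time is stretched by the factor of its own resource, and each overlap is scaled by the stretchout factor of the other resource involved. The table layout, names, and numbers are hypothetical.

```python
def recalculated_function_time(table, stretch, node_utilization):
    """Recalculated function time: node utilization times the stretched serial times.

    table[r][r] is the busy time T_r and table[i][j] an overlap t_ij, as in
    table A-I; stretch[r] is the stretchout factor S_r (1.0 if not overused).
    Assumption: an overlap is scaled by the factor of the other resource.
    """
    n = len(table)
    total = 0.0
    for r in range(n):
        serial = table[r][r] * stretch[r]
        serial -= sum(table[r][j] * stretch[j] for j in range(r))   # same row, left
        serial -= sum(table[i][r] * stretch[i] for i in range(r))   # same column, above
        total += serial
    return node_utilization * total


# Hypothetical function: only the first resource is overused (S = 1.5).
table_a4 = [
    [4.0, 0.0, 0.0],
    [1.0, 3.0, 0.0],
    [0.5, 0.0, 2.0],
]
f_prime = recalculated_function_time(table_a4, [1.5, 1.0, 1.0], node_utilization=1.2)
print(f_prime)
```

With every stretchout factor equal to 1, the function reduces to the stand-alone elapsed time multiplied by the node utilization.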

Table A-V is simply a means of recording function elapsed time components of the process elapsed time in a column. Summation of the columns provides a new approximate process elapsed time ($\Sigma F_m'$ summed for all n functions of process m).

By summing the recalculated function times for a process, reduction of the configuration throughput (for the workload being considered) has been forced to the extent that resource utilization is reasonable. The configuration can be expected to complete work no faster than the average process times suggest; i.e., if it takes 20 minutes to handle a segment at each analyst station, then only three segments per hour will be processed at each analyst station. However, in performing the preceding calculations, waiting time for resources has not been considered. From queueing theory,

$T_r = T_w + T_s$

where $T_r$ is the average response time for a resource, $T_w$ is time spent in a queue waiting for the resource to be free, and $T_s$ is the average service time the resource spends on a request.

It is obvious that $T_w = 0$ for all previous calculations, although excessive queueing should be avoided by generally expecting resource utilization to be 70 percent or less and by expecting processes within the workload to use any resource serially within the process. Nevertheless, some queueing between processes will occur. If one assumes that resources will be used in a relatively constant way (i.e., $T_s$ = constant), then the following queueing theory formula applies.

$T_w = \frac{\rho}{2(1 - \rho)} T_s$

where $T_w$ and $T_s$ are as defined previously and $\rho$ is the resource utilization ($\rho$ = threshold value or less). At approximately 70-percent utilization ($\rho = 0.7$) of any resource, $T_w$ is approximately equal to $T_s$.
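
As a small numerical illustration, the waiting-time relation can be evaluated directly. The sketch assumes the standard result for a single resource with constant service time, $T_w = \rho T_s / (2(1 - \rho))$, together with $T_r = T_w + T_s$; the function names are chosen here for illustration.

```python
def mean_wait(utilization, service_time):
    """Mean waiting time T_w = rho / (2 * (1 - rho)) * T_s (constant service time)."""
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return utilization / (2.0 * (1.0 - utilization)) * service_time


def mean_response(utilization, service_time):
    """Mean response time T_r = T_w + T_s."""
    return mean_wait(utilization, service_time) + service_time


# Waiting time in units of the service time at several utilizations.
for rho in (0.5, 0.7, 0.9):
    print(rho, round(mean_wait(rho, 1.0), 3))
```

At a utilization of 0.7 the wait is about 1.17 service times, which is the sense in which $T_w$ is roughly equal to $T_s$; at 0.9 it grows to 4.5 service times, which is why utilizations well above the threshold are avoided.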

At this point, one could recalculate the “serial resource elapsed time” column of table A-IV or simply assume that, as an approximation, the average response time of any process m falls between the value already calculated in table A-V and twice that value. In table A-V, a row for two times the process time has been left for each process in the workload.

If differential response $\Delta E_m > 0$, then there is a configuration problem for this workload. Another hardware alternative should be considered. More specifically, all resources used by process m for which $\Delta E_m > 0$ should be examined to determine whether there are overutilizations (table A-III) that can be solved by faster hardware components or by limiting usage by other processes and/or the system overhead functions.

If $E_m$ falls between $\Sigma F_m'$ and twice that value, then there is a potential configuration problem. Proceed with judgment.
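
The bottom rows of table A-V for one process can be sketched as follows. Two assumptions are made and should be checked against the printed table: the delta response is taken as twice the recalculated process time minus the expected response, and the three-way classification is one reading of the two paragraphs above. The function name and sample times are hypothetical.

```python
def process_summary(function_times, expected_response):
    """Recalculated process time, its doubled bound, and the delta response."""
    lower = sum(function_times)          # recalculated process time
    upper = 2.0 * lower                  # twice the process time
    delta = upper - expected_response    # assumed definition of the delta response
    if expected_response < lower:
        status = "configuration problem"
    elif expected_response <= upper:
        status = "potential configuration problem"
    else:
        status = "acceptable"
    return {"lower": lower, "upper": upper, "delta": delta, "status": status}


# Hypothetical process with three recalculated function times (minutes).
summary = process_summary([4.0, 2.5, 1.5], expected_response=12.0)
print(summary)
```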

Given new approximate average process elapsed times, utilizations of hardware resources that were not overused could be recalculated. Repeat the final calculation for table A-II using process elapsed times from table A-V instead of the stand-alone elapsed times to obtain revised utilizations for all processes. Recreate table A-III and sum to find approximate utilization. By inspection, one might determine that some resources are clearly underused. A reduction in performance (if possible, considering other workloads) could mean a reduction in cost.


PART III— IMPLEMENTATION OF STEPS 5
AND 6 OF PART I-B

In any process m that offers the opportunity to repeat or to skip functions in that process, one faces the necessity of calculating the use (or utilization) of any specific function (node). If the function is repeated, it is reasonable to expect that computer resources are used more and the process will take longer. If the function is skipped, then the opposite is true. The following is a simple example with a method of calculating the expected (average) utilization of functions within an interactive process. These results for utilization $U_{nm}$ for n functions of process m are used in tables A-I and A-IV.

The solution for function utilization is necessary only if a process has probabilistic branching; i.e., no branches or predeterminable branching implies that each function executed is used once and $U_{nm} = 1$ for all functions (1 to n). The example uses a solution to a Markov chain; however, any other means (such as simulation, a calculation of probabilities, etc.) would be appropriate. In the example, there are three function nodes in the process m plus a terminating node 4 (n + 1). There are two decision points at which a terminal user might alter the flow through the process. Each probability $p_{ij}$ that the process will flow from node i to node j on the very next step must be determined by judgment or from past experience (measurements). Assignment of transition probabilities is done by assuming that the process is now at node i and deciding (without regard to and independent of past branching) the likelihood of either proceeding to the next function (node) or taking a branch.

In this example (and for all processes considered by this system sizing procedure), the probability of beginning at the first function node is one (i.e., $p_{0,1} = 1$). Also, the probability that the process will complete and terminate is one (i.e., $p_{n,n+1} = 1$ or, for this example, $p_{3,4} = 1$). A solution can be implemented by simply building a matrix to represent the probability of moving from node i to node j for all combinations of n + 1 nodes and solving for the utilization of each function (1 to n).

In the transition matrix, each row sums to 1:

$\sum_{j=1}^{n+1} p_{ij} = 1$

where i = 1, 2, ..., n + 1. Although the process m is always begun at node 1 from a zero node outside the process and will terminate at node n + 1, the solution for utilization considers only the n nodes of the process (in this case, three nodes representing three functions). Therefore, in this example, matrix P is defined without either node 0 or node n + 1 represented.

$P = \begin{bmatrix} 0 & p_{1,2} & p_{1,3} \\ p_{2,1} & 0 & p_{2,3} \\ 0 & 0 & 0 \end{bmatrix}$

Using the theory of Markov chains, the utilization at each node is taken from the matrix A defined by

$A = (I - P)^{-1}$

where A is the resulting matrix, I is the identity matrix (an n by n matrix with all zeroes except for ones on the main diagonal), and P is a transition matrix for process m; $(I - P)^{-1}$ is the inverse of the difference between matrices I and P.

Once the matrix A has been determined, the utilization of each function (assuming process m begins at function 1) is the value of the corresponding element in row 1 of the matrix A:

$U_j = a_{1,j}$

where $a_{1,j}$ is the element in the jth column of the first row of A and, in general, $U_{nm} = a_{1,n}$ for process m. If matrix $(I - P)^{-1}$ does not exist (i.e., the inverse cannot be obtained), matrix P defines an indefinite loop such that the process will never terminate.

The described procedure gives the expected utilization at each node. In cases for which the maximal loading of the nodes is important, one would augment this procedure by calculating the variance of the utilization at each node. Appropriate equations for this calculation may be found in “Finite Markov Chains” by Kemeny and Snell (see bibliography).
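
A minimal sketch of the utilization calculation follows. Rather than forming the full inverse, it solves for row 1 of $(I - P)^{-1}$ directly, which is equivalent. The transition probabilities are hypothetical values for a three-node process in which node 1 may skip to node 3, node 2 may branch back to node 1, and node 3 always terminates; the helper names are chosen here for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("(I - P) is singular: the process never terminates")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def node_utilizations(p):
    """Row 1 of (I - P)^-1: expected uses of each node when starting at node 1."""
    n = len(p)
    # Row 1 of the inverse is the solution u of u (I - P) = e_1,
    # which is the same as (I - P)^T u = e_1.
    i_minus_p_t = [
        [(1.0 if i == j else 0.0) - p[j][i] for j in range(n)]
        for i in range(n)
    ]
    e1 = [1.0] + [0.0] * (n - 1)
    return solve_linear(i_minus_p_t, e1)


# Hypothetical transition matrix for the n = 3 function nodes.
p = [
    [0.0, 0.9, 0.1],
    [0.2, 0.0, 0.8],
    [0.0, 0.0, 0.0],   # node 3 always proceeds to the terminating node
]
u = node_utilizations(p)
print([round(x, 4) for x in u])
```

A matrix that describes an endless loop, such as two nodes that always pass control to each other, makes the solver raise an error, which corresponds to the case in which the inverse does not exist.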
Appendix B

Selection Criteria and Weighting Process


This  appendix  provides  a process  by  which 
acceptable  configurations  can  be  compared.  It  also 
depicts  the  authors'  application  of  the  process  to  an 
Earth  resources  data  system. 


PART I— THE PROCESS

Once a set of configurations capable of performing the required workloads is found, it is necessary to establish a process which compares them. Since each configuration can perform the required functions, this choice will be made on the basis of system characteristics other than technical feasibility. This appendix presents an approach to finding an evaluation rating and the application of the approach to the Earth resources data system example.

In selecting comparison criteria, it is necessary to have well in mind the characteristics and needs of the intended user. Some criteria apply universally (e.g., cost), whereas others are applicable to only a small fraction of users (e.g., real-time support facilities). The weights assigned to the various criteria will also vary from user to user. A user with very clear-cut applications firmly in mind will place great emphasis on the capability of a potential system to support that application and extensions of it, while a user with somewhat ill-defined future plans will emphasize adaptability and flexibility.


In the case of the Earth resources data system example, the intended user is assumed to have two major uses for the system: as an analysis tool to evaluate algorithms and techniques and as a demonstration tool to apply already developed algorithms and techniques on a repetitive basis to show feasibility in a production environment. Since these algorithms cannot be known beforehand, this user also needs good incremental development tools. It is assumed that some of the analysis and possibly the demonstration work will be done interactively. Based on these user characteristics, three major criteria for evaluating the candidate systems have been selected: cost, interactive support capabilities, and general support capabilities. The weights chosen make cost equal to the sum of the others and make general support more important than interactive support, corresponding to the assumed relative utilization.

When defining the components of each criterion, the objective is a set of descriptions of functions and/or facilities that are as explicit as possible. The definitions presented here should permit an evaluator to precisely rate a system on the basis of quantitative measurements rather than subjective assessment. The following points describe the method to be used.

1. Three major areas were established and weighted.

a. Cost— 50-percent weighting

b. Interactive support capabilities— 20-percent weighting

c. General support capabilities— 30-percent weighting

Each of these areas is defined in detail subsequently.

2.  With  the  exception  of  cost,  the  measurable 
components  of  the  major  areas  were  defined  and  also 
weighted. Specific items were defined that would
permit  an  evaluator  to  assign  ratings  to  individual 
components  without  having  a detailed  knowledge  of 
how  the  measured  value  for  that  component  was 
derived. 

3.  A rating  system  of  1 to  10  was  defined,  where 
10  is  the  best  rating  that  can  be  assigned. 

Several  methods  of  assigning  numerical  ratings 
can  be  used.  In  general,  the  method  chosen  for 
assigning  a particular  rating  depends  primarily  on  the 
numerical  values  that  are  being  rated.  The  methods 
to be used are described as follows.

1.  Direct  method— To  be  used  if  a rating  can  be 
easily  assigned  to  the  values  being  rated.  It  should  be 
noted  that  the  ratings  are  generally  linear,  though  not 
precisely. 

2.  Sample  variance  method— To  be  used  if  ratings 
cannot  be  easily  assigned  to  the  values  being  rated. 
Basically, the method presumes that all points being
rated  are  valid  points  but  probably  not  extreme 
points.  This  presumption  does  not  preclude  the 
assignment  of  a 1 or  a 10  rating  to  a point,  but  it 
makes  it  more  difficult.  The  method  consists  of  the 
following  steps. 

a. An average value (mean) is computed.

$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

b. The average variance between the points and the mean is computed.

$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$

c. Maximum and minimum values are arbitrarily assigned as

$x_{\max} = \bar{x} + 2\sigma, \quad x_{\min} = \bar{x} - 2\sigma$

d. The difference between the maximum and the minimum points is uniformly divided into 10 segments, and each segment is assigned a rating. The ratings may be either increasing or decreasing (e.g., higher cost would be rated lower). An example of this method follows.

[Example rating scale: value range, points, and rating]

The interim calculations that lead to the rating scale are not shown. However, note that the point with the higher value receives the lowest rating. The final rating would be $P_1 = 8$, $P_2 = 7$, and $P_3 = 2$.

3. Not-quantified method— To be used if the value being rated is subjective rather than objective; that is, a direct numeric value cannot be assigned. Typically, no more than 3 values and ratings should be used, as shown in the following example.


Effort required to support new terminal

Value          Rating

Minimal        10

Medium         5

Significant    2
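
The sample variance method (method 2 above) can be sketched as follows. The sketch assumes that the variance is averaged over all n points and that the scale limits are the mean plus and minus two standard deviations, which is how the partly legible expressions are read here; the function name and the cost figures are hypothetical.

```python
import math


def sample_variance_ratings(values, higher_is_better=True):
    """Rate each value from 1 to 10 on a scale spanning mean +/- 2 sigma."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    if sigma == 0.0:
        raise ValueError("all values are equal; no scale can be formed")
    low = mean - 2.0 * sigma
    width = 4.0 * sigma / 10.0           # ten uniform segments
    ratings = []
    for x in values:
        segment = int((x - low) / width) + 1
        segment = min(max(segment, 1), 10)   # clamp extreme points to 1..10
        ratings.append(segment if higher_is_better else 11 - segment)
    return ratings


# Hypothetical costs for three configurations; higher cost is rated lower.
costs = [100.0, 120.0, 200.0]
ratings = sample_variance_ratings(costs, higher_is_better=False)
print(ratings)
```

Because the limits sit two standard deviations from the mean, a point must be well away from the others to receive a 1 or a 10, which matches the intent that extreme ratings be harder to obtain.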


PART  II— COST  ANALYSIS 




Cost is defined as being the total estimated cost for a 10-year period, including recurring and nonrecurring costs. The importance of the cost criterion is inherent in the 50-percent weighting factor. Aside from the fact that budget limitations are always an important factor, cost is especially important in this evaluation because of the way it is used. Each configuration is sized to meet the requirements of a user workload. Consequently, cost inherently includes a number of other functions and/or facilities that could be used as evaluation criteria. An example might be the programing language facilities of different operating systems. This item is not considered outside of the cost criterion because most operating systems support the necessary language requirements. It is important to determine that all the systems to be costed are capable of performing all required functions. The cost criterion does not allow the direct trade-off of function for cost.

Because all dollars expended on a system are treated as being of equal value, no weights are assigned to components of costs. Some examples of the cost items to be included are as follows.

1.  Initial  purchase  costs 

a.  Equipment 

b.  Installation 

c.  Custom  and  purchased  software 

d.  Engineering 

2.  Maintenance  costs 

a.  Software  (in-house  or  vendor  supplied) 

b.  Hardware  (in-house  or  vendor  supplied) 

3.  Operation  cost 

a.  Personnel 

b.  Expendables  (paper,  film) 

c.  Electrical  power 


PART  III— INTERACTIVE  SUPPORT 
CAPABILITIES 

The interactive support capabilities criteria measure the capability of the configuration to support interactive analysis activities. The components of these criteria are as follows.

1. Overall system architecture (40-percent weighting)— This is an evaluation of how well the system is structured to support interactive use. Specific items to be considered are the ease of use of the analyst/user interface language and how much flexibility it provides for responding to errors or interim results. The adequacy of the system with regard to physical accessibility, ease of operation, and response time for interactive use will also be evaluated.

2. Expandability (20-percent weighting)— This is an evaluation of the system capacity to support future additional interactive use. Specific items to be considered are availability of central processing unit (CPU), memory, and input/output (I/O) resources, ease of adaptation of hardware and software to additional terminals of a new or different type, and ease of connection to remote interactive users.

3. Availability (40-percent weighting)— This is an evaluation of the probability of error conditions occurring and how well the system responds to errors. Specific items to be considered are the likelihood of a hardware error interrupting a terminal session, the ease of reconnection, and the amount of processing which must be repeated after a hardware or software error.

In assigning weights to these internal components, it was believed that current usability was of overriding importance; therefore, the factors making the system easy to use and available for use were weighted most heavily.


PART  IV— GENERAL  SUPPORT 
CAPABILITIES 

The general support capabilities criteria measure the capability of the configuration to support noninteractive use. The components of these criteria are as follows.

1. Operational manageability and flexibility (50-percent weighting)— This is an evaluation of how well the configuration allows for control and management of its resources. Specific items to be considered are the existence of an operational interface to control access to specific hardware, software, or data bases; support for new program development including testing and incremental release capabilities; and the flexibility of the system to manage several changing workloads.

2. Expandability (30-percent weighting)— This is an evaluation of the system capability to support additional noninteractive use. Specific items to be considered are availability of CPU, memory, and I/O resources to support additional workloads, the ease with which additional resources could be added, and ease of connection to remote users. The capability of the system to support program development is measured as part of the operational manageability.

3. Transportability of developed technology (20-percent weighting)— This is an evaluation of how well the system can be duplicated and adapted to the needs of other users. Specific items to be considered are the proportion of the cost of the system expended for special engineering or installation support activities, the ease with which the system input/output and data base formats can be adapted to a new user, and the availability of support for all software used within the system.

Again, in assigning weights, ease of current use has been heavily favored. It is believed that the success of a multiuse system such as the Earth resources data system will greatly depend on its capacity to respond to a changing environment and to management of that change.
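
One way the ratings and weights described in this appendix might be combined is sketched below. The text does not spell out the combination rule, so the weighted sum is an assumption, as is the 50/20/30 split of the three major areas (cost equal to the sum of the other two, with general support weighted above interactive support). The dictionary keys and the sample ratings are hypothetical.

```python
AREA_WEIGHTS = {"cost": 0.50, "interactive": 0.20, "general": 0.30}
INTERACTIVE_WEIGHTS = {
    "architecture": 0.40,
    "expandability": 0.20,
    "availability": 0.40,
}
GENERAL_WEIGHTS = {
    "manageability": 0.50,
    "expandability": 0.30,
    "transportability": 0.20,
}


def weighted(ratings, weights):
    """Weighted sum of 1-to-10 component ratings."""
    return sum(weights[name] * ratings[name] for name in weights)


def overall_rating(cost_rating, interactive, general):
    """Combine the three major areas into a single 1-to-10 evaluation rating."""
    return (
        AREA_WEIGHTS["cost"] * cost_rating
        + AREA_WEIGHTS["interactive"] * weighted(interactive, INTERACTIVE_WEIGHTS)
        + AREA_WEIGHTS["general"] * weighted(general, GENERAL_WEIGHTS)
    )


# Hypothetical ratings for one candidate configuration.
score = overall_rating(
    8,
    {"architecture": 7, "expandability": 5, "availability": 9},
    {"manageability": 6, "expandability": 8, "transportability": 4},
)
print(round(score, 2))
```

Repeating the calculation for each candidate configuration gives one number per candidate, which can then be compared directly.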


Appendix  C 

Sample  Application  of  the  Configuration  Adequacy  Model 


This appendix contains an example of the computer sizing evaluation process presented in appendix A. It applies the methodology to solving a hypothetical problem that is related to the Earth Resources Interactive Processing System (ERIPS) currently in use at the NASA Johnson Space Center. The hypothetical problem that is addressed is "Given a 'batch production' job, determine the elapsed time to complete that job and leave 40 to 50 percent of the central processing unit (CPU) available for other applications." This example is divided into seven parts, and the various parts address specific ideas presented in appendix A.


PART I— FUNCTION DEFINITION

The LACIE/ERIPS batch production user environment was specified as the target for investigation. It is necessary to define all processing functions associated with this environment and the resource usage variables for each. Steps 1 to 3 of the process covered in appendix A are depicted here.

A flow chart of the baseline environment provides insight and serves as an aid for identifying the functional elements of the system. This flow chart is developed as though the ERIPS is a centralized system. Even though it is known that the special-purpose processor (SPP) is part of the configuration, the concern here is only with identifying the functions which must be performed and the cost associated with those functions as they relate to ensuring that 40 to 50 percent of the host CPU is available to other jobs. This objective can be accomplished by treating the SPP simply as an input/output (I/O) device to which some functions write and others read. Unique functions are identified with dotted lines in figure C-1.

Hierarchical input/processing/output (HIPO) representations of functions and their cost in resource variables were developed. Figure C-2 represents this activity.


PART  II— PROCESS  DEFINITION 

Steps  4 to  7 of  the  computer  sizing  evaluation 
process  are  depicted  in  this  section.  A batch  process 
is  defined  for  a representative  production  user,  and 
the  probability  of  that  user  executing  functions  in  a 
particular  sequence  is  taken  into  consideration;  then, 
resource  utilization  is  computed  on  the  basis  of  the 
probability  (fig.  C-3). 


PART III— WORKLOAD DEFINITION AND
SELECTED BASELINE

Steps 8 to 12 of the computer sizing evaluation
process are depicted in this section. The workload
defined is a batch production workload, which
consisted of the processing requested on a site. In table
C-I, the parameters to be input to each function
within the process are specified.


PART  IV— CONFIGURATION 

Steps 13 to 17 are depicted in this section. Since
this example involves the answering of a question
about the existing configuration, it is the only
candidate configuration. This system, with the appropriate
hardware capabilities, is shown in the configuration
block diagram in figure C-4. All functions are
assigned to the host CPU, and the SPP is simply
treated as an I/O device for the reasons discussed earlier.


PART  V— INITIAL  SYSTEM  LOADING 
CALCULATIONS 

Steps 18 to 31 are represented in this section. The
system loading chart presented in table C-II is only
for the clustering and products function. It
represents the resource requirements of this function for a
single pass through the transition matrix for the
process. Table C-III represents the elapsed time for a
single pass through the matrix for all functions
contained in the process. A chart like table C-II must be
developed for each function contained in the
process.

Resource busy times and elapsed execution times
for each process function are summarized in table
C-III. Resource utilizations were computed as
described in steps 22 to 25. The results of these
computations show that the expected CPU utilization in
a stand-alone environment will be 72 percent, which
exceeds the 50- to 60-percent guideline (table C-IV).
Thus, it is necessary to proceed with steps 27 to 31,
which involve the stretchout computations.
Subsequent calculations use 50 percent as the target CPU
utilization.
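The stretchout factor quoted in table C-IV can be reproduced from the figures in this part. The following is a minimal sketch, assuming that the factor is the stand-alone utilization divided by the capacity left under the threshold once system overhead is subtracted; that reading is inferred from the tabulated values (0.72, 0.50, 0.15, and 2.06) rather than quoted from appendix A, and the function name is illustrative only.

```python
def stretchout_factor(utilization, threshold, system_overhead):
    """Factor by which busy time must be spread out so that the workload
    plus system overhead stays within the threshold (assumed formula)."""
    available = threshold - system_overhead
    if available <= 0:
        raise ValueError("threshold must exceed system overhead")
    # A resource already under its threshold needs no stretchout.
    return max(1.0, utilization / available)


# CPU column of table C-IV: utilization 0.72, threshold 0.50, overhead 0.15.
cpu_factor = stretchout_factor(0.72, 0.50, 0.15)
print(f"CPU stretchout factor: {cpu_factor:.2f}")

# Selector channel 1: utilization 0.22, threshold 0.70, no overhead.
sc1_factor = stretchout_factor(0.22, 0.70, 0.00)
print(f"SC1 stretchout factor: {sc1_factor:.2f}")
```

Under this assumption the CPU factor comes out at about 2.06 and the channel factors remain at 1, which agrees with the last row of table C-IV.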


PART  VI— STRETCHOUT 

The CPU stretchout factor was applied to each
function in the process, as is shown for the clustering
and products function, and a new table was
developed using the stretched-out adjusted function
elapsed times (table C-V).


PART  VII— SUMMARY 

Using  the  data  from  the  stretchout  computations 
contained  in  Part  VI,  a new  table  was  developed 
which  details  the  results  to  be  expected  from  the 
stretchout  (table  C-VI).  Upon  completion  of  the  first 
stretchout  computation,  it  appears  that  the  elapsed 
time  to  complete  the  batch  production  process  and 
leave  40  to  50  percent  of  the  CPU  available  for  other 
jobs  has  increased  from  476.5  seconds  to  839.12  sec- 
onds (table  C-VII).  This  increase  in  elapsed  time 
results  in  a CPU  utilization  of  56  percent,  which  is 
within  the  50-  to  60-percent  target. 
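The summary figures can be checked with a short calculation. The sketch below assumes that each stretched elapsed time is the sum of the channel busy times plus the CPU busy time multiplied by the stretchout factor of 2.06; this relationship is inferred from the tabulated values and is not stated explicitly in the text. Only a few rows are included, and the labels FB1, FB2, and so on are plain-text stand-ins for the function symbols.

```python
STRETCHOUT = 2.06        # CPU stretchout factor (table C-IV)
SYSTEM_OVERHEAD = 0.15   # CPU system overhead (table C-IV)

# function: ((SC1, SC2, MC0, CPU busy seconds), tabulated stretched elapsed seconds)
ROWS = {
    "FB1": ((0.85, 1.61, 0.0, 3.25), 9.16),
    "FB2": ((2.63, 1.41, 0.0, 6.71), 17.86),
    "FB10": ((9.50, 0.05, 0.0, 7.40), 24.79),
    "FB11": ((10.81, 0.13, 0.0, 158.74), 337.94),
    "FB12": ((29.9, 2.5, 6.5, 64.9), 172.59),
}


def stretched_elapsed(busy, factor=STRETCHOUT):
    """Serial elapsed time with only the CPU busy time stretched (assumed model)."""
    sc1, sc2, mc0, cpu = busy
    return sc1 + sc2 + mc0 + cpu * factor


def cpu_utilization(total_cpu_busy, total_elapsed, overhead=SYSTEM_OVERHEAD):
    """Workload CPU utilization plus system overhead."""
    return total_cpu_busy / total_elapsed + overhead


for name, (busy, tabulated) in ROWS.items():
    computed = stretched_elapsed(busy)
    print(f"{name}: computed {computed:.2f} s, tabulated {tabulated:.2f} s")

# Totals from table C-VII: 343.14 s of CPU busy time in 839.12 s elapsed.
print(f"CPU utilization with overhead: {cpu_utilization(343.14, 839.12):.2f}")
```

With the totals from table C-VII, the workload alone uses about 41 percent of the CPU, and adding the 15-percent system overhead gives the 56 percent quoted in the summary.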
Table C-I.— Benchmark Case Study

Function                    Process control parameter         Function
                            description                       symbol

Image merge                 3 acquisition, 12 channel         [illegible]
Signature extension         X (a)                             [illegible]
Field retrieve              [illegible] field                 [illegible]
Field report (both)         Both                              [illegible]
Clustering                  All channel, RSEG (DO, DU)        [illegible]
Select Sun angle            RSEG                              [illegible]
Iterative clustering        LI, dots as starting vectors      [illegible]
Cluster report              X                                 FB6
Store statistics            X                                 [illegible]
Detailed report             X                                 FB6
Distance table              X                                 FB6
Cluster map                 Conditional                       FB6
Cluster map                 Image tape                        [illegible]
Cluster dot report          X                                 FB6
Green number report         Both                              [illegible]
Mean/standard report        Both                              [illegible]
Feature selection           Separability acquisition          [illegible]
A prioris computation       X                                 FB11
Feature selection report    X                                 [illegible]
Classification              W/O replacement, all subclass,    FB12
                            compute a prioris, RSEG
Class summary report        No overrides                      FB12
Class map                   X                                 FB12
Bias correction             Both                              [illegible]
Spectral trajectory plots   X

(a) Functions marked "X" have no parameters.


Table C-II.— Function Resource Utilization and Elapsed-Time Table

[Only the column headings of this table survive; the data rows are not legible.]

Column headings:
  Resource busy time (a): SC1, SC2, MC0, CPU
  Stand-alone single function execution: elapsed time
  Serial resource: node utilization
  Contribution to process: adjusted busy time, adjusted function elapsed time

(a) Selector channel 1 = SC1, selector channel 2 = SC2, byte multiplexer channel 0 = MC0, central processor busy time = CPU.


Table C-III.— Resource Usage and Elapsed Time
for Batch Process

Batch process    Adjusted resource busy time (seconds)    Adjusted function
functions        SC1       SC2       MC0       CPU        elapsed time (seconds)

FB1              0.85      1.61      0         3.25       5.71
FB2              2.63      1.41      0         6.71       10.75
FB3              0         .06       0         .15        .21
FB4              1.01      .07       0         1.94       [illegible]
FB5              0         0         0         0          0
FB6              48.75     9.02      9.31      98.53      165.61
FB7              0         0         0         0          0
FB8              0         0         0         0          0
FB9              0         0         0         0          0
FB10             9.50      .05       0         7.40       16.95
FB11             10.81     .13       0         158.74     169.68
FB12             29.9      2.5       6.5       64.9       103.8
FB13             .31       0         0         .47        .78
Total            103.76    14.85     15.81     343.14     476.50
Utilization      .22       .03       .03       .72

Table C-IV.— System Resource Usage

Processes in             Resources used by processes
the workload             SC1      SC2      MC0      CPU

Batch production (a)     0.22     0.03     0.03     0.72
Total                    .22      .03      .03      .72
System overhead          .00      .00      .00      .15
Threshold value          .70      .70      .70      .50
Stretchout factor (b)    1        1        1        2.06

(a) With current LACIE system, interactive users and batch production are mutually
exclusive.

(b) 0.72/(0.50 - 0.15) = 2.06.


Table C-V.— Function Elapsed Time for Selected
Workload

[Only the column headings of this table survive; the data rows are not legible.]

Column headings:
  Resource busy time: SC1, SC2, MC0, CPU
  Single function execution (in a workload): elapsed time
  Serial resource: node utilization factor
  Contribution to process: recalculated function time


Table C-VI.— Modified System Resource Usage

Processes in the      Resources used by processes
workload              SC1      SC2      MC0      CPU

Batch production      0.22     0.03     0.03     0.41
Total                 .22      .03      .03      .41
System overhead       .00      .00      .00      .15
Threshold value       .70      .70      .70      .50
Stretchout factor     1        1        1        1

Table C-VII.— Modified Resource Usage and
Elapsed Time for Batch Process

Batch process    Adjusted resource busy time (seconds)    Adjusted function
functions        SC1       SC2       MC0       CPU        elapsed time (seconds)

FB1              0.85      1.61      0         3.25       9.16
FB2              2.63      1.41      0         6.71       17.86
FB3              0         .06       0         .15        .37
FB4              1.01      .07       0         1.94       5.08
FB5              0         0         0         0          0
FB6              48.75     9.02      9.31      99.58      270.05
FB7              0         0         0         0          0
FB8              0         0         0         0          0
FB9              0         0         0         0          0
FB10             9.50      .05       0         7.40       24.79
FB11             10.81     .13       0         158.74     337.94
FB12             29.9      2.5       6.5       64.9       172.59
FB13             .31       0         0         .47        1.28
Total            103.76    14.85     15.81     343.14     839.12
Utilization      .22       .03       .03       .41
[Figure C-1 is a flow diagram in four panels. The graphics are not reproduced; the box labels that survive are listed below.]

First panel (a): Process Control Batch Supervisor; Image Merge; Signature Extension; Pattern Recognition Supervisor; Field Definition Project; Statistics; Clustering (four boxes).

FIGURE C-1.— Baseline environment flow diagram. The function ranges given for panels (a) to (c) are illegible in the source; panel (d) shows function FB13.

Continued panels: Feature Selection; Feature Selection Report; Classification; Classification Summary Report; Bias Correction Report; Dot Summary; Classification Map; CAMS/CAS Interface.

FIGURE C-1.— Continued.

Panel (d): Trajectory Plots; More Than One Site/Same Processing Requirements; New Processing Requirements.

FIGURE C-1.— Concluded.

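[Figure residue: a diagram with "Input" and "Output" panels. The one entry that is partly legible refers to a unit record of input cards and a byte count per site; the remainder is illegible.]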
Cost and Performance Characteristics
of Data System Configurations for
Processing Remotely Sensed Data

P. J. Gregor a and J. F. Spinet a


INTRODUCTION 

The  objective  of  this  paper  is  to  explore  some 
alternative  approaches  to  constructing  a large 
remote-sensing  data  system.  The  discussion  focuses 
mainly  on  the  cost  and  performance  implications  of 
using  a collection  of  “small”  computers  versus  using 
one  large  computer.  Several  implications  of  the  use 
of  large  remote-sensing  data  systems  in  general  are 
discussed. 

The  discussion  begins  with  a consideration  of  the 
first  step  in  the  planning  for  any  large  data  system: 
definition,  both  functional  and  quantitative,  of  the 
expected  use  of  the  data  system.  Several  possible 
large  data  system  architectures  and  some  cost  factors 
that  may  influence  the  selection  of  an  architecture 
are  identified.  Cost  and  performance  of  architectures 
based  on  a collection  of  small  computers  versus 
architectures  based  on  a single  large  computer  are 
compared.  Finally,  some  conclusions  concerning 
data  system  architecture  are  drawn  which  may 
benefit  members  of  the  Earth  resources  community 
who  are  contemplating  acquisition  of  a large  data 
system.  Because  of  the  MITRE  Corporation's  recent 
experience  with  planning  for  a data  system  at  the 
NASA  Johnson  Space  Center  (JSC),  the  proposed 
Earth  Resources  Data  System  (ERDS)  will  be  dis- 
cussed extensively  as  an  example  of  a large  data 
system. 


CURRENT TRENDS

A trend in the Earth resources remote-sensing
community is toward the construction of "large"
data systems. Established users of remote-sensing
technology have been replacing small single-user
research and development (R&D) data systems with
larger, multiuser, multifaceted data systems. Three
factors have combined to bring about this trend: (1)
successful experience with remote-sensing technology
(such as the near-operational production
estimation procedures of LACIE), (2) improving
price and performance characteristics of computers,
and (3) availability of large volumes of data. The
trend toward large systems is expected to continue as
more users plan to take advantage of Landsat-D's
thematic mapper. (Beginning in 1981, the thematic
mapper will provide almost an order of magnitude
more data per day than does the currently used
multispectral scanner on Landsat-2.)

Several  Earth  resources  data  systems  which  are 
planned  or  under  construction  can  be  cited  as  exam- 
ples of  the  trend  toward  large  systems.  The  U.S. 
Department  of  Agriculture  (USDA)  is  planning  a 
User  Advanced  System  that  will  tentatively  support 
IS  interactive  image  analysis  stations  plus  additional 
data  processing  (ref.  1).  The  NASA  Goddard  Space 
Flight  Center  (GSFC)  has  issued  a specification  for  a 
Landsat-D  Assessment  System  that  will  have  con- 
siderable processing  capability  (ref.  2).  The  Canada 
Centre  for  Remote  Sensing  is  currently  completing  a 
facility  designed  to  support  multiple  users  (G. 
Willoughby,  personal  communication).  A final  ex- 
ample of  this  trend  is  the  experience  here  at  JSC.  In 
the  past,  LACIE  has  required  the  use  of  several  com- 
puters in  dispersed  locations.  JSC  is  currently  plan- 
ning a new,  unified  ERDS  to  support  its  continuing 
remote-sensing activities.1 In all four of these cases,
the use of a single image display device connected to
a small minicomputer was an early step in the
development of remote-sensing activities. Now, the
data processing requirements of each of these
organizations considerably exceed the capacity of a
single small computer.

1 ADP Acquisition Plan for an Earth Resources Data System
for the Johnson Space Center. Unpublished document, JSC, Nov.
9, 1977.

a The MITRE Corporation, Houston, Texas.


DATA  SYSTEM  USAGE 

A necessary  first  step  toward  the  acquisition  of  a 
new  data  system  is  the  determination  of  the  extent  to 
which  the  system  will  be  used.  The  use  of  an  existing 
data  system  is  a good  basis  for  making  projections 
about  the  use  of  a new  system.  When  such  informa- 
tion is  not  available,  planning  becomes  rather 
difficult.  Fortunately,  such  information  generally 
should  exist  in  the  Earth  resources  community,  for 
large  data  systems  characteristically  represent 
growth  from  a small  data  system. 

The use of a data system can be categorized by the
type  of  “activities"  supported  and  quantified  in 
terms  of  what  system  “resources"  are  required.  At 
the  planner's  discretion,  activities  can  represent 
Earth  resources  functions  (image  preprocessing,  im- 
age classification,  data  base  management)  or  com- 
puter system  functions  (edit,  compile,  execute). 
Creation  of  several  different  categorizations  of 
system  use  (system  workload)  can  yield  insight  into 
how  the  work  may  be  allocated  to  the  resources  of  a 
large  data  system. 

In  an  Earth  resources  data  system,  the  basic 
system  resources  can  include 

1.  General-purpose  computational  processor(s) 

2.  Special-purpose  computational  processor(s) 

3.  Alphanumeric  (A/N)  terminals 

4.  Image  analysis  stations 

The  set  of  resources  required  in  an  Earth  resources 
data  system  is  to  some  extent  unique.  A special-pur- 
pose processor  (SPP)  is  required  to  quickly  perform 
large  numbers  of  parallel  pixel-oriented  calculations, 
and  color  displays  provide  the  required  interaction 
between  the  analyst  and  the  imagery. 

The units of utilization of each resource are
generally those measured by the accounting system
used on the existing or proposed data system.
Examples of utilization units of a general-purpose
processor (GPP) are the System Resource Unit of the
Control Data Corporation's operating systems and
the Standard Unit of Processing (SUP) hour of
UNIVAC's Exec-8 operating system. Both units
represent weighted sums of measures of a program's
use of the system's central processing unit (CPU),
main memory, and input/output devices. A weighted
sum approach is necessary to capture the complexity
of a multiuser, multiqueue general-purpose
processor.
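The weighted-sum idea can be illustrated with a small calculation. The sketch below is not the Control Data or UNIVAC accounting formula; the measures chosen, the weights, and the sample job are all hypothetical and serve only to show how several measures of use can be combined into a single accounting unit.

```python
from dataclasses import dataclass


@dataclass
class JobUsage:
    """Hypothetical per-job measurements collected by an accounting system."""
    cpu_seconds: float
    memory_kiloword_seconds: float
    io_transfers: int


# Hypothetical weights; real accounting formulas are vendor-specific.
WEIGHTS = {"cpu": 1.0, "memory": 0.002, "io": 0.01}


def accounting_units(usage, weights=WEIGHTS):
    """Combine CPU, main memory, and input/output use into one weighted sum."""
    return (
        weights["cpu"] * usage.cpu_seconds
        + weights["memory"] * usage.memory_kiloword_seconds
        + weights["io"] * usage.io_transfers
    )


job = JobUsage(cpu_seconds=120.0, memory_kiloword_seconds=6000.0, io_transfers=1500)
print(f"Accounting units charged: {accounting_units(job):.1f}")
```

Because every job is reduced to the same unit, totals for different activities can be added and compared, which is what makes a workload matrix expressed in such units possible.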

Accounting systems for special-purpose
processors have not been developed; thus, their use
must be roughly measured in "connect hours" (actual
hours that a particular machine is, or will be,
used). Use of alphanumeric and image terminals is
similarly measured in connect hours. A simple
connect-hour measurement for these devices is
appropriate because of their single-user, single-queue
nature.

The  identification  of  user  activities  and  resources 
required can be easily represented in matrix format.
Table  I represents  such  a matrix  constructed  for 
JSC's  Earth  resources  data  processing  activities  in 
fiscal  year  1977.  The  activities  are  grouped  initially 
by  the  data  system  component  used  to  support  them. 
The  last  category  in  the  table,  “Supporting  com- 
puters," represents  a variety  of  eight  IBM,  UNIVAC, 
and  DEC  computers  located  throughout  the  United 
States. 

The  utilization  unit  for  general-purpose  pro- 
cessors was  SUP  hours  per  week.  Use  of  each  system 
component  was  originally  measured  in  units  unique 
to  that  component.  Then,  with  results  from 
benchmarks  and  comparison  performance  data 
published  in  the  literature,  these  levels  of  use  were 
reexpressed  as  SUP  hours  per  week  to  present  a total 
picture  of  system  use.  Such  a conversion  also  enabled 
comparisons  of  the  relative  importance  of  each  com- 
ponent of  the  system. 

The  categorization  of  activities  in  table  I is  based 
on  accounting  subdivisions  used  at  JSC.  Other  ways 
of  categorizing  activities  are  possible;  table  II  is  an 
example.  Table  II  reveals  an  interesting  characteristic 
of  JSC's  Earth  resources  computer  workload. 
Routine  LACIE  activities,  such  as  classification  and 
production  estimation,  represent  only  a small  por- 
tion of  the  workload  in  the  general-purpose  and 
alphanumeric  resource  categories.  (They  are, 
however,  major  consumers— greater  than  50  per- 
cent—of  SPP  and  image  terminal  resources.)  The 
workload  is  instead  heavily  oriented  toward  software 
development;  quality  assurance;  and  research,  test, 
and evaluation (RT&E) activities.

The choice of categories for activities and the
distribution of the workload among the activities are
important considerations when planning a large data
system, especially when a multicomputer configuration
is being considered. As shown in the next section,
the number, size, and type of categories can
form the basis for choosing the number, size, and
type of computers to use in the data system.

Table I.— JSC Earth Resources Workload for Fiscal Year 1977


DATA  SYSTEM  ARCHITECTURE 

Until recently, the projected workload largely
determined the data system architecture. Small
workloads were accomplished on small machines,
available from the minicomputer vendors, whereas
large workloads dictated large machines. As
minicomputers have become more powerful (i.e.,
have more main memory, including cache; more
sophisticated operating system; wider choice of word
length, namely 16, 24, and 32 bits; wider options for
"netting" several computers), the size of the workload no
longer is the only factor in choosing the data system
architecture.

The  effect  of  this  freedom  of  choice  on  the  Earth 
resources  community  has  been  the  selection  of  a 
variety  of  architectures  for  a variety  of  data  system 
requirements.  Prominent  in  the  large  machine  camp 
are  the  NASA  Jet  Propulsion  Laboratory,  which  has 
an  IBM  360/65  (ref.  3),  and  the  Purdue  University 
Laboratory  for  Applications  of  Remote  Sensing 
(LARS),  which  has  an  IBM  370/148  (ref.  4).  A 
multicomputer configuration of five to six small- to
medium-scale minicomputers has been proposed for
the USDA's User Advanced System (ref. 1), and
GSFC will have a collection of four minicomputers
once the Landsat Assessment System (ref. 2) is
interfaced to the existing Atmospheric and
Oceanographic Information Processing System
facility (ref. 5). Planning studies for ERDS have
considered both multicomputer and single-computer
configurations (ref. 6).

Generally, the available architectures for the
previously described kinds of workloads are of two basic
types: (1) a large single-machine configuration (fig.
1), or (2) a multiple small to medium machine
configuration (fig. 2). The large single-machine
configuration is centered around a mainframe from the
IBM 360-370 series (more recently 303x series), the
UNIVAC 1100 series, or some comparable product
line.2 One or more SPP's and alphanumeric and
image analysis terminals are attached to the mainframe
as required. The size of the data system is
determined by the workload. The size of the general-purpose
component (the mainframe) is determined by the
number of SUP hours required, the number of
special-purpose processors by the number of connect
hours needed, and the alphanumeric and image
terminals by both the number of connect hours and the
number of shifts during which the user is willing to
"staff" the terminals. Requirements ranging from 70
SUP hours per week and 9 terminals to 600 SUP
hours per week and greater than 30 terminals can be
accommodated by the single large computer
architecture. A mass data storage facility (MDSF) of greater
than 30 billion bytes and the data base management
software to manage this mass storage are available
large machine options.

The  multiple-machine  configuration  of  figure  2 is 
another  option  for  accommodating  large  workloads. 
Generally,  in  this  architecture,  the  workload  is  dis- 
tributed functionally  among  the  multiple  computers; 
i.e.,  a computer  and  its  associated  facilities  are  dedi- 
cated to  a project  such  as  LACIE  or  to  a function 
such  as  data  acquisition,  data  management,  or  image 
display.  The  separation  between  functions,  and  thus 
computers,  may  be  based  on  passive  requirements 
(e.g.,  two  functions  require  little  or  no  interchange  of 
information and thus may be separated) or active requirements
(e.g., two functions may interfere with
each  other  if  they  are  not  separated).  The  major  ac- 
tive requirement  for  separation  of  functions  in  data 
systems  generally  is  that  development  and  produc- 
tion activities  should  not  be  supported  by  the  same 
machines.  This  reasoning  holds  if  the  development 
activities  center  on  modifying  the  operating  system 
(and  are  thus  likely  to  cause  system  "crashes")  and  if 
the  production  activities  involve  large  on-line  soft- 
ware systems  (such  as  an  airline  reservation  system) 
or  time-oriented  batch  systems  (such  as  billing  or 
payroll).  This  type  of  reasoning  would  not  generally 
apply  in  an  Earth  resources  setting. 

Figure  3 is  an  example  of  a functional  allocation  of 
an  Earth  resources  workload.  The  JSC  workload 
categories  presented  in  table  II  have  been  converted 
to  specifications  for  a series  of  four  computers.  In 


2 References to vendors are for illustrative purposes only and
do not constitute an endorsement or recommendation.

FIGURE 1.— Single large-machine architecture.


such  a system,  the  software  development  component 
would  require 

1.  A general-purpose  computer  with  a capacity  of 
at  least  2?  SUP  hours  per  week3 

2. A special-purpose processor available 11 hours
per week

3.  An  alphanumeric  terminal  available  two  shifts 
per  week  (or  two  terminals,  each  available  one  shift 
per  week) 

4. An image analysis terminal available one shift
per week.

Although  one  would  probably  not  choose  to  dedi- 
cate expensive  resources  to  the  software  develop- 
ment function  (since  software  development  loads 
tend  to  vary  widely  over  time),  one  might  well 
choose  to  dedicate  physical  resources  to  a longer 
term  and  more  predictable  “production"  activity.  In 
any  case,  the  example  illustrates  that  workload 
description  and  architecture  selection  are  strongly  in- 
terrelated. 

Figure 4 depicts a system which is a slight
modification to the system in figure 2. Here,
resources are pooled, intercommunication between
processors is provided by a wide-bandwidth
communications bus, and terminals are permitted to
interact with any of the processors. Processors may be
of the same or different size and model, depending
on the sophistication of the communications
protocol. The goal of this architecture is to add to a
multicomputer configuration some of the capability
for resource sharing among functions that is inherent
in a single-computer system. The degree of sharing
can become quite high if considerable investment is
made in operating system development, as some
prototype systems have shown (ref. 7).

3 The specified capacity should, in fact, be higher than the
planned average week usage to accommodate peaks in activity.

FIGURE 3.— Allocation of workload to resources. [Graphic not reproduced; surviving labels: A/N terminals, image analysis terminals.]

FIGURE 4.— Bus-oriented multiple-machine architecture.


DATA SYSTEM FACTORS

The  one-time  and  recurring  cost  factors  associated 
with  a new  large  data  system  are  as  follows. 

1. One-time costs

a.  Hardware  purchase  and  development 

b.  System  software  purchase  and  development 

c.  Applications  software  development  and 
conversion 

d.  Facility  modifications 

e.  Communications  installation 

f.  Training 

g.  Procurement  and  system  engineering 
support 

h.  System  integration  and  test 

2. Recurring costs

a.  Hardware  maintenance  and  software  lease 

b.  Operations,  system  management,  and 
support 

c.  Communications 

d.  Consumables 


Frequently,  only  the  one-time  costs  are  con- 
sidered in  planning  a system  and  choosing  an 
architecture.  It  should,  however,  be  realized  that  over 
the  expected  7-  to  10-year  life  of  a data  system,  the 
recurring  costs  of  maintenance  and  operations 
generally  represent  the  majority  of  the  investment  in 
the  system.  Thus,  to  estimate  the  cost-effectiveness 
of a planned large system, all cost factors should be
considered. 

Despite the last admonition, only some of the
previously mentioned cost factors are considered in this
paper.  The  factors  discussed  are  hardware  purchase, 
hardware  maintenance,  software  conversion,  and 
operations.  These  factors  were  chosen  because  they 
represent  major  cost  items,  because  they  may  be 
affected  by  choice  of  architecture,  or  because  they 
are  costs  that  are  usually  underestimated. 

To  compare  the  hardware  costs  of  data  systems  of 
different  architectures  requires  identification  of 
specific  candidate  processors.  Once  candidate 
machines  have  been  selected,  the  determination  of 
hardware  purchase  and  maintenance  costs  is  rather 
straightforward.  Most  hardware  vendors  will  gladly 
supply  detailed  pricing  information  on  their  pro- 
ducts. Other  sources  of  cost  data  are  industry-survey 
publications  (e.g.,  ref.  8)  and,  for  systems  to  be  pro- 
cured by  the  Government,  price  schedules  of  the 
General  Services  Administration.  Studies  by  the 
MITRE  Corporation  have  shown  that,  in  general,  an- 
nual maintenance  costs  for  large  mainframes  are  ap- 
proximately 5 percent  of  the  initial  purchase  price, 
whereas  maintenance  of  minicomputers  annually 
costs 10 to 15 percent of the purchase price.4
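The maintenance percentages can be put in perspective by accumulating them over the 7- to 10-year system life mentioned earlier. The sketch below is only an illustration: it assumes a constant annual rate with no discounting or price escalation, and the function name is hypothetical.

```python
def cumulative_maintenance(purchase_price, annual_rate, years):
    """Undiscounted maintenance spend over the life of the system."""
    return purchase_price * annual_rate * years


CASES = [
    ("large mainframe", 0.05),
    ("minicomputer, low estimate", 0.10),
    ("minicomputer, high estimate", 0.15),
]

# A purchase price of 1.0 expresses each result as a fraction of that price.
for label, rate in CASES:
    for years in (7, 10):
        fraction = cumulative_maintenance(1.0, rate, years)
        print(f"{label}, {years} years: {fraction:.2f} of purchase price")
```

On these assumptions, maintenance of a mainframe accumulates to roughly one-third to one-half of its purchase price, whereas maintenance of a minicomputer can approach or exceed the original purchase price over the same period.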

The difficulty in determining hardware costs lies
in the selection of appropriate processors to be
evaluated. The main criterion of appropriateness is that
the candidate machine must have the computational
capacity to process the desired workload. A
machine's capacity can be estimated in a number of
ways, varying from "hands-on" testing (application-specific
benchmarks and general-purpose
benchmarks) to literature review (industry surveys
and benchmark reports). Table III summarizes some
of the information on the capacities of various
machines. The table compares various processors in
terms of both the number of SUP hours they can
provide in a 15-shift week and the average quantity of
concurrently active terminals they can support
during the day shift. The table was compiled from
several sources including benchmark runs (refs. 9 to
12), published articles, and discussions with
hardware vendors.

4 This can be seen by comparing vendor price lists. It has also
been demonstrated empirically by a survey of 20 systems
throughout JSC, as disclosed by S. Berthiaume in a briefing
entitled "Institutional Data Systems Division Long-Range Planning
Review," presented to C. C. Kraft, Director of JSC, on December
8, 1977.

The information in table III was used to identify
candidate ERDS processors in the MITRE study and
can  be  used  in  other  studies.  For  example,  suppose 
that  an  Earth  resources  workload  has  been  identified 
that  requires  200  SUP  hours  per  week,  allocated  as 
follows: 

1 . SO  SUP  hours  of  routine  analysis  and  classifica- 
tion (production  work) 

2.  SO  SUP  hours  of  software  development 

3. 20 SUP hours of quality assurance activities

4.  40  SUP  hours  of  data  base  maintenance 

5.  SO  SUP  hours  of  experimental,  scientific, 
numerical  processing 

A potential  configuration  for  such  a workload 
would  be  as  follows: 

1.  One  “large"  computer  (with  a capacity  of  ap- 
proximately 90  SUP  hours)  with  a mass  data  storage 
facility  for  the  scientific  processing  and  data  base  ac- 
tivities 

2. One "small" computer (with a capacity of
approximately 55 SUP hours) for the production
activities

3. A second "small" computer (55 SUP hours) for
the software development and quality assurance
activities

Table  III  reveals  that  a system  consisting  of  an 
IBM  370/148  and  two  SEL  32/75’s  should  meet  the 
user's  needs.  IBM  and  SEL  vendors  could  then  be 
contacted  for  detailed  pricing  information.  The  user 
could  also  investigate  the  costs  of  establishing  com- 
munications among  the  three  computers. 

A cost  item  that  is  frequently  not  considered  and 
is  generally  underestimated  is  software  conversion. 
In moving from a small system to a large one, the
user generally wishes to retain all the capabilities he
had before. If the new hardware and the operating
system are not compatible with the old, the
applications software must be converted. A recent study for
the  Central  Computational  Facility  at  JSC  projected 
that  conversion  of  its  code  to  a new  system  would 
cost  approximately  $2.60  per  line  of  code  (ref.  13). 
This  cost  was  found  to  be  in  line  with  other  Govern- 
ment conversion  efforts,  which  ranged  from  $2  to  $7 
per  line  of  code.  The  total  impact  of  this  cost  item  on 
a particular  Earth  resources  center  planning  a new 
system  will  depend  on  the  center's  previous  invest- 
ment in  software.  However,  users  should  be  aware 
that  the  actual  costs  of  the  conversions  reviewed  in 
reference 13 ranged from $900 000 to $5 000 000 and
could  well  exceed  hardware  costs  in  some  cases. 
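
A center can bracket its own exposure to conversion cost from the per-line figures quoted above. The sketch below is illustrative only: the $2.60 point estimate and the $2 to $7 range come from the text, while the function names and the size of the code inventory are assumptions made for the example.

```python
def conversion_cost(lines_of_code, dollars_per_line=2.60):
    """Estimate software conversion cost in dollars."""
    return lines_of_code * dollars_per_line


def conversion_cost_range(lines_of_code, low=2.0, high=7.0):
    """Bracket the estimate with the $2 to $7 per line range cited in the text."""
    return (conversion_cost(lines_of_code, low),
            conversion_cost(lines_of_code, high))


# Hypothetical software inventory; not a figure from the source.
inventory = 500_000

point = conversion_cost(inventory)
low, high = conversion_cost_range(inventory)
print(f"point estimate: ${point:,.0f}")
print(f"range: ${low:,.0f} to ${high:,.0f}")
```

The estimate scales linearly with the size of the inventory, which is why the impact depends so strongly on the center's previous investment in software.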

The final cost item to be considered here is operations.
Large data systems require people to run them.
Computer operators are needed for scheduling jobs,
mounting tapes, and servicing the card reader and
printer. Clerks are needed for accepting card decks
from users and for distributing output. Systems programmers
are required for maintaining the operating
system  and  other  software  packages.  Consultants 
should  be  available  to  aid  users  having  problems  with 
the  data  system.  Training  courses  may  be  offered  for 
new  users.  Systems  analysts  are  necessary  to  prevent 
bottlenecks  in  the  system.  Earth  resources  systems 
will  likely  have  a large  data  base  and  thus  will  need  a 
data  base  administrator  to  control  the  structure  and 
to  maintain  the  integrity  and  security  of  the  data 
base.  And,  of  course,  there  is  the  need  for  super- 
visors and  data  center  administration. 

All  these  people  represent  a considerable  cost. 
These costs may be "buried" by shifting some, or all,
of  the  necessary  activities  to  the  user  (e.g.,  forcing 
the  user  to  teach  himself  how  to  use  the  system  or  to 
mount  his  own  tapes  on  a minicomputer).  They  will 
still,  however,  represent  an  expense;  for  when  an 
analyst  is  mounting  a tape,  he  is  not  performing 
those  activities  for  which  he  was  hired.  Thus,  more 
analysts will have to be hired to accomplish the same
total  analysis  workload. 


ARCHITECTURE  SELECTION  CRITERIA 

After  identification  of  expected  data  system 
usage,  hardware  alternatives,  and  system  costs,  the 
next  step  in  system  development  is  to  select  a system 
architecture.  The  selection  can  be  made  using  several 
criteria.  Common  selection  methods  rely  on  the 
following  criteria  or  on  some  weighted  combination 
of  these  criteria: 

1.  Cost-effectiveness  analysis  (Which  system  is 
the  least  expensive  way  to  support  a specific 
workload?) 

2.  Qualitative  factors  (Which  system  is  easiest  to 
use?  Which  offers  the  most  services?) 

3.  Specific  characteristics  (Is  the  system's  word 
length  at  least  24  bits?) 

4.  Performance  factors  (Which  system  has 
quickest  response  or  turnaround  times?) 

Each of these sets of factors can be subdivided into a
larger number of categories. For example, one author
cites 50 possible performance measures of computer
service delivered through a remote terminal.* Selection
methods can thus get quite involved.
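
One way to form the "weighted combination" of criteria mentioned above is a weighted average of per-criterion scores. The sketch below is an assumption-laden illustration, not the method used in the study: the weights, the 0 to 10 scoring scale, the candidate scores, and all names are hypothetical.

```python
# Hypothetical weights for the four selection criteria listed in the text.
WEIGHTS = {
    "cost_effectiveness": 0.4,
    "qualitative": 0.2,
    "specific_characteristics": 0.1,
    "performance": 0.3,
}


def weighted_score(scores, weights=WEIGHTS):
    """Combine per-criterion scores (0 to 10) into one weighted average."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same criteria")
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


# Hypothetical scores for three candidate systems.
candidates = {
    "system A": {"cost_effectiveness": 8, "qualitative": 6,
                 "specific_characteristics": 10, "performance": 7},
    "system B": {"cost_effectiveness": 6, "qualitative": 8,
                 "specific_characteristics": 10, "performance": 6},
    "system C": {"cost_effectiveness": 5, "qualitative": 7,
                 "specific_characteristics": 0, "performance": 9},
}

ranked = sorted(candidates, key=lambda n: weighted_score(candidates[n]),
                reverse=True)
for name in ranked:
    print(f"{name}: {weighted_score(candidates[name]):.2f}")
```

Dividing by the total weight keeps the result on the same scale as the input scores even if the weights do not sum to exactly one. A "specific characteristics" criterion such as minimum word length is often better treated as a pass/fail screen applied before scoring.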

As system development proceeds from feasibility
study through evaluation of vendor proposals, the
selection process can require very detailed information
on the characteristics of both the workload and
the candidate configurations. Certainly, any proposed
architecture should be studied analytically or
by simulation model to determine necessary
resource capacities and communications bottlenecks.
This evaluation is especially important in the bus-oriented
multimachine architecture, where bus contention
may become the factor that controls system
throughput. For example, in one particular case (ref.
14), several processor, memory, and input/output
buses were required to minimize contention delays
in a fairly small multimachine configuration.

The  choice  of  an  architecture  for  an  Earth 
resources  data  system  can  also  be  determined  by  fac- 
tors not  directly  related  to  either  the  Earth  resources 
project  to  be  supported  or  the  hardware  being  con- 
sidered. For  instance,  procurement  regulations  may 
determine  the  nature  of  a data  system  by  specifying 
what  type  of  approval  is  necessary  for  purchase  of 
computer  hardware.  Large  expenses  (a  large 
mainframe)  may  require  high-level  approval.  Small 
expenses  (a  minicomputer)  may  require  only  lower 
level  approval.  Thus,  the  construction  of  a configura- 
tion of  minicomputers  over  a span  of  several  years 
may  be  the  easiest  way  for  an  agency  to  obtain  a large 
data system. (It should be noted that distribution of
costs  over  time  is  not  a legitimate  reason  for  the 
purchase  of  a series  of  minicomputers,  since  the  cost 
of  a large  mainframe  can  also  be  distributed  over  a 
period  of  several  years.) 

A second  example  of  an  externally  imposed  deci- 
sion criterion  would  be  a company-wide  or  organiza- 
tion-wide decree  to  purchase  only  a certain  type  or 
size  of  computer.  The  idea  behind  such  a decree 
might  be  to  facilitate  movement  of  equipment  from 
location to location as projects end and begin.

In  the  example  presented  in  the  following  section, 
a cost-effectiveness  analysis  approach  was  used  to 
evaluate  the  candidate  configurations.  External  fac- 
tors such  as  those  cited  were  not  considered, 
although  they  can  easily  influence  the  design  of  any 
Earth  resources  data  system. 


*M. D. Abrams and S. Treu, "A Methodology for Interactive
Computer Services Measurement." In Communications of the
ACM, to be published.


SAMPLE COMPARISON OF
CANDIDATE  CONFIGURATIONS 

It  may  be  instructive  to  demonstrate  the  concepts 
of  the  preceding  sections  by  briefly  describing 
preliminary  planning  for  a proposed  data  system  for 
future  JSC  resources  programs  (the  ERDS). 

Proposed  Earth  resources  programs  at  JSC  in  the 
late 1970's through the mid-1980's suggest a
workload  of  200  to  400  SUP  hours  per  week  and  the 
need  for  14  to  20  terminals.  The  range  in  the 
workload  is  due  to  program  options,  uncertainty 
about  which  programs  will  be  funded,  and  specula- 
tion about  how  Landsat-D  data  will  be  used  to 
achieve  various  program  objectives. 

To support this workload, data systems using each
of the three architectures described earlier were proposed.
Because of the workload range, four options
of  each  of  the  three  architectures  were  studied.  In 
order  to  achieve  a realistic  comparison  of  architec- 
tures, the  specific  hardware  selected  represented, 
where  possible,  computer  vendors  and  models  that 
are  currently  in  use  for  Earth  resources  applications. 

In  particular,  the  SEL  32/75  was  chosen  as  a repre- 
sentative 32-bit  minicomputer  for  Earth  resources 
applications.  (The  less  powerful  SEL  32/55  is  in- 
stalled at  the  Earth  Resources  Observation  System 
Data  Center  in  Sioux  Falls,  South  Dakota,  as  the 
EROS  Digital  Image  Processing  System.) 

The  IBM  303x  series  architecture  (and  the  plug- 
compatible  AMDAHL  central  processor)  was 
chosen  as  representative  of  large  mainframe  Earth 
resources  hardware.  Such  hardware  supports  Earth 
resources  applications  at  several  locations  (e.g.,  at 
LARS). 

For  the  multimachine  architecture  option  requir- 
ing a communications  bus,  MITRE  chose  the  off- 
the-shelf  Network  Systems  Corporation  bus  as  the 
intercomputer  interface.  This  communications 
system  offers  a wide-band  data  path  of  1.5  to  50 
megabits  per  second  (depending  on  the  bus  length), 
which  can  be  used  to  interconnect  computers  and 
high-speed  peripherals  such  as  disk  and  image 
analysis  terminals. 

Each  configuration  was  priced  according  to  ven- 
dor price  lists  for  purchase  (one-time)  and  mainte- 
nance (recurring).  Tables  IV  and  V show  the  costs 
for  each  of  the  three  architectures  at  the  four  possible 
design points: (1) 190 SUP hours per week (configuration
A), (2) 245 SUP hours per week (configuration B),
(3) 300 SUP hours per week (configuration C), and
(4) 410 SUP hours per week (configuration D).

Estimated  operations  costs  are  given  in  table  V. 
These  costs  included  the  use  of  either  two  or  three 
operators  per  shift  (depending  on  the  complexity 
and  size  of  the  data  system)  on  an  18-shift/week 
basis,  system  management  and  support  services,  and 
consumables. 

Integration  costs  are  not  included  in  this  example 
because  of  the  somewhat  arbitrary  nature  of  the 
available  estimates  and  the  lack  of  a rigorous  estima- 
tion procedure.  Moreover,  integration  costs  should 
be  architecture-dependent  (i.e.,  integrating  a 
multicomputer,  multivendor  system  should  cost 
more  than  integrating  a single  large-computer 
system);  however,  there  was  no  clear,  precise  way  of 
quantifying  this  dependence.  One-time  integration 
costs  will,  however,  be  large.  One  rule  of  thumb 
states  that  the  cost  of  integration  will  be  about  40  per- 
cent of  the  cost  of  initial  hardware  and  software  ac- 
quisition. 
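
The rule of thumb quoted above translates directly into a one-line estimate. The following sketch assumes the 40 percent fraction from the text; the function names and the acquisition figure are hypothetical.

```python
def integration_cost(acquisition_cost, fraction=0.40):
    """Rule-of-thumb integration cost as a fraction of initial acquisition."""
    return fraction * acquisition_cost


def one_time_cost(acquisition_cost, fraction=0.40):
    """Initial hardware and software acquisition plus estimated integration."""
    return acquisition_cost + integration_cost(acquisition_cost, fraction)


# Hypothetical acquisition cost in dollars; not a figure from the source.
acquisition = 5_000_000
print(f"integration: ${integration_cost(acquisition):,.0f}")
print(f"one-time total: ${one_time_cost(acquisition):,.0f}")
```

Because the rule applies a single fraction to every configuration, it cannot capture the architecture dependence noted in the text; a multivendor system would presumably warrant a larger fraction than a single large-computer system.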

The  life  cycle  costs  of  each  data  system  configura- 
tion, priced  using  the  one-time  and  recurring  cost  fac- 
tors discussed  previously,  are  presented  in  table  VI. 
The  last  column  of  this  table  presents  a dollars-per- 
SUP  hour  figure  of  merit  for  each  configuration.  This 
figure  of  merit  is  plotted  versus  configuration  size 
and  architecture  in  figure  5.  Several  conclusions  are 
immediately  evident: 

1.  An  SUP  hour  costs  approximately  $10  less  on 
the  large  single-machine  architecture  than  on  the 
basic  multimachine  architecture. 

2.  An  SUP  hour  costs  approximately  $20  less  on 
the  large  single-machine  architecture  than  on  the 
bus-oriented  multimachine  architecture. 

3.  The  cost  per  SUP  hour  in  all  three  architectures 
decreases  as  the  configuration  grows. 
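
The dollars-per-SUP-hour figure of merit can be sketched as life cycle cost divided by the SUP hours delivered over the same period. The code below is an illustration under stated assumptions: this passage does not give the life cycle length or the exact formula behind table VI, so the 52-week year, the way recurring costs are summed, every input value, and all names are hypothetical.

```python
WEEKS_PER_YEAR = 52  # assumption; the source does not state the basis used


def life_cycle_cost(purchase, annual_maintenance, annual_operations, years):
    """One-time purchase cost plus recurring costs over the life cycle."""
    return purchase + years * (annual_maintenance + annual_operations)


def dollars_per_sup_hour(total_cost, sup_hours_per_week, years):
    """Figure of merit: life cycle dollars per SUP hour delivered."""
    total_hours = sup_hours_per_week * WEEKS_PER_YEAR * years
    return total_cost / total_hours


# Hypothetical inputs for a single configuration.
cost = life_cycle_cost(purchase=4_000_000, annual_maintenance=300_000,
                       annual_operations=700_000, years=8)
merit = dollars_per_sup_hour(cost, sup_hours_per_week=300, years=8)
print(f"life cycle cost: ${cost:,.0f}")
print(f"figure of merit: ${merit:,.2f} per SUP hour")
```

A figure of this form falls as the configuration grows whenever costs rise more slowly than capacity, which is consistent with the third conclusion listed above.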

Considering  the  cost  data  in  the  light  of  both  the 
high  costs  of  operating  the  dispersed  data  system  cur- 
rently in  use  for  Earth  resources  work  at  JSC  and  the 
R&D,  software-development-oriented  nature  of  the 
JSC  workload,  it  was  recommended  that  JSC  consoli- 
date its  Earth  resources  data  processing  and  establish 
a new  data  system  using  the  large  mainframe 
architecture.  These,  however,  are  not  blanket  recom- 
mendations for  all  Earth  resources  data  systems;  a 
particular  Earth  resources  organization  may  have 
legitimate  reasons  for  selecting  any  of  the  three 
architectures  despite  the  trends  shown  in  figure  5. 

TABLE V.—Estimated Operations Cost

FIGURE 5.—Cost/performance comparison of architectures.


CONCLUSIONS 

Data  systems  in  support  of  remote-sensing  ap- 
plications are  getting  larger.  Many  applications  can 
no  longer  be  satisfied  with  a single  minicomputer  in  a 
convenient  laboratory  operated  prime  shift  by  the 
analyst.  As  the  trend  continues,  the  Earth  resources 
community  will  be  forced  to  consider  both  the  com- 
plexities and  the  efficiencies  offered  by  large  data 
systems,  operated  three  shifts  a day,  by  operations 
personnel.  Data  centers  in  support  of  Earth 
resources  applications  will  appear,  and  users  will  be 
increasingly  separated  from  the  actual  computational 
resources.  On  the  other  hand,  image  terminal 
resources  will  become  increasingly  available  in  user 
work  areas  or  even  at  the  user's  own  desk. 

This  environment  already  exists  and  is  accepted 
by  the  low-bandwidth  alphanumeric  terminal  user. 
The  rapid  advances  in  raster-scan  image  terminal 
technology,  accompanied  by  the  rapid  decline  in  cost 
of  the  refresh  memory  required  by  raster-scan 
systems,  make  the  extension  of  this  environment 
available  even  now  to  Earth  resources  users. 

Significant  economies  of  scale  will  result  if  the 
diverse  data  systems  supporting  Earth  resources  at 
an  installation  are  combined  into  a single  data  center. 
This  paper  has  presented  several  architectures  and 
associated  costs  for  the  large  data  system  supporting 
such  a data  center.  Recurring  cost  factors  (mainte- 
nance and  operations)  currently  slightly  favor  the 
single  large-machine  architecture,  but  other  factors 
may  dictate  the  choice  of  one  of  the  two 
multimachine  architectures  discussed. 

The construction of a quantitative workload
model in support of a data system acquisition
(ERDS) has been demonstrated; any user contemplating
a data system acquisition should do the
same.


Experiment Results and Accuracy


FOREWORD 

This  session  presents  a detailed  account  of  the 
LACIE  results.  Comprehensive  assessments  of 
LACIE  performance  in  production,  area,  and  yield 
estimation  are  described.  Emphasis  is  placed  on  an 
assessment  of  the  accuracy  of  the  LACIE  estimates 
in  terms  of  error  source  isolation  and  the  resulting 
corrective  measures  applied  throughout  LACIE. 

The  LACIE  was  conducted  in  three  phases.  Phase 
I,  the  1974-75  crop  year,  was  devoted  to  developing 
the  experimental  apparatus;  assembling  data  bases  of 
historical  agronomic  and  weather  data;  developing 
sampling  approaches,  yield  models,  and  specific  pro- 
cedures for  handling  and  analyzing  Landsat  data;  and 
training  people.  Preliminary  testing  and  evaluation 
of  the  U.S.  Great  Plains  region  was  also  ac- 
complished during  this  phase. 

During Phase II, the 1975-76 crop year, the technology
as modified in Phase I was evaluated again in
the  U.S.  Great  Plains  region,  in  the  prairie  provinces 
of  Canada,  and  in  both  a spring  wheat  and  a winter 
wheat  region  in  the  U.S.S.R.  Exploratory  studies  of 
wheat  identification  and  yield  model  tests  were  con- 
ducted in  five  other  wheat  regions:  India,  the  Peo- 
ple's Republic  of  China,  Australia,  Argentina,  and 
Brazil. 

In  Phase  III,  crop  year  1976-77,  the  evaluation  in 
the  U.S.  Great  Plains  region  was  repeated,  and  the 
region  covered  in  the  U.S.S.R.  was  expanded  to  pro- 
duce total  country  estimates.  The  coverage  in  Canada 
was  reduced  to  30  segments.  The  Canadian  investiga- 
tors collected  ground  observations  for  further  evalua- 
tion of  the  problems  identified  in  Phase  II.  Changes 
made  before  and  during  the  1976-77  crop  year  were 
thought  to  comprise  significant  improvements. 
These  included  the  implementation  of  a new 
machine  classification  process  (known  as  Procedure 
1),  an  improved  stratification  of  the  regions  to  be  in- 
ventoried, relocation  of  selected  samples, and  revised 
wheat  yield  models. 

The papers in this session provide the detailed
results from the three phases of LACIE. Included are
papers describing the growing conditions under
which LACIE estimates were made and the accuracy
and performance of the production, area, yield, and
crop growth stage estimates.

The  three  crop  years  of  LACIE  have  been  marked 
by  a wide  variety  of  weather  conditions  in  the 
regions of interest. "The LACIE Crop Years: An
Assessment  of  the  Crop  Conditions  Experienced  in 
the  3 Years  of  LACIE"  describes  the  wheat-growing 
conditions  in  each  crop  year  for  each  region  for 
which  LACIE  estimates  were  made.  "Application  of 
Landsat  Digital  Data  for  Monitoring  Drought"  de- 
scribes the  drought-monitoring  capability  of  Landsat 
data  when  used  with  the  wheat  growth  stage  at  the 
time  of  the  Landsat  acquisition. 

The  three  papers  describing  "LACIE  Area,  Yield, 
and  Production  Estimate  Characteristics"  present 
the  LACIE  estimates  made  during  each  year  for  the 
U.S.  Great  Plains,  the  U.S.S.R.,  and  Canada.  The 
papers  compare  official  country  estimates  to  the 
LACIE  estimates  and  evaluate  the  LACIE  estimates 
with  respect  to  the  90/90  criterion.  They  describe  the 
estimates  with  respect  to  the  scope,  reporting 
schedule,  sampling  scheme,  and  associated  problems 
of  each  LACIE  phase. 

A more thorough evaluation of the area estimation
error sources in the U.S. Great Plains region is
given in "Accuracy and Performance Characteristics
of  LACIE  Area  Estimates."  This  paper  presents  the 
results  of  the  more  detailed  investigations  based  on 
ground  observations  obtained  in  the  United  States 
during  the  three  phases  of  LACIE. 

"Accuracy and Performance of LACIE Yield
Estimates in Major Wheat Producing Regions of the
World" briefly reviews the yield modeling approach
and  evaluation  methodology.  The  results  of  testing 
and  evaluation  of  the  operational  yield  models  are 
presented  with  emphasis  on  the  U.S.  Great  Plains 
region  using  the  most  recent  10  years  of  historical 
data  as  an  independent  test  set.  The  modifications  of 
the  models  and  the  reasons  for  changes  are  also  ad- 
dressed in  the  order  of  occurrence  throughout 
LACIE. 

"Accuracy and Performance of LACIE Crop
Development  Models"  describes  the  application  of 
such  models  to  LACIE  needs  and  gives  an  evalua- 
tion of  their  performance.  The  form  of  the  original 
spring  wheat  development  models  is  briefly  dis- 
cussed, and  the  modifications  required  for  winter 
wheat  are  presented.  Further  desired  improvements 
are  described  in  light  of  the  performance  evaluation 
conducted  during  LACIE. 

Finally, "Economic Evaluation: Concepts, Selected
Studies, System Costs, and a Proposed Program"
presents a conceptual framework for estimating
the value of improved information and an overview
of completed studies focused on identifying and
quantifying  benefits  resulting  from  improved  infor- 
mation. Comparisons  of  the  costs  of  current  and 
satellite-based  crop  information  systems  are  made. 
The  shortcomings  of  current  systems,  which  are  the 
strengths  of  a satellite-based  system,  are  illustrated. 
In  addition,  a proposed  economic  evaluation  pro- 
gram for  a satellite-based  crop  production  estimation 
system  is  discussed. 


The LACIE Crop Years: An Assessment of the Crop
Conditions Experienced in the 3 Years of LACIE

J. D. Hillᵃ and D. R. Thompsonᵇ


INTRODUCTION 

The  LACIE  undertook  the  task  of  testing,  evaluat- 
ing, and  developing  the  technology  needed  to  utilize 
remote  sensing  and  associated  information  for 
assessing  potential  wheat  production  globally.  The 
project  extended  over  three  crop  years,  from  fall 
plantings  in  1974  to  summer  harvests  in  1977.  The 
project  scope  was  confined  to  the  U.S.  Great  Plains 
during the 1974-75 crop year, then expanded to include
Canada and the U.S.S.R. during the latter
phases.  As  one  would  expect,  LACIE  encountered  a 
variety  of  growing  conditions  under  which  to 
develop  and  test  its  technical  approach. 

In  order  to  assess  the  LACIE  results,  it  is  impor- 
tant to  keep  in  mind  the  crop  growing  conditions 
under  which  the  results  were  obtained.  For  this 
reason,  a crop  condition  assessment  team  was 
organized  as  an  ad  hoc  group  drawing  on  the 
agronomic,  meteorological,  and  other  expertise  in 
the  various  project  elements.  The  team  used 
meteorological  data  and  Landsat  spectral  data  from 
the  growing  regions  to  make  their  assessments  of  the 
conditions  and  document  where  anomalies  such  as 
drought,  floods,  and  freezes  were  having  an  impact 
on  the  crop  yield  and  appearance.  Weather  data 
available  to  make  the  assessments  included  precipita- 
tion totals  and  average  temperatures  for  periods  of  a 
month,  as  well  as  for  shorter  periods  of  7 or  10  days. 

ᵃNOAA Environmental Data and Information Service,
Houston, Texas.

ᵇNASA Johnson Space Center, Houston, Texas.

In the United States, the weekly rainfall and temperature
data were used to estimate soil moisture,
which was then related to crop needs by a Crop
Moisture Index (CMI). This index relates the available
water to the usual supply for each week during
the growing season. The index has been normalized
so that indexes in the range from -1.0 to 1.0 represent
typical moisture supplies. Larger positive indexes
indicate surplus water, while larger negative values
represent deficiencies. Weekly maps of the CMI
were used to infer the general moisture situation in
the various crop growing regions of the United
States.
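
The interpretation of the CMI given above amounts to a three-way classification. The sketch below illustrates it under assumptions: the category labels and function name are invented, the treatment of the endpoints -1.0 and 1.0 as "typical" is a choice made here, and the weekly values are hypothetical rather than LACIE data.

```python
def classify_cmi(index):
    """Classify a Crop Moisture Index value using the ranges in the text."""
    if index > 1.0:
        return "surplus"
    if index < -1.0:
        return "deficient"
    return "typical"  # endpoints -1.0 and 1.0 are treated as typical here


# Hypothetical weekly CMI values for one crop region.
weekly_cmi = [0.3, -0.8, -1.6, -2.4, 1.2]
for week, value in enumerate(weekly_cmi, start=1):
    print(f"week {week}: CMI {value:+.1f} -> {classify_cmi(value)}")
```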

Landsat  color-infrared  full-frame  images  were 
used  to  determine  the  existence  of  drought  and 
assess its areal extent. The Landsat digital data were
transformed  into  a Green  Index  Number  (GIN), 
which  was  useful  in  defining  the  degree  of  drought 
and  the  extent  of  drought-stricken  regions  during  the 
final  two  phases  of  LACIE.  The  GIN  is  a technique 
utilizing  transformed  Landsat  digital  data  for  detec- 
tion of  agricultural  vegetative  water  stress.  It  pro- 
vides a procedure  whereby  Landsat  data  from  a 
LACIE  segment  can  be  classified  as  drought  affected 
or  not. 

As  might  be  expected,  the  multiyear  time  period 
and  the  large  area  with  which  LACIE  concerned  it- 
self presented  opportunities  for  extremes  to  be  en- 
countered. The  project  made  no  attempt  to  set  the 
extremes  apart  as  nonrepresentative  but  considered 
them  as  cases  where  the  technology  would  have  its 
most  stringent  performance  tests.  The  period  from 
1974  to  1977  was  one  in  which  the  LACIE  countries 
experienced  a balance  of  extremes,  ranging  from 
serious  drought  in  South  Dakota  during  1976  to  over- 
abundant rain  in  the  European  U.S.S.R.  during  1977. 
The following discussion will provide more detail
about  the  specific  growing  seasons  encountered  in 
each  LACIE  country. 


PHASE I

The  first  phase  of  LACIE  consisted  of  assembling 
candidate  technology  for  both  crop  acreage  and  yield 
estimation.  The  test  region  covered  the  winter  wheat 
area  of  the  U.S.  Great  Plains  planted  in  the  fall  of 
1974 and the spring wheat area planted in the spring
of 1975. Within this growing region, there are many
factors which affect plant density and the final crop
yields. For winter wheat, moisture is critical when
plants are established in the fall and when regrowth
begins  following  dormancy.  Water  is  also  critical  to 
the  development  of  spring  wheat;  the  crop  depends 
heavily  on  preseason-stored  moisture  as  well  as  pre- 
cipitation after  planting.  For  both  crops,  moisture 
plays  a key  role  in  accurate  identification  in  the 
Landsat  imagery  since  the  analysts  rely  extensively 
on  characteristic  unstressed  crop  signatures. 


Winter  Wheat 

During  the  fall  of  1974,  the  U.S.  Great  Plains  had 
near  normal  soil  moisture  at  planting.  Subsequent 
rainfall  was  adequate  for  establishment  in  the 
Southern  Great  Plains,  but  dryness  was  notable  in 
the  winter  wheat  areas  of  northeastern  Colorado, 
Nebraska,  and  Montana.  The  winter  wheat  crop  en- 
tered dormancy  in  good  condition  throughout  the 
Southern  Great  Plains  and  fair  to  good  condition  in 
northern  areas. 

Winter  temperatures  were  near  to  or  slightly 
below  normal,  and  cold  injury  was  minimal  when 
compared  to  that  of  other  years.  Some  wind  damage 
was  notable,  however,  in  parts  of  eastern  Colorado 
and  western  Nebraska  where  fall  rains  had  been 
below  normal  and  the  dry  soil  was  prone  to  blowing. 
Wind  erosion  was  also  reported  in  western  Kansas 
and  the  Panhandle  portions  of  Oklahoma  and  Texas. 

Across  the  entire  Great  Plains,  cool  temperatures 
from  March  through  May  slowed  regrowth  of  the 
wheat  after  dormancy.  Rainfall  from  March  through 
May was generally 100 to 150 percent of normal
throughout the Great Plains, except for Nebraska,
southeastern Colorado, and the western Texas
Panhandle, which received little more than half the
normal amount. Those areas were further affected by
wind  damage  during  May.  Greenbugs  and  local  soil 
disease  were  reported  in  central  portions  of  the 
winter  wheat  area,  but  there  were  no  widespread 
serious  disease  or  insect  problems.  In  general,  by 
spring,  most  of  the  wheat  was  in  fair  to  good  condi- 
tion, except  in  Colorado  where  moisture  remained 
critically  short. 

Just  before  harvest,  late  spring  and  early  summer 
showers  created  problems.  On  the  first  of  June, 
heavy thunderstorms caused some flooding in southwestern
and central Oklahoma, while hail was
responsible for lodging in parts of Texas. Harvest
was  slowed  by  rains  in  Kansas,  Oklahoma,  and 
Texas,  with  hail  damage  and  lodging  reported  to  be 
greater  than  normal.  Rains  in  June  were  timely 
enough,  however,  to  aid  the  Colorado  wheat,  which 
was  still  in  the  grain-filling  stage. 

The crop season LACIE encountered in the U.S.
winter  wheat  area  during  1974-75  was  ideal  for  the 
first  experience  of  the  project.  A wide  variety  of  crop 
conditions  was  experienced,  but  there  were  no 
widespread  problems  of  a catastrophic  nature.  Good 
weather  for  establishment  and  postdormancy  growth 
allowed  a normal  progression  of  crop  signatures,  and 
the technology was tested in what might be considered
a "most likely" case.


Spring  Wheat 

In  the  spring  wheat  area  of  the  Northern  U.S. 
Great  Plains,  seeding  was  delayed  2 to  3 weeks  by  fre- 
quent rain  and  wet  fields.  When  the  wheat  was 
finally  planted,  it  emerged  in  fair  to  good  condition, 
but  the  ample  soil  moisture  promoted  shallow  root 
development.  Timely  rains  maintained  good 
moisture  through  June,  with  very  heavy  rains  occur- 
ring over  eastern  North  Dakota  and  western  Min- 
nesota on  June  28-29.  Considerable  local  flooding  oc- 
curred in  portions  of  the  Red  River  Valley.  During 
July,  hot,  dry  weather  stressed  the  shallow-rooted 
crops  and  forced  early  maturity.  July  temperatures 
averaged  6°  (F)  above  normal  in  some  areas.  Fre- 
quent showers  recurred  at  harvest  and  may  have 
caused  some  lodging  losses  during  that  critical 
period. 

Even  though  some  local  flooding,  insect,  hail, 
drought,  and  disease  problems  were  reported,  they 
were  not  widespread  episodes  likely  to  cause 
anomalous  crop  appearance  over  large  areas.  The  of- 
ficial USDA  yield  estimates  indicated  near-normal 
crop  conditions.  These  estimates  are  shown  in 
table  I. 
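
The comparison that table I supports, 1975 yield against the 1972-75 average, can be expressed as a percent of average for each State. The sketch below uses the winter wheat rows of table I as they read in this copy of the text; the function and variable names are invented for the example. Note that the 4-year average includes 1975 itself, so the ratio understates departures from earlier years.

```python
# (1975 yield, 1972-75 average yield) in bu/acre, winter wheat rows of table I.
WINTER_WHEAT = {
    "Texas": (23.0, 26.4),
    "Oklahoma": (24.0, 24.4),
    "Kansas": (29.0, 31.5),
    "Colorado": (22.5, 24.3),
    "Nebraska": (32.0, 34.4),
    "South Dakota": (30.0, 30.4),
    "Montana": (35.0, 30.3),
}


def percent_of_average(yield_1975, average):
    """Express the 1975 yield as a percentage of the 4-year average."""
    return 100.0 * yield_1975 / average


for state, (yield_1975, average) in WINTER_WHEAT.items():
    pct = percent_of_average(yield_1975, average)
    print(f"{state:<13} {pct:6.1f} percent of 1972-75 average")
```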


TABLE I.—1975 (LACIE Phase I)
U.S. Great Plains Official Wheat Yieldᵃ

State            1975 yield,    4-year (1972-75)
                 bu/acre        average yield,
                                bu/acre

Winter wheat
  Texas             23.0           26.4
  Oklahoma          24.0           24.4
  Kansas            29.0           31.5
  Colorado          22.5           24.3
  Nebraska          32.0           34.4
  South Dakota      30.0           30.4
  Montana           35.0           30.3

Spring wheat
  Minnesota         31.0           32.4
  North Dakota      29.9           23.0
  South Dakota      18.0           18.3
  Montana           25.8           23.2

ᵃUSDA Economics, Statistics, and Cooperatives Service.


PHASE II


U.S.S.R.

During Phase II of LACIE, the technology to estimate
wheat production was applied in the principal
winter and spring wheat growing areas of the
U.S.S.R. These indicator regions are shown in figure
1 and comprised approximately 83 percent of total
Soviet winter wheat and 37 percent of the spring
wheat production in 1971, the latest year for which
reliable statistics were available.

Winter wheat.—In the winter wheat area, dry fall
weather allowed planting to proceed on schedule.
During crop establishment, weather conditions remained
dry and soil moisture became short. A region
of the important Ukraine area had less than 25 percent
of normal soil moisture at the end of October.

FIGURE 1.—U.S.S.R. winter and spring wheat indicator
regions.

During  the  winter  dormancy  period,  precipitation 
was  near  to  or  slightly  below  normal;  however,  snow 
cover was more extensive than usual. By the end of
January, snow cover began retreating from the
southern portion of the winter wheat area, and extremely
cold temperatures throughout the first week
in  February  caused  cold  injury  to  the  exposed  crop  in 
the  region  east  of  the  Black  Sea. 

After dormancy, the winter wheat area received
ample rainfall, and mild temperatures provided
favorable conditions for crop development. The adverse
conditions of fall establishment were completely
reversed, and there were no serious soil
moisture shortages during the heading and grain-filling
periods. The nearly ideal spring weather was
reflected in the final average yield for Soviet winter
wheat, which was 27 q/ha, a near record. These very
high yields show how winter wheat can rebound in
the spring after undergoing very poor conditions during
establishment. This situation indicates caution
must be used in making judgments of final crop yield
on the basis of early-season growing conditions since
the wheat plant can recover from extreme conditions
if not completely killed.

Spring  wheat.— The  spring  wheat  area  did  not 
receive  the  early  rains  which  benefited  the  winter 
wheat  crop.  Both  April  and  May  were  drier  than  nor- 
mal in  the  area  east  of  the  Ural  Mountains.  By  mid- 
June,  soil  moisture  was  below  normal  in  all  but  the 
central  portion  of  the  growing  area  LACIE  was  in- 
vestigating. During  the  entire  period  from  April 
when  the  crop  was  being  established  through  July 
when  it  was  heading,  rainfall  was  less  than  normal 
except  in  the  extreme  northern  and  eastern  sections 
of  the  indicator  region.  The  dryness  persisted 
through  the  harvest  season  and  minimized  possible 
harvest  losses. 

Spring  wheat  grown  in  the  European  portion  of 
the  U.S.S.R.,  outside  the  LACIE  test  region,  ex- 
perienced better  moisture  distribution  than  that  in 
the  Asiatic  portion  throughout  the  entire  growing 
season. As a result of more favorable conditions over
this major spring wheat producing area, yields for the
entire Soviet spring wheat crop averaged 13 q/ha, a
near record. No detailed yield statistics are available
yet to characterize the impact of the dryness on the
37 percent of the crop for which LACIE prepared
estimates or to verify whether it was truly an
anomaly within the country.


Canada

During Phase II, LACIE technology was tested in
the  spring  wheat  growing  area  of  the  Canadian 
prairie  provinces.  This  area  is  contiguous  to  the 
northern  border  states  of  the  United  States  and 
covers  the  southern  portions  of  Alberta, 
Saskatchewan,  and  Manitoba,  from  the  Canadian 
Rockies  to  Lake  Winnipeg.  The  region  is  outlined  in 
figure  2. 

Wheat  grown  in  the  Canadian  prairie  provinces  is 
normally  planted  during  April  and  May  and  depends 
on  both  preseason  and  growing-season  moisture. 
During  1976,  precipitation  was  near  normal  from 
January  through  June  and  provided  adequate  soil 
moisture  reserves. 

During  the  planting  season,  rainfall  was  near  to  or 
below  normal  and  most  of  the  crop  was  seeded  with- 
out major  delays.  Precipitation  during  the  April-May 
establishment  period  was  less  than  half  of  normal, 
but  rains  improved  in  June  and  encouraged  good 
growth. 

As  the  wheat  progressed  into  the  critical  heading 
and  grain-filling  stages,  July  precipitation  became 
very  erratic;  however,  the  early-season  moisture  was 
apparently  sufficient  to  support  the  crop.  Wheat 
stands  were  excellent,  with  only  light  infestations  of 
disease  and  insects. 

By late July, the wheat heads were filling well, but
hot, dry weather in early August caused premature
ripening in parts of Saskatchewan and Manitoba.
Generally dry conditions favored harvest during
August throughout all regions except portions of
Alberta. Heavy rains and hail in the Peace River
region during late August caused extensive lodging.


FIGURE 2.— Outline map of the Canadian prairie provinces
showing wheat area shaded.

The  1976  spring  wheat  growing  season  in  the 
Canadian  prairie  provinces  was  very  good.  Moisture 
was  never  seriously  limiting  and  insect  and  disease 
damage  was  minimal.  There  were  no  widespread  har- 
vest losses.  As  a result  of  these  good  growing  condi- 
tions, yields  of  spring  wheat  were  above  average.  The 
final  1976  yields  officially  released  by  the  Canadian 
government  are  shown  in  table  II. 
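
The phrase "above average" can be made concrete with a percent-departure calculation. The following is a minimal sketch, not part of the LACIE procedure: the function name `percent_departure` is an illustrative assumption, and the two input pairs are the Saskatchewan and Alberta entries of table II (final 1976 yield and the multi-year average yield, both in bushels per acre).

```python
def percent_departure(value, baseline):
    """Return how far value lies above (+) or below (-) baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be nonzero")
    return 100.0 * (value - baseline) / baseline


# (final 1976 yield, multi-year average yield) in bu/acre, as read from table II.
table_ii = {
    "Saskatchewan": (31.2, 23.2),
    "Alberta": (32.7, 26.1),
}

# Departure of the 1976 yield from the average, rounded to one decimal place.
departures = {
    province: round(percent_departure(final, average), 1)
    for province, (final, average) in table_ii.items()
}

for province, pct in departures.items():
    print(f"{province}: {pct:+.1f} percent relative to the average")
```

The same calculation applies to rainfall expressed as a percent of normal; only the inputs change.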


U.S. Great Plains

Winter wheat.— The winter wheat area of the U.S.
Great Plains from South Dakota to the Texas
Panhandle had rainfall that was below normal during
the summer of 1975. Consequently, soil moisture
was short during fall planting. From September
through November, rainfall in the growing region
totaled 2 to 4 inches, which was generally only 50 to
75 percent of normal (fig. 3). Wheat seeding
proceeded on schedule, but emergence was below
normal because of the dryness. Some rainfall was
received during November, but cooler than average
temperatures limited plant growth and the chance
for improved establishment. Wheat condition varied
widely from state to state, but the area most seriously
affected by drought was southwestern Kansas,
southeastern Colorado, and the Panhandle regions of
Oklahoma and Texas.

Through the winter months, the weather continued
to be dry with little or no rainfall reported in
the  Southern  Great  Plains.  From  December  through 
February,  little  or  no  snow  cover  was  received  and 
the  dry  soil  became  vulnerable  to  serious  erosion. 
Farmers  in  Colorado,  Kansas,  western  Oklahoma, 


Table II.— Official 1976 Canadian Yields
With Comparisons (a)

Province        Final yield, bu/acre    Average yield, bu/acre
                1976        1975        19[?]-74

Manitoba        27.[?]      25.2        25.3
Saskatchewan    31.2        25.6        23.2
Alberta         32.7        29.9        26.1

(a) Statistics Canada.


FIGURE 3.— U.S. precipitation (percent of normal), September to November 1975.


and the Texas Panhandle began to abandon large
areas and plow the land to control wind erosion. The
dryness was accompanied by temperatures which
averaged 4° to 6° above normal for some areas. This
not only increased the effect of the dryness but
encouraged the development of insects, particularly
greenbugs.

Following dormancy, crop conditions were fair to
poor in the Southern Great Plains; however, the
northern growing regions had received normal fall
rainfall and winter snows, which provided excellent
conditions for winter wheat. Very warm
temperatures from February through March pushed crop
development ahead of normal, and the Texas crop
ripened under considerable moisture stress. Ample
rains began in early April at about the last
opportunity to benefit wheat in Oklahoma and Kansas.
Conditions improved, but stands were thin and
weeds became competitive.

The unusual growing conditions caused the wheat
to appear much different from normal in the Landsat
imagery, and analysts were not able to correctly
identify many of the fields, particularly in Oklahoma. As
a result, the acreage estimates were consistently
below the actual values in that area. The LACIE
yield estimates were also low, particularly in the
Panhandle regions of Oklahoma and Texas. This was
due to the large amount of dry land area abandoned,
leaving a greater than normal proportion of the
production to the high-yielding irrigated area.

Rains benefited wheat in the Northern Great
Plains during April, and the entire area experienced
cool May temperatures. The cooler weather
prolonged the grain-filling period and reduced
moisture demands. By mid-May, the general
moisture situation had improved except in eastern
Colorado and western Texas, which were still
experiencing shortages. The cool weather brought a late
freeze to eastern Kansas on May 3, when
temperatures fell to 28° F. The wheat was in the
jointing-heading stage and losses were estimated to be as
much as 50 percent in some counties. A late freeze
also occurred in South Dakota during mid-May,
causing severe damage to winter wheat, which was
mostly in the boot stage.

The cool temperatures continued from May into
June and slowed ripening in the Central Plains.
Harvest activity fell slightly behind normal as a result.
Rains caused some local harvest delays and were
responsible for losses in southeastern and
south-central Kansas on July 1. Heavy rain on that date
caused flooding and lodging of the grain in that area.
In most other areas, the winter wheat harvest season
experienced dry weather to expedite the combining
with a minimum of harvest losses.


ABOVE 3.0 - EXCESSIVELY WET, SOME FLOODING
2.0 TO 3.0 - TOO WET, SOME STANDING WATER
1.0 TO 2.0 - FAVORABLE, SOME FIELDS TOO WET
0 TO 1.0 - FAVORABLE FOR NORMAL GROWTH
0 TO -1.0 - TOPSOIL MOISTURE SHORT, RAIN NEEDED
-1.0 TO -2.0 - TOO DRY, DETERIORATING PROSPECTS
-2.0 TO -3.0 - TOO DRY, YIELD PROSPECTS REDUCED
-3.0 TO -4.0 - POTENTIAL YIELDS CUT BY DROUGHT
BELOW -4.0 - EXTREMELY DRY, MOST CROPS RUINED

FIGURE 4.— Crop Moisture Index for June [19?], 1976.

Spring wheat.— In the spring wheat areas of the
Great Plains, dry April weather provided favorable
planting conditions. The dryness persisted into May
and slowed early plant development, while frost in
mid-May damaged young top growth. The spring
wheat area received less than 75 percent of the
normal May precipitation. As a result, establishment of
the crop was only fair; however, rains developed in
mid-June to aid all areas except South Dakota and
southwestern Minnesota. Good moisture prevailed
over Montana, North Dakota, and part of Minnesota
through the heading and grain-filling stages. The
remainder of the area, highlighted by large negative
crop moisture indexes in figure 4, remained drought
stricken, and, when pastures began to fail, many
farmers cut their wheat for hay or turned cattle into
it for grazing.

Harvest conditions were excellent and ample
periods of dry weather allowed combining to proceed
faster than normal. Lodging and other harvest losses
were minimized. For South Dakota, harvest
conditions could be of little importance since the severe
midsummer dryness had already devastated the
spring wheat crop in portions of that state. The
relative impact of the drought in South Dakota is
indicated by the reduced yields of both the spring and
winter wheat for the state, as shown in table III.
Statewide, the yields were 40 percent below recent
averages; however, they were reduced by as much as
60 percent in some northeastern sections where the
drought was most severe.


Table III.— 1976 (LACIE Phase II)
U.S. Great Plains Official Wheat Yields (a)

State            1976 yield,     4-year (1972-75)
                 bu/acre         average yield, bu/acre

Winter wheat

Texas            22.0            [illegible]
Oklahoma         [illegible]     [illegible]
Kansas           30.0            31.5
Colorado         21.5            24.3
Nebraska         32.0            34.4
South Dakota     [illegible]     30.4
Montana          32.0            30.3

Spring wheat

Minnesota        32.4            32.4
North Dakota     [illegible]     23.0
South Dakota     13.2            [illegible]
Montana          29.4            21.2

(a) [source footnote illegible]

In the affected area of South Dakota, corn,
pasture, and other crops were as severely decimated
as the spring wheat. At the time when the spring
wheat is normally ripening and other crops are still
green, much of the Landsat imagery presented a
colorless appearance for all crops. When the
drought-stricken crops took on the appearance of ripened
grain, the analysts classified an excessive number of
fields as wheat, thus causing a large overestimate of
spring wheat area in South Dakota.


The Phase II experience of LACIE in the U.S.
Great Plains was the project's first encounter with
widespread serious drought. It tested the capability
of the technology to produce yield estimates
representative of severely stressed crops and to classify
wheat when its characteristic spectral appearance
was changed by drought. Fortunately, sufficient
weather data were available in the United States to
characterize the degree and extent of drought.
However, it is possible that routine weather data
available from some foreign areas may not allow
such precision. An effort was undertaken to develop
the Landsat data as a tool for monitoring the extent
and severity of drought in such an area.

Once the problem area was defined from
meteorological data, Landsat color-composite
transparencies, prepared from band 4 (0.5 to 0.6
micrometer), band 5 (0.6 to 0.7 micrometer), and
band 7 (0.8 to 1.1 micrometers) of the satellite's
multispectral scanner, were used to refine the initial
problem area delineated from meteorological data. A
total of 33 Landsat full-frame images (100 by 100
nautical miles) were required for the Southern U.S.
Great Plains analysis and 14 Landsat images for the
South Dakota analysis. These color transparencies
were evaluated by comparison to Landsat imagery
for essentially the same date in previous years and
also to previous 9-day acquisitions for the current
year. Both Landsat-1 and Landsat-2 were used to
acquire 9-day coverage for the drought analysis.
Normal healthy green vegetation on the ground is
recorded on the Landsat color composites as bright
red. As moisture stress reduces the vigor of the
vegetation on the ground, the Landsat-recorded
signatures correspondingly decrease in redness from
what one normally sees at the same crop stage. Thus,
by relating the lack of redness in the signature to the
red signatures that should have been present, the
areal extent of drought was delineated on a mosaic of
Landsat images over the potential drought area.
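
The comparison described above was made visually by analysts working with color transparencies. Purely as an illustration of the underlying logic, the sketch below flags grid cells whose redness has dropped relative to a reference acquisition at the same crop stage. Everything in it is a hypothetical assumption rather than the LACIE method: the redness scores on a 0-to-1 scale, the function names, and the 30 percent drop threshold.

```python
def delineate_drought(current, reference, min_drop=0.3):
    """Flag cells whose redness fell by at least min_drop (as a fraction of
    the reference redness) between the reference and current acquisitions.

    current, reference: equally sized 2-D lists of redness scores (0 to 1).
    min_drop: hypothetical threshold; the source rated severity subjectively.
    """
    mask = []
    for cur_row, ref_row in zip(current, reference):
        row = []
        for cur, ref in zip(cur_row, ref_row):
            if ref <= 0:
                # No vegetation signal in the reference year, so no loss to measure.
                row.append(False)
            else:
                row.append((ref - cur) / ref >= min_drop)
        mask.append(row)
    return mask


def affected_fraction(mask):
    """Return the share of cells flagged as drought affected."""
    cells = [flag for row in mask for flag in row]
    return sum(cells) / len(cells) if cells else 0.0


# Hypothetical 2 x 2 grid: reference year versus drought year.
reference = [[0.8, 0.8], [0.6, 0.0]]
current = [[0.7, 0.4], [0.3, 0.0]]

mask = delineate_drought(current, reference)
print(mask)
print(affected_fraction(mask))
```

A relative drop is used rather than an absolute one so that naturally sparse vegetation is not flagged merely for being less red than a dense stand.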

The drought-affected area in the U.S. Southern
Great Plains was determined from Landsat data to be
located in the southwestern corner of Kansas, in
southeast Colorado, and in the Oklahoma and Texas
Panhandles. The areal extent of the affected area as
of April 12, 1976, is shown in figure 5.

The drought severity within the area was rated
subjectively by comparing the 1976 and 1975 Landsat
imagery. These ratings correlate well with the
acreage losses estimated from ground-based
observations. The CMI for April 10, 1976, also verified that
this general area was undergoing moisture stress (fig.
6).
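
The Crop Moisture Index classes printed in the legends of figures 4 and 6 amount to a lookup from an index value to a condition label. A minimal sketch of that lookup follows. The class labels and limits are taken from the figure legends; the function name and the treatment of values falling exactly on a class boundary (assigned here to the wetter class) are assumptions, since the legends do not say which class a boundary value belongs to.

```python
# (lower bound, label) pairs, ordered from wettest to driest class.
CMI_CLASSES = [
    (3.0, "excessively wet, some flooding"),
    (2.0, "too wet, some standing water"),
    (1.0, "favorable, some fields too wet"),
    (0.0, "favorable for normal growth"),
    (-1.0, "topsoil moisture short, rain needed"),
    (-2.0, "too dry, deteriorating prospects"),
    (-3.0, "too dry, yield prospects reduced"),
    (-4.0, "potential yields cut by drought"),
]


def cmi_category(cmi):
    """Return the legend label for a Crop Moisture Index value.

    Boundary values go to the wetter class (an assumption of this sketch).
    """
    for lower_bound, label in CMI_CLASSES:
        if cmi >= lower_bound:
            return label
    return "extremely dry, most crops ruined"


for value in (3.2, 0.5, -2.5, -4.5):
    print(value, "->", cmi_category(value))
```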

The initial drought-affected area in the Northern
Great Plains, as determined from full-frame images,
was located within South Dakota. From April 18 to
June 12, 1976, the area appeared to be deteriorating,
but the full-frame imagery did not indicate severe
effects. The June 11 to 13 overpass showed the
effects of the drought were becoming pronounced.
The drought-afflicted area delineated at this time
continued to expand until the July 8 to 11 overpass,
when it stabilized (fig. 7). From this overpass, the
drought area was rated subjectively as having been
severely or moderately affected by the dryness.

The July 10, 1976, Landsat 100- by 100-nautical-mile
image (fig. 8) shows the western edge of the
severe drought damage in South Dakota. There is a
lack of red signatures on the right side of the image
when compared to a July 7, 1975, image (fig. 9). The
1975 image shows red signatures, especially in the
natural drainageways, that are not in the 1976 image.

The drought analysis of the Great Plains indicated
that the Landsat data contained meaningful
information about moisture stress. To automate the analysis,
the digital data from Landsat was transformed into a
quantitative measure of greenness called the green
index number, or GIN, which has since proved useful
in the analysis of drought conditions in other
countries. The specific technical approach and results are
presented in the Experiment Design Section.


FIGURE 5.— Drought conditions determined from full-frame
Landsat imagery for April 12, 1976.


PHASE  III 


U.S.  Great  Plains 

Winter wheat.— The U.S. Great Plains winter
wheat  region  was  dry  before  fall  planting,  but  a series 
of  timely  rains  replenished  topsoil  moisture  at  plant- 
ing. Wheat  generally  had  adequate  moisture  for  ger- 
mination and  emergence;  however,  cold  weather  in 
October  caused  the  wheat  to  enter  dormancy  early 
with  little  vegetative  growth.  The  winter  period  was 
colder  than  normal  with  variable  snow  cover  and 
below  normal  precipitation.  During  February,  tem- 
peratures across  the  winter  wheat  region  averaged 
above  normal  and  encouraged  early  green-up.  Condi- 
tions were  conducive  to  rapid  increases  in  ground 
cover  during  March  and  April  as  continued  warm 
temperatures  were  accompanied  by  timely  spring 
rains. 

Moisture  was  ample  in  most  states  during  the  crit- 
ical grain-filling  stage.  The  only  notable  exception 
was  Colorado,  where  dryness  stressed  the  wheat  and 
reduced  potential  yield.  Temperatures  across  the 
Great Plains were very warm during April and May,
ranging as much as 10° above normal in some areas.

No  widespread  adverse  weather  occurred  over 
Texas  and  Oklahoma  during  harvesting.  In  Kansas, 
heavy  rains  affected  the  eastern  sections  of  the  state 
during  the  third  week  in  June  and  some  hail  was 
reported.

In the northern winter wheat states of Montana
and South Dakota, growing conditions were highly
variable. The extreme dryness which affected South
Dakota in 1976 caused the crop to be planted with
little moisture available for germination and early
growth. The wheat entered dormancy in poor
condition and was susceptible to further stand reduction
by winterkill. The lack of vigor suspected in South
Dakota winter wheat was confirmed by analysts who
reported poor signatures on the Landsat imagery
obtained from that region after dormancy. Timely
showers during spring and early summer improved
the condition of the wheat that survived.

The Montana wheat had better moisture
conditions for establishment than the South Dakota wheat
and received the benefit of early spring showers;
however, the showers became less reliable during
June. The GIN was used to estimate drought
intensity on 9- by 11-kilometer sample segments in
eastern Montana. On May 20, GIN indicated
moisture stress was present. The CMI map indicated
abnormally dry conditions at that time only in
eastern Montana, but by late June, dryness was
prevalent over almost the entire state. This intensified
moisture stress during what is usually the heading
stage caused significant reductions in yield in
Montana.


ABOVE 3.0 - EXCESSIVELY WET, SOME FLOODING
2.0 TO 3.0 - TOO WET, SOME STANDING WATER
1.0 TO 2.0 - FAVORABLE, SOME FIELDS TOO WET
0 TO 1.0 - FAVORABLE FOR NORMAL GROWTH
0 TO -1.0 - TOPSOIL MOISTURE SHORT, RAIN NEEDED
-1.0 TO -2.0 - TOO DRY, DETERIORATING PROSPECTS
-2.0 TO -3.0 - TOO DRY, YIELD PROSPECTS REDUCED
-3.0 TO -4.0 - POTENTIAL YIELDS CUT BY DROUGHT
BELOW -4.0 - EXTREMELY DRY, MOST CROPS RUINED

FIGURE 6.— Crop Moisture Index for April 10, 1976.

In summary, during Phase III, conditions were
such that the U.S. Great Plains experienced no
widespread disease or insect problems. Moisture
stress was a localized problem which did not affect
the major winter wheat growing region after
dormancy. Even though the crop had poor conditions for
overwintering, growing season weather was generally
favorable after green-up.

Winterkill.— Conventional and satellite sources of
meteorological data were monitored and the data
analyzed throughout the late fall, winter, and early
spring to delineate areas where the wheat crop was
exposed to frigid temperatures with minimal or no
snow cover (the criteria for winterkill). These criteria
were obtained from a search of pertinent literature
describing the tolerance of wheat to cold. Although
the critical limits depend on variety, plant moisture
content, and several other factors, a value of -20° C
is generally an appropriate threshold. The only Great
Plains potential winterkill locations that were
determined to be without snow cover during the
occurrence of critically cold temperatures were extreme
northern Kansas, Nebraska, and the central area of
South Dakota, where fall dryness had already caused
poor wheat-stand development. Field reports
received from this area later in the season indicated
wheat had been affected by winterkill and some
fields seriously thinned or abandoned.
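
The winterkill screening rule stated above (critically cold temperature coinciding with minimal or no snow cover) can be sketched as a simple filter over daily station observations. This is an illustration under stated assumptions, not the LACIE implementation. The -20° C threshold comes from the text; the record layout, the station names, and the 2-centimeter snow depth taken to mean "minimal" cover are hypothetical.

```python
from dataclasses import dataclass

WINTERKILL_TEMP_C = -20.0  # threshold given in the text
MIN_SNOW_COVER_CM = 2.0    # hypothetical stand-in for "minimal or no snow cover"


@dataclass
class StationDay:
    station: str          # hypothetical station identifier
    min_temp_c: float     # daily minimum air temperature, deg C
    snow_depth_cm: float  # snow depth on the ground, cm


def winterkill_candidates(observations,
                          temp_c=WINTERKILL_TEMP_C,
                          snow_cm=MIN_SNOW_COVER_CM):
    """Return the sorted names of stations with at least one day that was
    both critically cold and without protective snow cover."""
    flagged = set()
    for obs in observations:
        if obs.min_temp_c <= temp_c and obs.snow_depth_cm < snow_cm:
            flagged.add(obs.station)
    return sorted(flagged)


observations = [
    StationDay("station_a", -24.0, 0.0),   # cold and bare: candidate
    StationDay("station_b", -26.0, 15.0),  # cold but insulated by snow
    StationDay("station_c", -12.0, 0.0),   # bare but not critically cold
]
print(winterkill_candidates(observations))
```

Because the critical limits depend on variety and plant moisture content, both thresholds are exposed as parameters rather than fixed in the function.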

Spring wheat.— Much of the U.S. spring wheat
region entered the 1977 growing season with a
serious deficiency of subsoil moisture carried over
from the previous fall. During most of the crop
season, however, timely showers provided adequate
rainfall to promote good growth in South Dakota,
Minnesota, and most of North Dakota. Only
Montana experienced widespread drought, although the
western portion of North Dakota had less severe
moisture shortages. During June and July, when the
wheat in the Northern Great Plains was heading and
filling, the GIN was used to evaluate the area for
indications of stress. Most of Montana's spring wheat,
except isolated pockets in the southeastern corner
and northcentral portion, exhibited signs of drought
stress, but this was the only state where such
conditions were noted.


FIGURE 7.— Drought conditions determined from full-frame
Landsat imagery for July 8 to 11 and 17 to 20, 1976.


U.S.S.R.

Winter  wheat.— The  U.S.S.R.  winter  grain  planting 
season  was  characterized  by  ample  soil  moisture  and 
the  early  onset  of  cold  temperatures.  Warm  and 
moist  September  weather  promoted  the  initial  estab- 
lishment of  the  crop  in  many  areas,  but  growth  was 
quickly  limited  by  temperatures  in  October  which 
fell  below  the  optimum  needed  for  vigorous  develop- 
ment. In  many  areas,  the  cold  temperatures  occurred 
within  2 to  3 weeks  after  normal  planting  dates 
while,  in  a few,  temperatures  sufficiently  cold  to 
bring  on  dormancy  developed  at  normal  planting 
time.  Landsat  imagery  acquired  in  late  fall,  before 
snow  cover,  indicated  weak,  spotty  wheat  signatures 
when  compared  with  imagery  from  previous  years, 
thus  confirming  the  poor  growing  conditions. 

January was colder than normal in the winter
wheat region; however, temperatures near or above
normal during December and February caused the
3-month period to average warmer than the long-term
mean. The mild temperatures in February were an
indicator of the warmer than normal weather which
was to continue through the spring season. In most
crop regions, this was accompanied by adequate and
timely precipitation. The seasonal weather typical of
the important winter wheat growing region of the
northeast Caucasus is shown in figure 10. It
illustrates the early onset of cold fall temperatures and
the warmer than normal weather during February,
March, and early April which pushed crop
development ahead of normal.

Winterkill.— During December, precipitation was
near normal and produced adequate snow cover over
all of the winter wheat region with the exception of
the extreme southern portions. On January 2 and 5,
temperatures dropped below -20° C in the areas
with poor snow cover and may have caused loss of
some plant stands. The regions affected were
Krasnodar Kray and the northeast Caucasus, along
with small portions of the eastern Ukraine and the
Lower Volga.

In the region where cold injury was suspected, the
Landsat imagery was reviewed closely to determine
whether the wheat's characteristic appearance was
altered in an identifiable way. It would be suspected
that damaged wheat would be thin and irregular after
green-up unless it was severely injured, in which case
the land would be plowed and planted to another
crop. One of the segments studied is shown in figure
11, where the bottom panel is a machine
classification map on which the fields identified as wheat are a
light color. Wheat planted in several fields during the
fall shows the brightest red color where it has
emerged and is growing on October 7. One of the
fields, outlined in white, shows establishment in the
fall but has been dropped from the inventory on
April 4, possibly because it has been damaged and
does not exhibit the characteristic green-up after
dormancy.


FIGURE 8.— Drought effect on Landsat image 5448-[illegible] acquired on July 10, 1976, over central South Dakota.

Figure 12 is imagery of a segment located on the
eastern edge of the delineated potential winterkill
area. The good response seen in this segment during
dormancy when there is snow in the fence rows
indicates good fall establishment. No effects of the cold
temperatures during January can be seen in the
winter grain response in the spring acquisition.
Clearly, these two examples show that evidence of
winterkill on the satellite imagery is very subtle and
requires extensive analysis to assess the degree or
extent of damage.


FIGURE 9.— Normal Landsat image 2166-1649[?] acquired on July 7, 1975.

Spring wheat.— The U.S.S.R. spring wheat growing
region experienced dry weather during the spring
months, and planting progressed in a timely manner.
The moisture supply throughout the remainder of
the growing season varied considerably between
regions; however, July was a particularly important
period as the wheat moved through the critical
grain-filling stage. During July, the western portion of the
spring wheat area received approximately 100 to 125
millimeters of rainfall, almost twice the normal
amount.

In the area from the Middle Volga region
eastward, moisture was most variable. July rainfall
across the northern portion of the spring wheat belt
totaled 50 to 100 millimeters, near-normal
precipitation. Temperatures also averaged near normal there.

The southern portion of the belt received less
rainfall with crop region averages ranging generally
from 25 to 75 millimeters. Temperatures in those
areas are normally warmer, however, than in the
northern portion of the spring wheat area and make
precipitation less effective. Some reports had
indicated below normal soil moisture existed in that area
early in the season and the lack of any significant
opportunity to renew the supply meant that plant stress
was likely. Imagery and digital data obtained from
Landsat overpasses were used to evaluate whether
stress was actually present in the area from the Urals
eastward.


FIGURE 10.— Climograph for crop region 10, northeast
Caucasus region of the U.S.S.R.

The GIN program developed by LACIE was run
on two separate Landsat overpasses in the U.S.S.R.
spring wheat region during July when the crop was in
the critical jointing-heading stage. Data for June 23
and July 2 to 18 (fig. 13) indicated that much of the
U.S.S.R. spring wheat region was undergoing stress
at that time. The next Landsat pass during July 19 to
30 (fig. 14) indicated that stress conditions were still
present. The total area of probable moisture stress
for July, using the combined data, is shown in figure
15. These data imply that during July, much of the
U.S.S.R. spring wheat in the area east of the Urals
and generally south of a line from Orenburg to Omsk
experienced moisture stress.

Landsat color-infrared full-frame 100- by
100-nautical-mile segments of four separate areas were
also examined for indications of distinct changes in
the spectral quality that would imply variations in
crop vigor or crop development stage. Apparent
within the imagery was an obvious darkening of the
soil where scattered showers had recently occurred,
indicating that rainfall was not altogether absent.

FIGURE 11.— Example of possible winterkill, northeast
Caucasus, U.S.S.R. (a) Partial fall emergence, October 7, 1976.
(b) Spring greening up, April 4, 1977. (c) Machine classification
map, April 4, 1977 (light color = wheat; bordered area =
[illegible]).


FIGURE 14.— Moisture conditions over U.S.S.R. spring wheat from the LACIE GIN monitoring program: Landsat data acquired July
19 to 30, 1977.


A Landsat  overpass  acquired  on  July  4,  1977,  is 
shown  in  figure  16.  This  area  extends  from  north  to 
south  between  the  cities  of  Omsk  and  Tselinograd. 
The  vigorous  red  crop  signatures  in  the  north  lose 
much  of  their  intensity  as  one  moves  southward.  In 
the  central  portion,  very  few  field  patterns  can  be 
identified.  An  acquisition  showing  the  central  por- 
tion 1 year  earlier  on  July  25,  1976,  indicates  that 
cultivated  fields  do  exist  throughout  this  area  and 
should  be  apparent.  Clearly,  the  spring  wheat  did  not 
develop  well  early  in  the  season  and  stands  do  not 
exhibit  the  signature  one  might  expect  at  a time 
when  the  crop  should  be  near  heading.  Rains  later  in 
July  could  have  caused  a slight  improvement  in  the 
crop  condition,  but  significant  reduction  in  potential 
yield  had  already  occurred  when  moisture  shortages 
limited  development  of  the  plant  stands. 

The sequence of Landsat imagery acquired from
the stressed region shows abnormal crop signatures,
and alternative cropping practices may have been
implemented in some areas. In figure 17, a series of
acquisitions for a 5- by 6-nautical-mile segment in
southern Kustanay is shown which indicates some
likely problems. On April 29, the fields in the lower
portion are being prepared; on June 4, they have
emerged but show weak color-infrared signatures.
On July 28, much of the crop has matured, while the
same area on August 1, 1976, was still showing active
growth. Early maturity is a common response when
wheat is under moisture stress.

In figure 17, there is an isolated field in the lower
right which shows rapid growth between June 4 and
July 28. This is probably a field that was abandoned
during June and planted to a crop such as millet,
which could be harvested for silage before a freeze.
However, abandonment and reseeding were not
widely identifiable practices this year.

The Landsat imagery contains additional information indicative of
wheat condition in the dry areas. In figure 18, the large area of
small grains in the upper central portion of the segment appears to
be progressing past the heading stage on July 27 and presents a
dark-red signature. By mid-August, these fields should be ripe and
nearing harvest; however, the August 14 acquisition indicates some
red color is still present, a sign of active growth. This is likely
the result of vegetation in the form of weeds and secondary tillers
from the wheat plants responding to the rains which occurred. The
rain probably had the effect of slowing maturity as well. In some
areas of the oblast, August rainfall totaled up to 300 millimeters,
which is about 500 percent above normal.

FIGURE 15.— Moisture conditions over U.S.S.R. spring wheat from the LACIE GIN monitoring program for July 1977.

Only  in  isolated  cases  does  this  secondary  wheat 
growth  produce  a significant  amount  of  additional 
grain;  more  often  it  joins  with  the  weeds  to  interfere 
with  combining  operations.  In  fields  which  are 
allowed  to  stand  while  secondary  growth  develops, 
some  loss  of  standing  grain  will  occur,  causing  a 
possible reduction in yield. The extensive presence
of weeds in the fields reduces the quality of the
wheat  that  is  harvested. 

In  summary,  the  wheat  crop  in  the  area  east  of  the 
Ural  Mountains  endured  the  poorest  growing  condi- 
tions of  any  area  harvested  during  1977.  In  the 
southern  portion  of  the  New  Lands,  drought  appears 
to  have  made  a serious  impact;  some  fields  were  ap- 
parently overseeded,  while  those  left  for  harvest 
were  probably  of  low  quality. 


SUMMARY  AND  CONCLUSIONS 

The LACIE experience in major wheat-producing regions of the world
encompassed a wide variety of crop growing conditions. These
represented the spectrum of growing conditions likely to be
encountered by an operational system using the LACIE technology to
perform timely monitoring of global wheat production. It is clear
that an abundance of information is available from the meteorological
and Landsat data which can be used to infer likely crop condition.
The experience in the U.S. Great Plains indicates that such
inferences truly reflect the actual condition of the wheat growing
there. Such inferences do not follow directly from the data, however,
and considerable agronomic insight and ancillary data are required to
make them meaningful and reliable.

The application of Landsat digital data to crop condition assessment
through development of the GIN represents a major application of that
information source for a purpose other than crop identification and
acreage mensuration. It has a distinct advantage over the
meteorological data in that it provides continuous spatial coverage,
whereas the weather observations are only samples at discrete points.
However, the use of the GIN also requires care to avoid confounding
crop conditions with crop phenological development. The GIN,
meteorological data, crop development models, and ancillary
historical data comprise a powerful combination of information
sources to be exploited for making qualitative assessments of crop
vigor. The LACIE experience has focused primarily on qualitative
assessments of moisture stress and temperature effects, but the
information sources can be extended to inferring the presence or
absence of other detrimental crop influences such as disease,
insects, or extreme desiccation.

The  data  sources  available  to  make  assessments  of 
crop  conditions  offer  varying  timeliness  advantages 
but  these  can  be  complementary  to  each  other.  The 
meteorological  data  acquired  each  day  provide  a 
definition  of  environmental  conditions  at  specific 
locations  and  an  early  warning  when  extremes  are 
exceeded.  The  Landsat  data  provide  a measure  of  the 
plants'  integrated  response  to  those  conditions  over  a 
period  of  several  days.  The  present  interval  of  18 
days  between  passes  of  each  Landsat  satellite  over  a 
point  on  the  Earth  may  not  be  timely  enough  to  pro- 
vide rapid  verification  and  define  the  areal  extent  of 
adverse  conditions,  however,  thus  establishing  a 
need  for  Landsat  data  acquisition  at  shorter  intervals 
of  9 or  6 days. 
Application of Landsat Digital Data
for Monitoring Drought

D. R. Thompson^a and O. A. Wehmanen^b


SUMMARY 

A technique utilizing transformed Landsat digital data for detection
of agricultural vegetative water stress was developed during the 1976
South Dakota drought. The procedure was expanded to the U.S. Great
Plains during 1977 to evaluate the technique for detecting and
monitoring vegetative water stress over large areas. This technique,
the Green Index Number, used Landsat digital data from 5- by
6-nautical-mile sampling frames (segments) to indicate when the
vegetation within the segment was undergoing stress. Segments were
classified as either moisture stressed or normal using remote-sensing
techniques combined with a knowledge of the crop condition. The
remote-sensing-based information was compared to a weekly
ground-based index (the Crop Moisture Index) provided by the U.S.
Department of Commerce. This comparison demonstrated good agreement
between the 18-day remote-sensing technique and the weekly
ground-based data. The procedure developed over a small geographic
area (South Dakota) for detecting moisture stress was adapted to a
larger geographic region (the U.S. Great Plains).


INTRODUCTION 

Landsat color-infrared images were used in the LACIE during the 1976
droughts in the U.S. Great Plains to determine the areal extent of
these droughts (refs. 1 and 2). The use of Landsat images for drought
monitoring is dependent on the subjective judgment of an
analyst-interpreter in deciding that a region is or is not drought
affected. During the analysis of the U.S. Southern Great Plains
drought, studies were started using Landsat digital data from LACIE
sample segments for quantifying the subjective judgment of the
analyst-interpreter (ref. 1). During the drought in the U.S. Northern
Great Plains, a technique utilizing transformed Landsat digital data
for detection of agricultural vegetative water stress was developed
(refs. 2 and 3). The procedure, which was developed over a small
geographic region, was expanded to selected LACIE sample segments
throughout the U.S. Great Plains during the 1977 crop year. The
remote-sensing technique, the Green Index Number (GIN), was compared
to a weekly ground-based index, the Crop Moisture Index, provided by
the U.S. Department of Commerce. This paper presents the approaches
used for and the results from the GIN monitoring program.

^a NASA Johnson Space Center, Houston, Texas.

^b Lockheed Electronics Company, Inc., Systems and Services
Division, Houston, Texas.


APPROACH 

The GIN concept for detecting and monitoring drought uses Landsat
multispectral scanner (MSS) values for LACIE sample segments (figs. 1
and 2) and a knowledge of the wheat growth stage at the time of the
Landsat acquisition.

Using ideas presented by Kauth and Thomas (ref. 4), a screening
number, the GIN, was developed during the 1976 drought in South
Dakota. The GIN is a value designated to summarize the condition of
vegetation within a sampling frame. It is based on the ability to
detect the area of growing vegetation using all four Landsat bands
and to measure the area within the region. In approximate terms, the
GIN is the percentage of land in an area with a "healthy" cover of
vegetation.

The procedure developed during the 1976 South Dakota drought was
different from the procedure applied to the selected subset of LACIE
sample segments throughout the U.S. Great Plains, which were used to
test the adaptability of the procedure developed over a small
geographic area (South Dakota) to a larger geographic area (the U.S.
Great Plains). The final GIN value is essentially the same when
computed by the two different methods. Both the original method and
the present method of computing the GIN will be presented.

FIGURE 1.— Map of South Dakota showing locations of LACIE sample segments.

FIGURE 2.— Map of U.S. Great Plains showing locations of LACIE sample segments.


1976 GIN CALCULATION

The GIN is defined as follows. First, the data in the segment
acquisition are summarized by clustering using the Iterative
Self-Organizing Clustering System (ISOCLS) algorithm as implemented
on the Earth Resources Interactive Processing System (ERIPS) on the
special-purpose processor. (This parallel processor clusters a
segment in approximately 30 seconds.) The clustering procedure
summarizes the segment in 20 or fewer cluster means in the four
Landsat channels. The count of picture elements (pixels) belonging to
each cluster is also calculated. Each mean vector $x^i$ is then
transformed by

$$y^i = A x^i + b \qquad (1)$$

where $y^i$ is a vector representing the Landsat-1 version of the
Kauth-Thomas transformation of $x^i$ (6); the subscript number
indicates the Landsat channel, and the superscript is the cluster
number.
Each vector is inspected automatically, and any vector having values
unreasonable for agricultural data is discarded using the following
procedure.

A cluster $i$ is accepted as good only if

$$30 < y_1^i < 110$$
$$-10 < y_2^i$$
$$-10 < y_3^i$$
$$-10 < y_4^i < 10$$

The greenness level $m$ of the soil line then is estimated by the
minimum second-channel value $y_2^i$ for acceptable clusters. That
is,

$$m = \min_{i \text{ is good}} y_2^i \qquad (3)$$

Then the green number $g^i$ is computed for each cluster by

$$g^i = y_2^i - m$$

The value of $g^i$ should be a good measure of the green vegetation
present on the pixels in cluster $i$. Experience from test sites with
spectral plots in the brightness-greenness plane and the
corresponding imagery led to the following assumptions.

1. $g = 0$ indicates bare soil.

2. $g = 5$ indicates a trace of vegetation.

3. $g = 15$ indicates good cover of vegetation.

Operating on these assumptions, level 15 was chosen and GIN was
defined to be the percentage of pixels in the entire image within
clusters having green numbers greater than 15. This value was
considered somewhat unstable for consecutive-day data where, for
example, a cluster would slip from 15.1 one day to 14.9 the next day;
therefore, cubic weighting was added to smooth this calculation in
the following manner.

Let a cluster have green number $g$ and contain $n$ pixels. Define
the weighting factor $w$ by

$$w = 0 \text{ if } g \le 11$$
$$w = 1 \text{ if } g \ge 17 \qquad (5)$$

Otherwise, $w$ takes an intermediate value along a cubic curve in
$g$.

The cluster is counted as having $wn$ pixels with green numbers
greater than 15. This curve makes a smooth transition from full
counting to not counting as the green number decreases.

The GIN then is an estimate of the percentage of pixels in a Landsat
scene having green numbers high enough (>15) to indicate full cover
of green vegetation. It is computed using only Landsat data. A sample
spectral plot of green numbers versus brightness is given in figure 3.
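The cluster-based calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `gin_1976` and the input layout are hypothetical, the inputs are cluster means that have already been transformed (the matrix $A$ and offset $b$ are not reproduced here), the acceptance limits are those read from the screening rule above, and the cubic smoothing is replaced by a plain threshold at 15 because the cubic expression itself is not given.

```python
def gin_1976(clusters, level=15.0):
    """Approximate 1976-style GIN from transformed cluster statistics.

    clusters: list of (y, n) pairs, where y is the 4-element transformed
    cluster mean (brightness first, greenness second) and n is the pixel
    count of the cluster. The layout is an assumption for illustration.
    """
    # Screening: keep only clusters with values plausible for agriculture.
    good = [
        (y, n) for y, n in clusters
        if 30 < y[0] < 110 and -10 < y[1] and -10 < y[2] and -10 < y[3] < 10
    ]
    if not good:
        return None  # nothing usable in this acquisition

    # Soil line: minimum greenness among the accepted clusters.
    m = min(y[1] for y, _ in good)

    # Green number per cluster is greenness above the soil line.
    green_pixels = sum(n for y, n in good if y[1] - m > level)

    # Percentage is taken over all pixels in the image, as in the text.
    total_pixels = sum(n for _, n in clusters)
    return 100.0 * green_pixels / total_pixels


example = [
    ((50, 0, 0, 0), 100),    # bare soil cluster
    ((60, 20, 0, 0), 300),   # well-vegetated cluster
    ((200, 5, 0, 0), 100),   # too bright (for example cloud), rejected
]
print(gin_1976(example))
```

With a hard threshold, a cluster drifting from 15.1 to 14.9 flips its whole pixel count in or out of the total, which is the instability the cubic weighting was introduced to remove.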


FIGURE 3.— Sample spectral plot of cluster statistics.


1977 GIN CALCULATION

The GIN is computed on a LACIE sample segment, an area (5 by 6
nautical miles) with 22 932 Landsat image elements or pixels. The
data are processed using an automated screening procedure that
rejects pixels with values which are unreasonable for an agricultural
area (because of clouds, water, or bad data). This procedure was
established empirically after inspecting many LACIE segments. The
procedure for computing GIN is defined as follows.

For an observation $X$ (the Landsat values of a pixel in the four
channels), the vector $Z$ is computed, where

$$Z = RX \qquad (6)$$

and $R$ is the matrix of the Kauth-Thomas transformation.

Each vector is inspected automatically, and any vector having values
unreasonable for agricultural data (because of clouds, water, or bad
data) is discarded using the following procedure.

A pixel is accepted as good only if

$$Z_1 < 100$$
$$-8 < Z_2$$
$$-19 < Z_3$$
$$-5 < Z_4 < 15$$


Once the screening has been performed, the histogram of the value
$Z_2$, truncated to integer, is accumulated for good pixels. $Z_2$ is
defined as the greenness channel in the Kauth-Thomas transformation
and is a weighted difference between the spectral values in the
infrared and visible channels. Bare soil has a low greenness level,
which changes with haze level and sample segment location. The soil
greenness $s$ for each segment is estimated to be the greenness of
the pixel that is greener than only 228 (approximately 1 percent) of
the other good pixels. The green number of bare soil is taken to be
zero; thus, the green number $g$ for a pixel is $g = Z_2 - s$. The
green number contains information about green vegetation. To compute
GIN, the pixels with $g > 15$ are counted, divided by 22 932 (the
number of pixels in a scene), and multiplied by 100 to obtain the
percentage. The level 15 was observed empirically to represent
healthy green agricultural vegetation. The GIN is an estimate of the
percentage of pixels in a Landsat 5- by 6-nautical-mile scene having
green numbers high enough (>15) to indicate full cover of green
vegetation. It is computed using only Landsat data.
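As a rough illustration of the pixel-based procedure, the following Python sketch screens pixels, estimates the soil greenness from the low end of the greenness distribution, and counts pixels above level 15. It assumes the pixels have already been transformed to $Z$ (the matrix $R$ is not reproduced here); the function name `gin_1977` and its parameters are hypothetical, and the reading of "greener than only 228 of the other good pixels" as the value at sorted position 228 is an interpretation.

```python
def gin_1977(z_pixels, scene_pixels=22932, soil_rank=228, level=15):
    """Approximate 1977-style GIN from transformed pixel vectors.

    z_pixels: iterable of 4-element vectors (Z1, Z2, Z3, Z4), with Z2 the
    greenness channel. soil_rank is the number of good pixels that are
    less green than the pixel chosen to represent bare soil.
    """
    # Screening: reject clouds, water, and bad data.
    good = [
        z for z in z_pixels
        if z[0] < 100 and -8 < z[1] and -19 < z[2] and -5 < z[3] < 15
    ]
    if len(good) <= soil_rank:
        return None  # too few good pixels to estimate the soil level

    # Greenness truncated to integer, sorted from least to most green.
    greenness = sorted(int(z[1]) for z in good)

    # Soil greenness: the pixel greener than exactly soil_rank others.
    s = greenness[soil_rank]

    # Count pixels whose green number g = Z2 - s exceeds the level.
    count = sum(1 for z2 in greenness if z2 - s > level)
    return 100.0 * count / scene_pixels


# Tiny synthetic scene: 10 pixels, soil taken as the second-lowest value.
scene = [(50, z2, 0, 0) for z2 in (0, 1, 2, 3, 4, 20, 21, 22, 23, 24)]
print(gin_1977(scene, scene_pixels=10, soil_rank=1))
```

Dividing by the full scene size rather than by the number of good pixels follows the wording of the text, so cloud-covered acquisitions will tend to give lower values.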

It was determined during the 1976 South Dakota drought that a plot of
GIN versus acquisition time for a normal, predominantly wheat segment
should follow a predetermined curve (fig. 4). If an observed point
for a segment fell into the shaded region, the segment was classified
as drought affected. The bounds for the shaded region were defined
empirically with i defined as the approximate spring emergence in
days. For different areas or years, the shaded area is moved from
side to side to match the green-up curve. The bounds for the shaded
region are such that a normal agricultural segment greens up at a
rate greater than 0.5 percent per day for 40 days and exceeds a GIN
value of 20 for 30 days. After this time, the wheat in a region has
completed grain filling and is harvested and the greenness of the
region may decrease. Each acquisition for a segment was plotted and
the segment was classified as moisture stressed or not moisture
stressed.

For comparison, these segments were also classified using the Crop
Moisture Index (CMI) for Crop Reporting Districts (CRD's). The
remote-sensing classification procedure of GIN was evaluated against
the CMI, which measures the degree to which moisture requirements of
growing crops were met during the previous week. The index is
computed from average weekly values of temperature and precipitation.
Along with previous soil moisture condition and current rainfall, the
temperature and precipitation values are used to calculate the actual
moisture loss. If the potential moisture demand or potential
evapotranspiration exceeds available moisture supplies, actual
evapotranspiration is reduced and the CMI gives a negative value.
However, if moisture meets or exceeds demand, the index is positive.
The CMI represents the average conditions over a several-county
region (CRD); so local moisture conditions may vary because of
differences in rainfall distribution or soil types. The specific type
of agriculture is not considered in the CMI, but it assumes a
water-use curve typical of the leaf area index of the crops which
predominate in the region. A CRD was classified as drought affected
if its CMI fell below -0.5 for 2 consecutive weeks. Both
classifications were restricted to similar time frames; it was
possible for a segment to start normal and then undergo moisture
stress. If, in any one instance, the GIN classification did not agree
with the CMI, the GIN was considered as not agreeing with the CMI
(tables I and II).

FIGURE 4.— Plot of GIN versus time for a normal, predominantly wheat segment.
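The CMI rule stated above (below -0.5 for 2 consecutive weeks) is simple enough to write down directly. The sketch below is only an illustration; the function name `cmi_drought_affected` and the list-of-weekly-values input are assumptions, not part of the LACIE software.

```python
def cmi_drought_affected(weekly_cmi, threshold=-0.5, run_length=2):
    """Return True if the CMI stays below the threshold for the required
    number of consecutive weeks anywhere in the series."""
    run = 0
    for value in weekly_cmi:
        # Extend the current run of dry weeks, or reset it.
        run = run + 1 if value < threshold else 0
        if run >= run_length:
            return True
    return False


print(cmi_drought_affected([0.2, -0.6, -0.7, 0.1]))   # two dry weeks in a row
print(cmi_drought_affected([-0.6, 0.0, -0.6]))        # dry weeks not consecutive
```

A value exactly equal to -0.5 is treated here as not below the threshold, which is one reading of "fell below".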


TABLE I.— Results of GIN and CMI Classifications^a


RESULTS


South Dakota, 1976

The data used in this study consisted of all LACIE segments in South
Dakota which had at least 5 percent wheat as measured by the LACIE
Classification and Mensuration Subsystem (CAMS) in the 1976 growing
season. This definition yields 17 segments with 34 possible
classifications or segment years. (A segment year is defined as an
observation of one segment for a growing season.) Of the 34
classifications, 12 had either insufficient data during the growing
season or data that were inaccessible for other reasons. The final
data set contained 22 segment years for 13 LACIE segments (fig. 1 and
table I). The contingency table (table III), which applies the two
classification methods to the 22 good segment years, shows that the
classifications based on the CMI and GIN are related. It was
concluded that the GIN observations at appropriate phenological
stages are detecting moisture through crop responses.

An inspection of the five disagreements on the classification results
(table I) disclosed that on two segments, the GIN algorithm was
confused by a lake. Three of the segments were on the edge of their
CRD's, and the CMI classifications may not have reflected the actual
conditions in these segments. One segment was located on the eastern
edge of the CRD where heavy rains occurred at the weather station
located in the Black Hills (in the western part of the CRD), causing
a possible incorrect condition to be reflected by the CMI.

TABLE II.— Results of GIN and CMI Classifications^a

TABLE III.— Contingency Table of GIN and CMI Classification Methods^a

Examples  of  the  segment  classification  procedure 
are shown in figures 5 and 6. The GIN indicates that
1975  was  normal  for  the  entire  crop  season  for  seg- 
ment J (fig.  5).  In  1976,  the  GIN  indicated  that  by 
May  24  there  was  moisture  stress  in  segment  J.  This 
indicates  that  the  GIN  detected  vegetation  moisture 
stress  at  the  same  time  as  the  CMI.  Segment  L (fig. 
6)  experienced  drought  as  indicated  by  both  the  GIN 
and  the  CMI  during  the  1975  crop  year.  In  1976,  the 
GIN  indicated  moisture  stress  on  May  26,  which  was 
confirmed  by  the  CMI. 


U.S.  Great  Plains,  1977 

The  data  used  in  this  study  consisted  of  36  LACIE 
segments  located  throughout  the  U.S.  Great  Plains 
wheat  growing  region  (fig.  2).  These  segments  con- 
tained at  least  5 percent  agricultural  cropland.  The 
final  data  set  consisted  of  70  segment  years  for  the  36 
LACIE  segments  (fig.  2). 

The final data set of 70 segment years shows good agreement between
the remote-sensing (GIN) classification and the CMI classification
(table II). The contingency table (table IV), which compares the two
classification methods, shows that the classifications based on the
CMI and GIN are the same 85 percent of the time. It was concluded
that the GIN is detecting moisture through crop responses and that
this procedure, which was developed over a small region, is
extendable to larger areas.

FIGURE 5.— Graphic plot of GIN versus time with CMI values.

FIGURE 6.— Graphic plot of GIN versus time with CMI values.

TABLE IV.— Contingency Table of GIN and CMI Classification Methods Throughout the USGP^a

                       CMI
GIN          Normal   Stressed   Total

Normal         43         5        48
Stressed        6        16        22

Total          49        21        70
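The agreement figure can be checked from the four cell counts of table IV. The short Python sketch below is an illustration only; the dictionary layout and the function name `agreement` are assumptions, and the assignment of rows to GIN and columns to CMI does not affect the result. The counts give 59 of 70 segment years in agreement, about 84 percent, close to the 85 percent quoted in the text, and the 11 off-diagonal cases match the 11 disagreements discussed in the text.

```python
# Cell counts from table IV, keyed by (GIN class, CMI class).
table_iv = {
    ("normal", "normal"): 43,
    ("normal", "stressed"): 5,
    ("stressed", "normal"): 6,
    ("stressed", "stressed"): 16,
}


def agreement(table):
    """Return (agreeing count, total count, percent agreement)."""
    total = sum(table.values())
    agree = sum(n for (gin, cmi), n in table.items() if gin == cmi)
    return agree, total, 100.0 * agree / total


agree, total, percent = agreement(table_iv)
print(agree, total, round(percent, 1))
```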

An inspection of the 11 disagreements on the classification results
(table II) disclosed that the soil types at segment locations related
to 5 of the disagreements have different water-holding capacities
than those used in the CMI model. Also, rainfall patterns produced
amounts at the segment location which differed from the amounts
recorded at the weather stations used in computing the CMI. This is
reflected also in the other six disagreements, which occurred in
segments that were located on the edge of the CRD; thus, the CMI does
not necessarily represent the conditions that existed at the segment
location.


CONCLUSIONS 

A technique  was  developed,  using  Landsat  digital 
data  from  5-  by  6-nautical-mile  sample  segments, 
which  indicates  when  agricultural  vegetation  is  un- 
dergoing moisture  stress.  A relation  between  this 
technique,  which  utilizes  remote  sensing,  and  a 
ground-based  criterion  (the  CMI)  has  been  shown. 
The  remote-sensing  procedure  was  shown  to  be  ex- 
pandable to  a larger  geographic  area  and  repeatable 
for  different  areas  and  years.  Thus,  in  areas  of  the 
world  where  ground  truth  is  not  available  or  reliable, 
it  is  possible  to  detect  and  determine  the  areal  extent 
of  moisture  stress  using  Landsat  data  in  an  automatic 
mode.  The  GIN  is  now  automatically  calculated  for 
all  LACIE  segments  as  they  are  loaded  into  the  data 
base.  The  procedure  has  been  implemented  on  the 
U.S. Department of Agriculture (USDA) system.

While this procedure was developed for detecting and monitoring
moisture stress, variations of the idea were implemented in on-line
CAMS processing. These included the green number and brightness for
each of the 209 grid intersections, green number versus brightness
scatter plots, and trajectory plots of green number and brightness.
Thus, a procedure which was developed for a particular application
has had a key role in helping to solve other LACIE problems, such as
aiding the analyst in verifying the consistency of dot labels
(Procedure 1) and separation of wheat from other small grains.
LACIE Area, Yield, and Production Estimate
Characteristics: U.S. Great Plains

Duane L. Marquis^a


OVERVIEW 

The  objective  of  the  LACIE  is  to  estimate  produc- 
tion of  wheat  on  a country-by-country  basis.  LACIE 
was  designed  to  meet  U.S.  Department  of 
Agriculture  (USDA)  needs  in  areas  where  ground- 
truth  information  is  not  readily  available.  However, 
in  order  to  test  the  design  (to  determine  the  accuracy 
and  reliability),  an  area  where  comparison  informa- 
tion was  available  was  chosen.  This  area  was  the  nine 
states  of  the  U.S.  Great  Plains  (Colorado,  Kansas, 
Minnesota,  Montana,  Nebraska,  North  Dakota, 
Oklahoma,  South  Dakota,  and  Texas).  LACIE  was 
not designed to improve the accuracy of the U.S.
crop  reports. 

In 1974, the U.S. Great Plains (USGP) accounted for over 64 percent
of the U.S. winter wheat area (56 percent of production), over 93
percent of the U.S. spring wheat area (89 percent of production), and
73 percent of all wheat area in the U.S. (64 percent of all wheat
production). By 1977, these percentages had all increased. These nine
states represented the wide range of soil types, climatic conditions,
topography, cultural practices (such as crop rotation, strip
cropping, irrigation, summer fallowing), and crop varieties which was
needed to test the adequacy of the design (accuracy of the
technology). Of the nine states that comprise the Great Plains
region, five states (Colorado, Kansas, Nebraska, Oklahoma, and Texas)
are almost entirely producers of winter wheat; two states (Minnesota
and North Dakota) are producers of spring wheat; and two states
(Montana and South Dakota) produce significant amounts of both winter
and spring wheat. As the LACIE technology changed, comparison
information had to be readily available to assess the capability of
the new technology.


^a NASA Johnson Space Center, Houston, Texas.


To  estimate  wheat  production  on  a country  basis, 
the  country  is  subdivided  into  areas  (strata)  where 
yield  and  the  prevalence  of  wheat  planted  are 
relatively  uniform.  Yield  and  the  areal  extent  of 
wheat  within  each  stratum  are  estimated  by  indepen- 
dent methods  and  then  multiplied  together  to  obtain 
production  at  the  stratum  level.  The  production  esti- 
mates in  each  stratum  are  then  added  to  obtain  the 
production  estimates  at  other  geographical  or  politi- 
cal levels.  In  addition,  area  and  yield  are  aggregated 
to  determine  wheat  area  and  yield  at  other  hierarchi- 
cal levels  within  the  country. 
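The stratum-level arithmetic described above (production as area times yield, summed over strata) can be illustrated with a short Python sketch. The function name `aggregate`, the dictionary keys, and the units are hypothetical; deriving the aggregated yield as total production divided by total area is an assumption about how "aggregated" yield is formed, not a statement of the LACIE procedure.

```python
def aggregate(strata):
    """Combine stratum estimates into a higher-level estimate.

    strata: list of dicts with 'area' (for example hectares) and 'yield'
    (for example tonnes per hectare), each estimated independently.
    Returns (total area, total production, area-weighted yield).
    """
    # Production in each stratum is area multiplied by yield.
    production = sum(s["area"] * s["yield"] for s in strata)
    area = sum(s["area"] for s in strata)
    # Aggregated yield taken as production per unit area (assumption).
    mean_yield = production / area if area else 0.0
    return area, production, mean_yield


strata = [
    {"area": 100.0, "yield": 2.0},
    {"area": 300.0, "yield": 1.0},
]
print(aggregate(strata))
```

Because the yield at the higher level is weighted by area, a large low-yield stratum pulls the aggregate down more than a simple average of stratum yields would suggest.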

The  LACIE  was  designed  as  a three-phase  opera- 
tion to  cover  three  global  crop  seasons.  The  U.S. 
Great  Plains  has  played  a significant  role  in  all  three 
LACIE  phases.  The  next  three  sections  will  describe 
each  phase  in  more  detail  as  it  pertains  to  the  U.S. 
Great  Plains.  Items  discussed  include  scope,  sam- 
pling strategy,  Landsat  and  yield  data,  estimates  re- 
ported, accuracy  of  the  estimates,  and  technical 
issues  raised  in  each  phase. 


PHASE I

Scope 

Phase  I of  LACIE  had  as  its  major  objective  the 
development  and  testing  of  a system  that  used  Land- 
sat  data  as  the  primary  input  for  estimating  wheat 
area  in  selected  regions  of  the  United  States.  Phase  I 
also  was  devoted  to  yield  model  development  and  to 
determining  the  feasibility  of  estimating  production 
in  the  United  States.  During  the  Phase  I developmen- 
tal period,  techniques  and  operational  procedures 
were  constantly  being  evaluated.  Based  on  such 
evaluations,  several  procedural  changes  were  imple- 
mented. 
Sampling

Sample segments were allocated to the county level based on the total
area used for wheat production as reported in the 1969 Census of
Agriculture. On the basis of these 1969 data, counties within the
Great Plains were classified into three categories:

1. Group I counties: those counties that produced sufficient wheat in
1969 to justify the allocation of one or more sample segments.

2. Group II counties: those counties for which the historical wheat
area in an individual county did not justify the allocation of a
sample segment; however, when counties in the same Crop Reporting
District (CRD) were combined, the allocation of one or more sample
segments was justified. Sample segments in Group II counties were
allocated on their probability proportional to size and were referred
to as PPS segments. In PPS sampling, the probability that a
particular sample unit will be selected is proportional to some
measure of size associated with it.

3. Group III counties: those counties in which historical wheat area
did not justify allocation of any sample segments.

The above sampling procedure allocated 411 sample segments, 5 by 6
nautical miles in size, to the nine Great Plains states: 359 Group I
segments and 52 Group II segments (table I). (For a more detailed
discussion of the sampling methodology employed, see LACIE C00200,
Vol. IV, Rev. C, Oct. 1977.)
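The idea of probability-proportional-to-size selection used for the Group II counties can be illustrated with Python's standard library. This is a minimal sketch and not the LACIE allocation procedure: the function name `pps_sample`, the county labels, and the wheat areas are invented for the example, and the draw is made with replacement, which is only one of several ways PPS sampling can be carried out.

```python
import random


def pps_sample(units, sizes, k, seed=None):
    """Draw k units with probability proportional to their size measure.

    Sampling is with replacement, so a large unit can be chosen more
    than once. A seed can be supplied to make the draw repeatable.
    """
    rng = random.Random(seed)
    return rng.choices(units, weights=sizes, k=k)


# Hypothetical counties in one CRD with their historical wheat areas.
counties = ["County A", "County B", "County C"]
wheat_area = [0, 5000, 15000]

# County C should be drawn roughly three times as often as County B,
# and County A (no wheat) should never be drawn.
print(pps_sample(counties, wheat_area, k=5, seed=1))
```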


TABLE I.— Number of Sample Segments Allocated by State, by Crop Type, and by Group;
U.S. Great Plains, Phase I

                          Number of segments allocated

                     Winter wheat                 Spring wheat
State           Group I  Group II  Total     Group I  Group II  Total

Colorado           27        5       32
Kansas             73       11       84
Minnesota                                        8        5       13
Montana            14        2       16         43        1       44
Nebraska           28        7       35
North Dakota                                    65        0       65
Oklahoma           33        7       40
South Dakota        9        1       10         21        2       23
Texas              38       11       49

Total             222       44      266        137        8      145

Landsat Data

The  collection  of  Landsat  data  for  each  segment 
was  scheduled  to  coincide  with  the  major  wheat 
growth  stages.  Due  primarily  to  cloud  cover,  the 
number  of  usable  images  was  substantially  reduced. 
On  the  average,  2.3  acquisitions  per  segment  were  ob- 
tained. 

At the beginning of Phase I, acquisitions were
assigned to average or nominal biowindows based on
historical wheat growth stage (crop calendar) data.
Biowindow 1 included Robertson crop growth stages
1 and 2 (planting to jointing); biowindow 2 included
stage 3 (jointing to heading); biowindow 3 included
stage 4 (heading to soft dough); and biowindow 4
included stages 5, 6, and 7 (soft dough through
harvest). When the crop calendar models based on
actual development became operative in May 1975,
biowindows were updated at the CRD level; a second
updating of the biowindows was performed at the
end of the season based on actual crop development.
Actual biowindow definitions remained constant for
biowindows 2 through 4, but biowindow 1 was
adjusted to eliminate growth stages prior to Robertson
stage 2.3. Growth stage 2.3 corresponds to the
minimum plant cover for detection.
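
The stage-to-biowindow assignment described above can be written as a small lookup. The sketch below assumes that the Robertson stage is a continuous value and that each biowindow covers a half-open interval starting at the integer stage; the function and constant names are illustrative, not LACIE identifiers.

```python
MIN_DETECTABLE_STAGE = 2.3  # minimum plant cover for detection, per the text


def nominal_biowindow(stage):
    """Map a Robertson growth stage to the nominal biowindow (1 to 4)."""
    if stage < 1.0:
        raise ValueError("Robertson stage must be at least 1")
    if stage < 3.0:
        return 1  # stages 1 and 2: planting to jointing
    if stage < 4.0:
        return 2  # stage 3: jointing to heading
    if stage < 5.0:
        return 3  # stage 4: heading to soft dough
    return 4      # stages 5 to 7: soft dough through harvest


def updated_biowindow(stage):
    """Updated assignment: stages before 2.3 fall outside any biowindow."""
    if stage < MIN_DETECTABLE_STAGE:
        return None
    return nominal_biowindow(stage)
```

For example, an acquisition at stage 2.0 belongs to nominal biowindow 1 but is excluded under the updated definition, while one at stage 2.5 stays in biowindow 1 under both.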

In each segment for which Landsat data were
acquired, the Classification and Mensuration Subsystem
(CAMS) estimated the proportion of area
within the segment that was devoted to wheat or
small grains production. These estimates were
internally evaluated by CAMS for classification accuracy
and transmitted to the Crop Assessment Subsystem
(CAS) where they were used to estimate wheat
acreage at the stratum (CRD), the zone (state), and
the region (Great Plains) levels.

After all Landsat data for Phase I had been
processed, procedures used in CAMS were reviewed and
ultimately revised based on the experience obtained
during Phase I operations. A decision was made to
use the "new CAMS procedures" to take a retrospective
look at the Landsat data acquired in Phase I.
These data were reprocessed and analyzed using a
multitemporal approach to provide the "best possible
estimate" of wheat proportion in all segments for
which data were available. The proportion estimates
were set up as the CAMS rework data base, and
aggregations were performed. The results of these
aggregations are referred to as the "at-harvest"
estimates. These estimates cannot be compared to the
real-time estimates reported from April through
August because in the earlier estimates (1) bare
ground was included as potential wheat and (2)
Landsat data in biowindow 1 were acquired before
wheat greened up enough to allow positive
identification. Table II shows the number of segments
allocated and the number of segments used by crop type
and by state for the aggregation of the reworked data.


Table III shows the results of the aggregation
based on the CAMS rework data with PPS segments
used in the aggregation plus the corresponding
acreage estimates reported by the Crop Reporting
Board of the Economics, Statistics, and Cooperatives
Service (ESCS) (formerly the Statistical Reporting
Service (SRS)) of the U.S. Department of
Agriculture. The five-state winter wheat area was
within 1 percent of the USDA final estimate for
1975. (The five states are Colorado, Kansas,
Nebraska, Oklahoma, and Texas. The seven winter
wheat states include Montana and South Dakota
with the five winter wheat states. The four spring
wheat states are Minnesota, Montana, North Dakota,
and South Dakota.) The Oklahoma estimate was


Table II.-- Number of Sample Segments Allocated
and Number Used for the CAMS Rework
Aggregation, by State and by Type of Wheat

                  Segments allocated    CAMS rework segments
State             Group I   Group II    Group I   Group II

Winter wheat
  Colorado           27         5          21         3
  Kansas             73        11          50         5
  Nebraska           28         7          19         4
  Oklahoma           33         7          23         6
  Texas              38        11          23         5
  Subtotal          199        41         136        23

Spring wheat
  Minnesota           8         5           5         4
  North Dakota       65         0          42         0
  Subtotal           73         5          47         4

Mixed wheat
  Montana            57         3          36         3
  South Dakota       30         3          21         2
  Subtotal           87         6          57         5

Total wheat
  Total segments    359        52         240        32

Table III.-- LACIE At-Harvest Estimates
With PPS Segments Included in the Aggregation
Compared to 1975 Final Estimate
of Harvested Acres by USDA

                  LACIE                   1975 final USDA   Relative
                  estimate,     CV,(a)    estimate,(b)      difference,(c)
State             thousands     percent   thousands         percent
                  of acres                of acres

Winter wheat
  Colorado           3 058       20.8        2 470            +19.2
  Kansas            12 940        7.1       12 100             +6.5
  Nebraska           2 657       28.0        3 070            -15.5
  Oklahoma           6 906       11.7        6 700             +3.0
  Texas              4 218       32.6        5 700            -35.1
  Subtotal          29 779        7.0       30 040             -0.9

Spring wheat
  Minnesota       (d)2 150       15.7        2 787            -29.6
  North Dakota    (d)5 853       14.8       10 090            -72.4
  Subtotal        (d)8 003       11.6       12 877            -60.9

Mixed wheat
  Montana         (d)3 999       25.9        4 975            -24.4
  South Dakota    (d)4 154       17.7        2 965            +28.6
  Subtotal        (d)8 153       15.6        7 940             +2.6

Total wheat
  Total estimates   45 935        5.7       50 857            -10.7

(a) Coefficient of variation = (Sample standard error / LACIE estimate) x 100.
(b) Crop Production, 1975 Annual Summary. Crop Reporting Board, SRS, USDA,
CrPr 2-1(76), Jan. 16, 1976.
(c) Relative difference = ((LACIE - SRS) / LACIE) x 100.
(d) The CAMS rework resulted in winter wheat and spring small-grains proportion
estimates. These estimates were adjusted (ratioed) at the segment level for
[illegible], barley, and flax harvested in 1975.
closest to the USDA state estimate at 3 percent
above. Then followed Kansas at 6.5 percent above,
Nebraska at 15.5 percent below, Colorado at 19.2
percent above, and Texas at 35.1 percent below.

For the mixed wheat states, LACIE's Montana
estimate of total wheat was 24.4 percent below the
USDA estimate, and the South Dakota estimate was
28.6 percent above. In the spring wheat states, the
Minnesota estimate was 29.6 percent below, and the
North Dakota estimate was 72.4 percent below the
USDA estimate. Investigations into the causes of
these errors are discussed in more detail in the next
section.
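
The percentages quoted above follow from the relative-difference definition given with table III, in which the difference is expressed relative to the LACIE estimate rather than the USDA estimate. A minimal sketch, using acreage values (thousands of acres) taken from table III; the function name is illustrative only.

```python
def relative_difference(lacie, usda):
    """Relative difference in percent, taken relative to the LACIE estimate."""
    return (lacie - usda) / lacie * 100.0


if __name__ == "__main__":
    # (LACIE estimate, USDA final estimate), thousands of acres, table III.
    rows = {
        "Colorado": (3058, 2470),
        "Kansas": (12940, 12100),
        "Texas": (4218, 5700),
        "Total wheat": (45935, 50857),
    }
    for name, (lacie, usda) in rows.items():
        print(f"{name}: {relative_difference(lacie, usda):+.1f} percent")
```

Because the denominator is the LACIE estimate, a large underestimate produces a relative difference of larger magnitude than the same gap expressed relative to the USDA figure.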


Accuracy of the Estimates

In Phase I, the LACIE accuracy goal was applied
at the national level. The LACIE accuracy goal,
referred to as the 90/90 criterion, specified that the
at-harvest wheat production estimator be within 10
percent of the true value 90 percent of the time. In
order to assess whether the acreage estimates would
support this criterion, it was assumed that area and
yield estimators were unbiased and independent and
that the coefficient of variation of the yield predictor
was equal to that of the acreage estimator. Under
these assumptions, the 90/90 criterion would be
satisfied if the coefficient of variation of an acreage
estimator, at the national level, was less than 4.3
percent.
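
The 4.3-percent threshold can be reproduced with a short calculation. The sketch below rests on assumptions that the text does not spell out: the production estimator is treated as approximately normal, and the coefficient of variation (CV) of a product of two independent unbiased estimators is approximated by the root sum of squares of their CVs (the small cross term is neglected). Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def max_production_cv(tolerance=0.10, confidence=0.90):
    """Largest CV of an unbiased, roughly normal estimator that stays
    within +/- tolerance of the true value with the given probability."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # two-sided quantile
    return tolerance / z


def max_area_cv(tolerance=0.10, confidence=0.90):
    """Largest area CV when area and yield CVs are equal and independent,
    so that CV(production) is about sqrt(2) times CV(area)."""
    return max_production_cv(tolerance, confidence) / sqrt(2.0)


if __name__ == "__main__":
    print(f"production CV limit: {max_production_cv() * 100:.1f} percent")
    print(f"area CV limit:       {max_area_cv() * 100:.1f} percent")
```

Under these assumptions the production CV limit is about 6.1 percent, and dividing by the square root of 2 gives the 4.3 percent quoted for the acreage estimator.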

The coefficient of variation of the acreage estimator
at the U.S. Great Plains level was estimated to be
5.7 percent and was projected to be 3.7 percent at the
national level. However, a significant difference of
-10.7 percent was observed between the LACIE
estimate for the U.S. Great Plains and the
corresponding USDA estimate, indicating a negative bias.
Since the projected coefficient of variation of 3.7
percent was less than the required 4.3 percent, some bias
was tolerable, and it was inferred that the LACIE
acreage estimator marginally supported the 90/90
accuracy goal. However, problems with sampling and
classification encountered during Phase I indicated
that improvements could and should be made before
concluding that the LACIE acreage estimator met
the 90/90 criterion.

A significant contributor to the wheat acreage
underestimate at the Great Plains level was the
underage (-72.4 percent) observed for North Dakota
spring wheat (see table III). Analyses based on
comparisons of segment wheat proportion estimates with
corresponding ground-observed proportions for 20
sample segments and with historical wheat
proportions for the corresponding counties indicated that
sampling was the major problem rather than
classification. Additional segments were added in
North Dakota to alleviate this problem in Phase II.

The comparisons of segment wheat proportion
estimates with ground-observed wheat proportion
estimates also indicated a difficulty in differentiating
wheat from other related small grains. This required
that wheat area estimates be obtained by ratioing
small grains area estimates in accordance with the
historic prevalence of these crops. Also, wheat
identification was found to be more difficult in regions of
marginal wheat production, small fields, or large
amounts of confusion crops.


Technical Issues

Several technical problems arose during
Phase I. Those described in this section are the
major ones affecting the results obtained in Phase I.

Wheat/small grains separation.-- A major source of
error found in Phase I was that spring wheat could
not be reliably distinguished from other spring small
grains, although spring small grains could be
distinguished from other crops. For winter wheat, the
major source of error appeared to be classification
error in marginal areas, where confusion crops such as
alfalfa were in abundance. The fact that wheat could
not be consistently and accurately separated from
other small grains leads to a second issue.

Wheat/small grains ratioing.-- Because total small grains
proportion estimates were the products generated by
CAMS for the spring and mixed wheat areas,
CAS was forced to develop wheat estimates from
these small grains estimates. The procedure used was
to "ratio" the spring small grains estimate to obtain a
spring wheat estimate. This ratio was based on some
historical proportion of spring wheat to spring small
grains or to total small grains in the state. This
ratioing technique introduced errors into the system since
the current-year ratio of wheat to small grains is not
likely to be the same as the ratio in some year past.
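
The ratioing step, and the error it introduces when the current-year crop mix departs from the historical one, can be illustrated as follows. All numbers in the example are hypothetical and the function names are not LACIE identifiers; the sketch only shows the arithmetic implied by the description above.

```python
def ratio_to_wheat(small_grains_area, historical_wheat_share):
    """Estimate wheat area as a historical share of the small grains area."""
    if not 0.0 <= historical_wheat_share <= 1.0:
        raise ValueError("share must lie between 0 and 1")
    return small_grains_area * historical_wheat_share


def ratioing_error_percent(historical_share, current_share):
    """Percent error in the wheat estimate caused only by a stale ratio,
    assuming the small grains area itself is estimated without error."""
    return (historical_share - current_share) / current_share * 100.0


if __name__ == "__main__":
    # Hypothetical state: 1000 thousand acres of spring small grains,
    # historical wheat share 0.60, actual current-year share 0.70.
    estimate = ratio_to_wheat(1000.0, 0.60)
    print(f"wheat estimate: {estimate:.0f} thousand acres")
    print(f"error from stale ratio: {ratioing_error_percent(0.60, 0.70):.1f} percent")
```

In this hypothetical case a shift toward wheat of ten share points alone produces an underestimate of about 14 percent, even with a perfect small grains classification.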

Group II sampling.-- A review of the LACIE
procedures in early August 1975 pinpointed the fact that
a relatively large part of the positive bias
(overestimation) observed in the acreage estimates made
to that time was directly attributable to classification
of wheat in the Group II (PPS) sample segments.
These overestimates indicated a potential source of
error in the PPS approach. While such a sampling
strategy is conceptually sound from a statistical point
of view, questions were asked about the practicality
(applicability) of the approach since CAMS tended
to overestimate wheat area in these low-density
wheat areas and the PPS approach requires an
unbiased estimate.

The major inference drawn from experiences with
the PPS segments in Phase I was that the PPS
segments appeared to have characteristics (such as low
percentage wheat) that made accurate classification
more difficult than classification in Group I
segments. It was also found that the aggregation logic
was particularly sensitive to errors in acreage
estimates for the PPS segments.

Early-season estimation.-- Because an estimate of
wheat production early in the crop year was
considered especially valuable, it was a project concern
to produce estimates as early as possible. During
Phase I, an attempt was made to arrive at an area
estimate using fall data, which showed little wheat
emerged. The approach was to classify areas of
seedbed preparation or bare soil as "potential
wheat." However, fall plowing and seedbed
preparation were conducted in many areas for purposes
other than planting wheat, and thus the LACIE
estimate was initially considerably higher than the
USDA estimate. Other major causes of the high
estimates, in addition to this "potential wheat" problem,
seemed to be (1) cases in which wheat could not be
separated from small grains and other crops and (2)
cases in which classifications would be made with
the estimate results in three overlapping classes (i.e.,
winter wheat, spring wheat, and total wheat).

Sampling.-- Certain problems were found in
sampling. One was placing samples in nonagricultural
areas because of a lack of full-frame Landsat data to
support the proper delineation of such areas.
Another problem concerned the assumption that
counties were relatively homogeneous. Actual
experience did not support this assumption. Landsat
data provided the basis for the delineation of areas to
be sampled.


PHASE  II 
Scope 

The Phase II scope of LACIE in the U.S. Great
Plains "yardstick" region was to test the total system;
that is, to estimate area, yield, and production
components, with emphasis on early-season estimates.
The plan called for the generation of area, yield, and
production estimates for the seven-state area from
February through October, four-state estimates from
July through October, and nine-state estimates from
July through October. However, insufficient Landsat
data in Montana and South Dakota caused the
February through May estimates for winter wheat to be
made for the five states of the Southern Great Plains
only. Spring wheat was not detectable until the
mid-June Landsat pass. With the 30-day period from
acquisition to receipt by CAS, the first spring wheat
and total wheat estimates were delayed until August.


Sampling 

The sampling strategy used was the same as that
used in Phase I with the exception of the
redelineation of agricultural and nonagricultural areas in
North Dakota, which resulted in moving some
segments. Also, as mentioned earlier, 20 sample
segments were added in North Dakota. Thus, Phase II
included 431 sample segments in the Great Plains
region.


Landsat Data

In  Phase  II,  all  Landsat  acquisitions  were  utilized. 
Table  IV  shows  the  number  of  segments  allocated  by 
state  and  by  crop  type  plus  the  number  of  sample 
segments  used  in  each  of  the  CAS  reports. 

At the five-state level, of those segments
allocated, the number of segments used ranged from
over $4 percent for the February report to 97 percent
for the end-of-season report. For the seven-state
area, the number of segments used ranged from over
81 percent in June to 96.5 percent by the end of
season. At the four-state level, usable acquisitions
were obtained from 48.3 percent of those segments
allocated in August to over 90 percent by the end of
season, and for the nine states combined, the usable
acquisitions ranged from 76.1 percent in August to
94.4 percent by the end of season. CAS Annual
Report 03, December 15, 1976, contains tables showing
the average percent wheat per segment used in each
report, the average number of elapsed days from
Landsat acquisition until receipt in CAS, and the
distribution of segments used in each report by month
of acquisition.
Figure 1 shows the number of classifications
passed as wheat or small grains by monthly report
for the nine states, the five states, and the seven
states. All the classifications for the four spring
wheat states were small grains. Most of the
classifications were made for small grains, even for the five
states.

Yield Data

Yield estimates were received in CAS on the
fourth working day of the month. The yield models
are documented in LACIE-00431 (June 1975). The
models were developed at the state level and were
run at the CRD level. Data bases for the CAS
software used the individual CRD-level yield estimates.


The yield models were developed from a yield and
climatic data base of approximately 45 years. The
yield data were USDA yield per harvested acre, and
the climatic data were National Oceanic and
Atmospheric Administration (NOAA) monthly
climatic division averages of precipitation and
temperature. A piecewise linear curve was used to model
the technology trend. A more detailed explanation of
the models is contained in the document referred to
above.
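
To make the phrase "piecewise linear curve" concrete, the following sketch evaluates a trend defined by straight segments between breakpoints. It is not the LACIE yield model: the breakpoint years and yield levels are hypothetical, holding the trend flat outside the breakpoints is an assumption made here, and the weather terms of the actual models are omitted.

```python
def piecewise_linear_trend(year, knots):
    """Evaluate a piecewise linear trend.

    knots is a list of (year, trend_yield) pairs sorted by year. The trend
    is interpolated linearly between knots and held flat outside them.
    """
    if len(knots) < 2:
        raise ValueError("at least two knots are required")
    if year <= knots[0][0]:
        return knots[0][1]
    if year >= knots[-1][0]:
        return knots[-1][1]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x0 <= year <= x1:
            return y0 + (y1 - y0) * (year - x0) / (x1 - x0)
    raise ValueError("knots must be sorted by year")


if __name__ == "__main__":
    # Hypothetical breakpoints: flat early period, then a rising trend.
    hypothetical_knots = [(1931, 12.0), (1955, 12.0), (1975, 28.0)]
    for y in (1940, 1965, 1975):
        print(y, piecewise_linear_trend(y, hypothetical_knots))
```

A trend of this form lets the slope change at a few chosen years, which is one simple way to represent periods of faster or slower technological improvement.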

Estimates of Area, Yield, and Production

The  discussion  of  the  Phase  II  results  refers  to  the 
estimates  as  revised  in  CAS  Annual  Report  03, 
December  15, 1976.  The  revised  estimates  resulted 


Table IV.-- Number of Sample Segments Allocated and Number of Segments Used in Each Report,
by State and by Crop Type; U.S. Great Plains, Phase II

FIGURE 1.-- Distribution of Landsat acquisitions by the number
classified as wheat and the number classified as small grains for
each CAS Monthly Report, Phase II. (a) U.S. Great Plains
wheat, nine states; (b) U.S. Great Plains winter wheat, five
states; (c) U.S. Great Plains winter wheat, seven states.


from the use of the CAS technology available at the
end of Phase II to provide a consistent set of
estimates. Known data errors were also corrected.
Appendix A contains the Phase II estimates of area,
yield, and production by state, by crop type, and by
report. Tables A-I, A-II, and A-III contain the revised
area, production, and yield estimates. Tables A-IV,
A-V, and A-VI contain the corresponding
coefficients of variation for the revised estimates. Tables
A-VII, A-VIII, and A-IX contain the real-time
LACIE estimates. However, the statistics reported in
real time during Phase II were not correct and will
not be presented here.

U.S. Southern Great Plains winter wheat.-- Figure 2
shows the revised monthly LACIE estimates of area,
production, and yield for the five states compared to
the corresponding monthly USDA estimates. The
area estimate in February was 22.7 million acres. The
area estimates for Nebraska and Colorado were quite
high when compared to historical data, and the
estimates for Kansas, Oklahoma, and Texas were low by
the same comparison. The average estimated percent
wheat per segment was 13.7 for the five states. With
additional segments acquired, the April estimate was
21.8 million acres. This decrease occurred because
the Colorado and Nebraska estimates dropped to
near their historical averages. The Kansas and Texas
estimates increased somewhat. The average percent
wheat per segment for the five states was 14.5 in
April. The area estimate reached its peak at 26.?
million acres in late June. The average percent wheat
per segment for Colorado, Nebraska, and Texas
declined after this report, while Kansas and
Oklahoma percent wheat averages increased. The
end-of-season area estimate was 25.8 million acres
compared to the final USDA estimate of 27.65
million acres (a relative difference of about -7 percent).

The production estimate for the five states was
626.1 million bushels in February (table A-II). The
estimate declined to 564.1 million bushels because of
both the area estimate changes discussed above and
the reductions in yield model estimates. By the late
June report, the estimate was 706.2 million bushels.
This increase resulted from large area increases in
Kansas, Oklahoma, and Texas, plus yield increases
in all states except Nebraska. The final LACIE
estimate was 686.2 million bushels compared to the
USDA final production estimate of 739.6 million
bushels (a -7.8-percent relative difference).

The derived yield estimate for the five states was
27.6 bushels per acre in February. (Yield is derived
by dividing the production estimate by the area
estimate. This yield is a weighted average of all the yield
model estimates.) Yield model estimates in Colorado
and Kansas declined by nearly 2 bushels per acre,
and the Nebraska estimate declined by 3 bushels per
acre by May (table A-III). This resulted in a
five-state yield estimate of 25.3 bushels per acre. By the
end of the season, the Kansas, Nebraska, Oklahoma,
and Texas yields increased, resulting in a five-state
yield estimate of 26.6 bushels per acre. The final
USDA estimate was 26.7 bushels per acre (a
-0.4-percent relative difference).

FIGURE 2.-- Monthly comparison of LACIE and USDA estimates; Phase II; Southern Great Plains winter wheat states (Colorado,
Kansas, Nebraska, Oklahoma, and Texas).
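
The derived yield described above is simply production divided by area, which makes it an area-weighted average of the lower-level yield estimates. The sketch below checks this against the five-state figures quoted in the text (million bushels and million acres); the two-stratum example uses hypothetical numbers, and the function names are illustrative.

```python
def derived_yield(production, area):
    """Derived yield (bushels per acre) from production and area totals."""
    return production / area


def aggregate(areas, yields):
    """Aggregate strata: production is the sum of area times yield, and the
    derived yield is the area-weighted average of the stratum yields."""
    production = sum(a * y for a, y in zip(areas, yields))
    total_area = sum(areas)
    return total_area, production, production / total_area


if __name__ == "__main__":
    # Five-state figures quoted in the text.
    print(round(derived_yield(626.1, 22.7), 1))  # February
    print(round(derived_yield(686.2, 25.8), 1))  # end of season
    # Hypothetical strata: areas in million acres, yields in bushels per acre.
    print(aggregate([10.0, 30.0], [20.0, 30.0]))
```

One consequence of this definition is that the derived yield can change from report to report even when no yield model estimate changes, because a shift in the area estimates alters the weights.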

U.S. Great Plains winter wheat.-- The first
estimates for the seven states were made in June. Figure
3 shows the revised LACIE area, production, and
yield estimates compared to the USDA estimates.
The June area estimate was 28 million acres. The
estimates for Nebraska and South Dakota were
overestimates, while the estimates for Oklahoma, Texas,
and Montana were underestimates (table A-I). The
estimate increased to 29.9 million acres in August
due primarily to the area increases in Montana and
South Dakota. The final LACIE estimate was 29.4
million acres compared to the final USDA estimate
of 31.7 million acres (a -7.8-percent relative
difference).

The initial seven-state production estimate was
740.7 million bushels (table A-II). The estimate
increased to 798.3 million bushels by August due
primarily to area and yield estimate increases in
Montana and South Dakota (tables A-I and A-III). The
final estimate declined to 794.2 million bushels, due
primarily to a decline in the Nebraska area estimate.
The final USDA estimate was 855.6 million bushels
(a -7.7-percent relative difference).
FIGURE 3.-- Monthly comparison of LACIE and USDA estimates; Phase II; winter wheat, seven states (Colorado, Kansas, Nebraska,
Oklahoma, Texas, Montana, and South Dakota).

The June derived yield estimate was 26.5 bushels
per acre. The estimate increased to 27.0 bushels per
acre by September due mainly to yield model
estimate increases in Nebraska, South Dakota, and
Montana. The final estimate was 27.0 bushels per acre,
the same as the final USDA yield estimate.

U.S. Great Plains spring wheat.-- The first spring
wheat estimates made by LACIE were in August.
Figure 4 shows the revised LACIE area, production,
and yield estimates compared to the USDA
estimates. The initial area estimate was 13.2 million
acres. The estimates for Minnesota, Montana, and
North Dakota were under their respective historical
estimates (table A-I). By September, the area
estimate increased to 15.6 million acres due largely to
increases in Minnesota and North Dakota. The average
percent wheat per segment increased to 18.5 from
14.4 in August. The area estimate decreased in
October due to a drop in the area estimate for Minnesota.
The end-of-season estimate increased to 15.65
million acres due to area increases in Montana and
North Dakota. The average percent wheat per
segment was 19.5. The final USDA area estimate was
19.8 million acres (a -26.9-percent relative
difference).

The August spring wheat production estimate was
347.4 million bushels (table A-II). By September, the
estimate was 409.4 million bushels. The increase was
due primarily to the area estimate increases since
only the Montana spring wheat yield increased
substantially (tables A-I and A-II). In October, the
production estimate was 406.2 million bushels; the
decrease was due to the Minnesota area estimate
decline. The final production estimate by LACIE
was 409.9 million bushels, the increase due entirely
to area estimate increases. The final USDA estimate
was 501.1 million bushels (a -22.2-percent relative
difference).

FIGURE 4.-- Monthly comparison of LACIE and USDA estimates; Phase II; spring wheat, four states (Minnesota, Montana, North
Dakota, and South Dakota).

The derived yield for the four spring wheat states
was 26.3 bushels per acre in August and September
and 26.2 bushels per acre in October and for the final
LACIE estimate (table A-III). The final USDA
estimate was 25.3 bushels per acre (a relative difference
of 3.4 percent).

U.S. Great Plains total wheat.-- The first LACIE
estimate of total wheat for the nine states in the U.S.
Great Plains came in August (fig. 5). The revised
estimate was 43.1 million acres (table A-I). In
September and October, the estimate was 44.8 million
acres. The increase was due to the spring wheat
estimates. The final LACIE area estimate was 45.0
million acres. The final USDA estimate was 51.5
million acres (a -14.4-percent relative difference).

The initial production estimate was 1145.8 million
bushels (table A-II). In September, the estimate was
1199.9 million bushels; in October, it was 1197.$
million bushels; and the final estimate was 1204.1
million bushels. The final USDA estimate was 1356.7
million bushels, for a relative difference of
-12.7 percent.

The initial derived yield estimate was 26.6 bushels
per acre (table A-III). The final LACIE yield
estimate was 26.7 bushels per acre compared to the final
USDA estimate of 26.4 bushels per acre (a
1.1-percent relative difference).


Accuracy of the Estimates

In Phase II, LACIE estimates were made for area,
yield, and production. Generally the yield estimates
were quite close to USDA estimates and were
considered satisfactory. However, the area and
production estimates at the U.S. Great Plains level were low
compared to the USDA estimates due primarily to
significant underestimates of spring wheat area in
the four U.S. Northern Great Plains states and to a
significant underestimate of winter wheat area in
Oklahoma.

An evaluation of the LACIE total wheat
production estimator for the USGP in terms of the 90/90
criterion indicated that the coefficient of variation of
the estimator, calculated to be 5 percent, was
sufficiently small to tolerate a relative bias of 4 percent.
However, a relative difference between the LACIE
production estimate and the USDA estimate of
-12.3 percent indicated that the relative bias of the
production estimator was likely to be larger than was
tolerable. An estimate of the bias using ground-truth
information from the blind sites also indicated that
the relative bias was larger than the tolerable 4
percent, supporting the difference observed from the
USDA estimate. As a result, it was concluded that
the 90/90 criterion was not met. It was inferred,
however, based on the blind site analysis, that an
accuracy goal of 90/75 was achievable.
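
The trade-off between coefficient of variation and tolerable bias can be explored numerically. The sketch below is an illustration under a normal approximation and is not the calculation used in the evaluation, which the text does not give: relative error is modeled as normal with mean equal to the relative bias and standard deviation equal to the CV, and the largest bias that still keeps the estimate within 10 percent of truth 90 percent of the time is found by bisection. Under these assumptions a 5-percent CV yields a tolerable bias somewhat below the 4 percent quoted above, a gap that may reflect rounding or a different approximation. Function names are illustrative.

```python
from statistics import NormalDist


def coverage(bias, cv, tolerance=0.10):
    """Probability that the relative error lies within +/- tolerance when it
    is normal with mean `bias` and standard deviation `cv`."""
    dist = NormalDist(mu=bias, sigma=cv)
    return dist.cdf(tolerance) - dist.cdf(-tolerance)


def tolerable_bias(cv, tolerance=0.10, confidence=0.90, iterations=60):
    """Largest non-negative relative bias that still meets the criterion,
    or None if even an unbiased estimator with this CV fails it."""
    if coverage(0.0, cv, tolerance) < confidence:
        return None
    low, high = 0.0, tolerance  # coverage falls as the bias grows
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if coverage(mid, cv, tolerance) >= confidence:
            low = mid
        else:
            high = mid
    return low


if __name__ == "__main__":
    print(tolerable_bias(0.05))   # CV of 5 percent
    print(tolerable_bias(0.07))   # CV too large for the 90/90 criterion
```

The same `coverage` function can be used to ask how often an estimator with a given bias and CV would fall within 10 percent of the true value, which is the form in which the relaxed 90/75 goal is stated.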

For winter wheat production in the USGP, the
LACIE estimate was not significantly different from
the USDA estimate. However, significant acreage
underestimation problems were indicated for
Oklahoma, a problem not observed in Phase I.
During Phase II, Oklahoma and other states of the
Southern Great Plains experienced generally dry
conditions through April 1976. These conditions
created poor wheat stands and caused these wheat
signatures to differ significantly from those of normal
wheat. In some cases, sparsely vegetated fields were
not detected as "emerged" acreage in the Landsat or
on the aircraft ground-truth color-infrared imagery.
April rains greatly improved the wheat stands;
however, the drought-altered growth cycle misled
the analysts to believe the late-recovering wheat to be
spring-planted crops.


For spring wheat production in the USGP, a
significant difference was observed between the LACIE
and USDA estimates. The major contributors were
spring wheat acreage underestimates for Minnesota
and Montana. As was indicated in Phase I, spring
wheat could not be reliably differentiated from some
other spring small grains. As a result, historic ratios
of spring wheat acreage to spring small grains acreage
were used to obtain spring wheat acreage estimates.
This introduced additional error into the spring
wheat acreage estimates, particularly in Phase II
when the planting of wheat in preference to other
small grains greatly increased from previous years.
Blind site comparisons of LACIE small grains
proportion estimates with ground-truth proportion
estimates also indicated a tendency towards
underestimation of small grains proportions. Both ratioing
and classification were significant contributors to the
underestimation problem.

The  small  grains  proportion  underestimation  was 
partially  due  to  the  strip-fallow  cropping  practice  in 
the  spring  wheat  region,  particularly  in  Montana. 
Strip-fallow fields, some of which are small compared
to Landsat resolution, are difficult to detect
and  measure  on  the  imagery.  In  Minnesota,  under- 
estimation generally  occurred  in  segments  with  high 
spring  wheat  density.  Analysis  indicated  that 
unusual  wheat  signatures,  partially  due  to  color  dis- 
tortions in  the  Landsat  imagery,  were  the  major 
cause  of  the  underestimation.  Analysis  also  indicated 
that  sampling  was  a problem  in  Minnesota.  The 
reallocation  in  Phase  III  resulted  in  an  increase  from 
13  segments  to  47  segments. 

Regarding  the  performance  of  the  first-generation 
yield  models  developed  during  LACIE  Phase  I and 
implemented  in  Phase  II,  tests  of  them  by  com- 
parison with  10  years  of  historic  data  indicate  ade- 
quate performance  in  estimating  wheat  yields  when 
aggregated  to  the  USGP  level.  At  state  levels, 
however,  investigations  have  indicated  the  need  to 
improve  yield  predictions  for  extreme  weather  con- 
ditions. For  example,  1975-76  was  an  extremely  dry 
year  for  South  Dakota,  and  USDA  estimated  the 
spring  wheat  yield  at  10.9  bushels  per  acre  and  the 
winter  wheat  yield  at  18.0  bushels  per  acre.  The 
LACIE  South  Dakota  yield  models,  on  the  other 
hand,  estimated  17.2  and  31.6  bushels  per  acre  for 
spring  wheat  and  winter  wheat  yields,  respectively. 
Even  if  a zero  value  of  precipitation  had  been  en- 
tered in  the  spring  wheat  model,  the  estimate  would 
have been 13.0 bushels per acre. This indicates the
inadequacy of these yield model forms to reflect the
total dynamic range of the plant's response to its
environment.


Technical Issues

As  a result  of  the  LACIE  experience  through 
Phase  II,  several  technical  issues  remained  which  re- 
quired further  study. 

Differentiation of small grains.— Wheat was not
reliably  differentiated  from  other  small  grains.  In 
Phase  I,  an  analysis  of  North  Dakota  blind  sites 
revealed  that  barley  was  not  being  reliably  dis- 
tinguished from  spring  wheat.  In  addition,  other 
crops  such  as  alfalfa  and  pasture  became  confusion 
crops.  Efforts  were  begun  late  in  Phase  I to  develop 
improved  analysis  procedures— procedures  that 
would  take  advantage  of  any  spectral  separability 
that  exists  between  the  crops.  For  Phase  II,  however, 
the  classification  and  mensuration  procedures  were 
used  to  estimate  total  small  grains,  and  ratios  based 
on  historic  proportions  of  spring  wheat  to  small 
grains  were  used  to  convert  the  Landsat-based  small 
grains  estimates  to  wheat  estimates. 

Historic  ratios  of  wheat  to  small  grains. — In  Phase 
II, the ratios from the latest year for which data were
available  at  the  county  level  were  used  to  estimate 
wheat,  given  the  Landsat-based  estimate  of  small 
grains.  In  most  cases,  the  current-year  prevalence  of 
wheat  was  not  the  same  as  the  given  historic  year.  In 
the  United  States,  the  current-year  wheat  ratio 
averaged  about  10  percent  higher  than  the  historic 
ratio  used.  Thus,  the  use  of  the  historic  ratios  con- 
tributed to  the  underestimation  problem. 
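
The effect of a stale ratio can be illustrated with a small worked example. This is only a sketch: the county acreage and the ratio values are hypothetical, the function name is an illustrative choice, and "about 10 percent higher" is read here as a relative increase in the ratio rather than as percentage points.

```python
def wheat_from_small_grains(small_grains_acres, wheat_ratio):
    """Convert a small grains acreage estimate to wheat acreage using a ratio."""
    return small_grains_acres * wheat_ratio

# Hypothetical county values, chosen only for illustration.
small_grains = 100_000.0                # Landsat-based small grains estimate, acres
historic_ratio = 0.60                   # wheat share of small grains, historic year
current_ratio = historic_ratio * 1.10   # current-year ratio about 10 percent higher

estimated = wheat_from_small_grains(small_grains, historic_ratio)
actual = wheat_from_small_grains(small_grains, current_ratio)
shortfall_pct = 100.0 * (estimated - actual) / actual

print(f"estimated {estimated:.0f} acres, actual {actual:.0f} acres, "
      f"error {shortfall_pct:.1f} percent")
```

Under these assumptions, a perfectly accurate small grains estimate still yields a wheat estimate roughly 9 percent too low, which is the direction of error described above.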

A second  issue  concerning  ratios  is  that  the 
historic  ratio  that  was  used  to  derive  the  wheat  esti- 
mate from  the  CAMS  small  grains  estimate  needed 
to include as “small grains” all crops that CAMS included
in their Landsat-based estimate of small
grains.  It  was  not  readily  apparent  from  the  Landsat 
imagery  what  crops  were  included.  Also,  for  the 
ratios  to  be  more  effective,  it  was  necessary  for  the 
CAMS  analyst  to  conscientiously  identify  all  “small 
grains”  on  the  imagery. 

Improved  yield  models. — While  the  yield  models 
had  performed  well  in  several  regions,  they  tended  to 
underestimate  or  overestimate  yields  in  regions  en- 
countering extreme  weather  conditions.  While  the 
extreme  weather  conditions  had  been  somewhat 
local  within  the  LACIE  regions,  the  models  were  not 
expected to perform well in a year for which a country
was subjected to extreme conditions over a majority
of its regions.

A second  problem  with  the  yield  models  resulted 
from  the  overlapping  of  the  models  in  some  areas. 
For  example,  during  the  development  of  the  models, 
the  Nebraska  Panhandle  area  was  used  for  both  the 
“Badlands” model and the Nebraska model. This
overlapping,  plus  the  fact  that  the  models  were 
developed  at  the  state  level  and  used  at  the  CRD 
level,  introduced  significant  correlation  problems 
with  the  statistical  descriptors  of  the  yield  estimate. 

Sampling  mixed  spring  and  winter  wheat  areas.— In 
LACIE Phases I and II, segments in areas containing
both  spring  and  winter  wheat  were  designated  winter 
or  spring  in  proportion  to  the  historic  percentage  of 
winter  or  spring  wheat  grown  in  the  county.  Once 
these  segments  were  so  designated,  each  segment 
was  analyzed  only  for  spring  or  only  for  winter  small 
grains  acreage  and  data  were  collected  only  during 
the  growing  season  appropriate  to  either  the  winter 
or  the  spring  grain  crop  calendar,  but  not  both. 
However,  a mixed  area  by  definition  has  a prob- 
ability of  both  winter  wheat  and  spring  wheat  being 
grown  in  a sample  segment  area.  Thus,  it  appeared 
logical  to  collect  Landsat  data  for  both  winter  and 
spring  wheat  in  all  sample  segments  allocated  to 
mixed  wheat  areas  throughout  the  complete  growing 
season.  This  was,  in  fact,  implemented  for  Phase  III. 

Sampling  strategy.— The  initial  sample  segment 
allocation  was  based  on  1969  Census  of  Agriculture 
data.  However,  significant  shifts  in  production  pat- 
terns for  wheat  since  1969  resulted  in  an  apparent 
undersampling  in  Minnesota.  In  addition,  analysis  of 
Landsat imagery indicated a need for further delineation
of agricultural/nonagricultural areas.


PHASE  III 
Scope

The  scope  of  Phase  III  in  the  U.S.  Great  Plains 
region  was  similar  to  that  of  Phase  II— to  estimate 
area,  yield,  and  production  components.  The  plan 
called  for  the  generation  of  the  estimates  for  the 
seven  winter  wheat  states  from  February  through 
October,  the  four-state  spring  wheat  estimates  from 
July  through  October,  and  the  nine-state  total  wheat 
estimates  from  July  through  October.  The  July 
spring  wheat  and  total  wheat  estimates  were  not 
released in real time but were released with the annual
report in December. The Phase III scope was
further expanded to include parallel evaluations of
the second-generation acreage sampling and yield
estimation technology over portions of the yardstick
region.

A reallocation  was  performed  in  the  U.S.  Great 
Plains  for  Phase  III  (to  be  discussed  later).  This 
modified  allocation  was  not  completed  prior  to  the 
Phase  III  data  order  submission  in  August  1976  for 
the  1977  crop  year.  Therefore,  the  Phase  II  sample 
segments  were  ordered  at  that  time.  The  initial 
LACIE Phase III crop report in February 1977 was
based on these Phase II sample segments acquired
through December 1976. The Phase III sample locations
were completed and ordered retrospectively on
January 31, 1977. The new segments acquired
through December 1976 were processed, and a 601-segment
“February” report, replacing the earlier 431-segment
report, was generated on April 6, 1977.


Sampling 

The  first-generation  sampling  strategy  utilized  in 
Phases I and II was designed to achieve a 2-percent
sampling  error  at  the  U.S.  country  level.  This  sam- 
pling strategy  was  modified  in  Phase  III  to  achieve  a 
5-percent  coefficient  of  variation  (CV)  of  the  pro- 
duction estimate  for  the  U.S.  Great  Plains  region. 
This  CV  would  permit  the  90/90  criterion  to  be  met 
with  a reasonable  degree  of  bias  in  the  production 
estimate. 

The  sampling  strategy  consisted  of  a two-stage 
stratified  probability  sample  in  which  substrata 
(counties)  were  the  primary  sampling  units  and  the 
5- by 6-nautical-mile segments were the secondary
units.  Sample  segments  were  allocated  to  the  coun- 
ties on  the  basis  of  weights  which  were  a function  of 
(1)  the  agricultural  area  in  the  county,  (2)  the  within- 
county  standard  deviation  of  small  grains  area  from 
segment  to  segment,  (3)  the  classification  error 
variance,  (4)  the  county  yield,  and  (5)  the  county 
yield  prediction  error.  Depending  on  these  weights, 
the counties were designated as Group I (high sampling
rate), Group II (low sampling rate), and Group
III  (not  sampled).  (See  Appendix  B,  LACIE  C00200, 
Vol.  IV,  Rev.  C,  CAS  Requirements  Document,  for 
more  details.) 

The  segment-to-segment  small  grains  standard 
deviation  was  determined  from  small  grains  iden- 
tified in  the  Landsat  imagery  used  during  Phase  II. 
The assumption was made that the distribution of
wheat was proportional to the distribution of small
grains. Full-frame Landsat data were used to determine
the agricultural areas.

The reallocation resulted in an increase in sample
segments from 431 to 601. In Phase III, data were collected
in the mixed wheat areas for the total wheat
growing season, essentially the entire year. Thus,
the assumption was that a mixed wheat area has a
probability of both winter and spring wheat being
grown in a sample segment. Table V shows the initial
Phase III allocation of sample segments by state and
by crop type.

As will be discussed later, problems occurred during
Phase III operations which caused the allocation
to be modified to reflect wheat rather than small
grains and the segments in the mixed areas of Montana
and South Dakota to be redesignated as spring
only, winter only, mixed spring and winter, or not to
be used. The last column of table V reflects these
changes during Phase III. Thus, at the close of Phase
III, the allocation was modified to 557 sample segments
for the nine states.


Table V.— Number of Sample Segments Allocated
by State, by Crop Type, and by Group;
U.S. Great Plains, Phase III

                       Segments allocated
State            Group I   Group II   Total   Ending Phase III

Winter wheat
Colorado              28          4      32                 31
Kansas               103         18     121                121
Nebraska              40         27      67                 56
Oklahoma              42          4      46                 46
Texas                 11         27      38                 35
5-state total        224         80     304                289
Montana               79          1      80                 58
South Dakota          37         19      56                 21
2-state total        116         20     136                 79
7-state total        340        100     440                368

Spring wheat
Minnesota             46         12      58                 47
Montana               79          1      80                 48
North Dakota         103          0     103                103
South Dakota          37         19      56                 37
4-state total        265         32     297                235

Total wheat
9-state total        489        112     601                557


Landsat Data

In  Phase  III,  all  acquisitions  for  sample  segments 
in  the  U.S.  Great  Plains  were  analyzed  by  CAMS. 
Over 5300 acquisitions were obtained for the 601 segments
allocated. Over a third of these acquisitions
were usable in CAS for aggregation purposes.

In  Phase  III,  a thresholding  procedure  was 
developed  to  eliminate,  from  consideration  in  the 
acreage estimation procedure, estimates from segments
suspected of being incompletely emerged.
This  procedure  consisted  of  monitoring  the  rates  of 
change  of  segment  wheat  percentages  between 
classification  dates  for  a given  segment.  At  the 
average  date  when  the  rate  of  change  was  small,  the 
crop  growth  stage  was  computed,  and  all  segment 
wheat percentage estimates based on Landsat acquisitions
before that growth stage were deleted from
the acreage estimation procedure. This procedure
was  shown  to  decrease  the  magnitude  of  the  acreage 
underestimate.  The  underlying  assumption  was  that 
it  was  more  accurate  to  estimate  an  area  using  a 
Group  III  ratio  than  to  use  incomplete  data. 
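
The thresholding idea described above can be sketched in a few lines. This is a simplified illustration, not the LACIE implementation: the function names, the rate tolerance, and the sample data are all hypothetical, and day of year is used as a stand-in for the crop growth stage, which the actual procedure computed from a separate model.

```python
from statistics import mean

def stabilization_day(series, max_rate=0.05):
    """Return the first acquisition day from which the wheat percentage is stable.

    series: list of (day_of_year, wheat_percent) pairs sorted by distinct days.
    max_rate: hypothetical tolerance, in percentage points per day.
    """
    for (d0, p0), (d1, p1) in zip(series, series[1:]):
        if abs(p1 - p0) / (d1 - d0) <= max_rate:
            return d0
    return None  # the estimate never settled

def threshold_estimates(segments, max_rate=0.05):
    """Drop estimates acquired before the average stabilization day."""
    days = [stabilization_day(s, max_rate) for s in segments.values()]
    days = [d for d in days if d is not None]
    if not days:
        return {name: [] for name in segments}
    cutoff = mean(days)  # stand-in for the growth stage at the average date
    return {name: [(d, p) for d, p in s if d >= cutoff]
            for name, s in segments.items()}

# Hypothetical classification histories for two segments.
segments = {
    "A": [(60, 5.0), (96, 18.0), (132, 24.0), (150, 24.5)],
    "B": [(70, 8.0), (106, 20.0), (124, 20.4)],
}
kept = threshold_estimates(segments)
print(kept)
```

In this sketch a segment left with no estimates after thresholding would fall back to another estimation route, which corresponds to the Group III ratio mentioned in the text.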

Also,  during  Phase  III,  a procedure  was  developed 
to “screen” the segment wheat estimates. This
screening  procedure  was  aimed  at  identifying  the 
outlier  segments  and  eliminating  these  segments 
from  the  area  aggregation  procedure. 

Table VI shows the number of segments used by
state and by crop type for each report during Phase
III, as revised in CAS Annual Report OS, December
22, 1977. For the Southern Plains states, of the segments
allocated, the number of segments used
ranged from 71 percent for the July report to over 83
percent for the October report. For the seven winter
wheat states, the number of segments used, as a percent
of segments allocated, ranged from 66 percent in
the July report to 81 percent in October. For the four
spring wheat states, the percentage of segments allocated
that were used ranged from 19 in July to over
75 by the end of season. CAS Annual Report OS contains
tables showing the average percent wheat used
in each report, the distribution of segments used by
percent wheat classified, the number of usable segments
available, the number of usable segments
available but not used, the average acquisition date
and segment distribution by month of acquisition,
the average number of elapsed days and the segment
distribution by number of elapsed days, and the segment
distribution by biostage.


Table VI.— Number of Sample Segments Used by State for Each Monthly Report;
U.S. Great Plains, Phase III (a)

State              Feb.    May   June   July   Aug.  Sept.   Oct.  End of
                    CMR    CMR    CMR    CMR    CMR    CMR    CMR  season

Winter wheat
Colorado             25     22     22     21     26     25     24     24
Kansas               82     98    104     96    103    107    108    106
Nebraska             41     38     40     29     31     40     39     39
Oklahoma             35     39     40     35     37     38     41     42
Texas                25     30     30     24     28     28     29     29
5-state total       208    227    236    205    225    238    241    240
Montana              30     28     29     27     39     39     43     43
South Dakota          6      3      7      9     12     13     14     15
2-state total        36     31     36     36     51     52     57     58
7-state total       244    258    272    241    276    290    298    298

Spring wheat
Minnesota                                 22     30     33     37     38
Montana                                    5     23     30     32     32
North Dakota                              13     39     62     70     73
South Dakota                               5     24     26     32     35
4-state total                             45    116    151    172    178

(a) May and June estimates are screened estimates only; remaining estimates are screened and thresholded estimates.
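
The usage percentages quoted for table VI can be rechecked with simple arithmetic. The sketch below assumes that the denominators are the end-of-Phase-III allocations in the last column of table V; the text does not say so explicitly, but that choice reproduces the quoted figures to within about a percentage point. The dictionary and function names are illustrative only.

```python
# Segment counts transcribed from tables V and VI.
allocated_end_of_phase = {
    "5-state winter": 289,
    "7-state winter": 368,
    "4-state spring": 235,
}
used = {
    "5-state winter": {"July": 205, "Oct.": 241},
    "7-state winter": {"July": 241, "Oct.": 298},
    "4-state spring": {"July": 45, "End of season": 178},
}

def percent_used(n_used, n_allocated):
    """Segments used, as a percentage of segments allocated."""
    return 100.0 * n_used / n_allocated

for region, reports in used.items():
    for report, n in reports.items():
        pct = percent_used(n, allocated_end_of_phase[region])
        print(f"{region:15s} {report:14s} {pct:5.1f} percent")
```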

All classifications were small grains classifications
from  which  the  wheat  was  estimated  using  wheat  to 
small  grains  ratios.  For  the  five  states,  the  ratios  were 
based  on  the  latest  historical  data  available  at  the 
county  level.  For  the  four  Northern  Plains  states,  the 
ratios  were  based  on  econometric  modeling  to  esti- 
mate the  current-year  ratios  of  wheat  to  small  grains. 


Yield Data

Yield  estimates  were  received  in  CAS  on  the 
fourth working day of the month. The yield models
were  reworked  to  eliminate  the  area  overlap  that  had 
caused  problems  with  statistics  during  Phase  II.  In 
addition,  the  results  of  the  yield  model  when  run  at 
the  state  level  were  applied  to  all  CRD's  within  the 
model  boundaries.  This  was  a change  from  Phase  II 
where the state model was run using CRD-level inputs
to derive individual CRD-level yield estimates.
The Phase III yield models are documented in
LACIE  00431,  Rev.  A,  June  1977.  The  models  were 
developed  from  historical  data  as  described  pre- 
viously for  Phase  II. 


Estimates of Area, Yield, and Production

The  discussion  of  the  Phase  III  results  refers  to 
the  estimates  as  revised  in  CAS  Annual  Report  OS, 
December 22, 1977. These estimates were revised
using final Phase III CAS technology to provide a set
of consistent estimates. Known data errors were also
corrected. Appendix B contains the Phase III estimates
of area, yield, and production by state, by crop
type, and by report. Tables B-I, B-II, and B-III contain
the revised area, production, and yield estimates,
respectively, and tables B-IV, B-V, and B-VI contain
the coefficients of variation for the revised estimates.
Tables B-VII, B-VIII, and B-IX contain the estimates
reported throughout Phase III.

U.S.  Southern  Great  Plains  winter  wheat. — Figure  6 
shows  the  revised  monthly  LACIE  estimates  of  area, 
production,  and  yield  for  the  five  states  compared  to 
the  corresponding  monthly  estimates  of  the  USDA. 
The area estimates started at 18.5 million acres in
February. The estimates for Kansas, Oklahoma, and
Texas were extremely low in comparison to the
historical data (table B-I). In May, the estimates increased
to 26.3 million acres. Area estimates increased
in all five states due mainly to increased percentage
wheat estimates. The Colorado estimate was
too high, while the Kansas, Oklahoma, and Texas
estimates remained lower than historical data indicated
was average. By July, the estimate had reached
a peak of 30.8 million acres. The average per segment
percentage wheat also peaked in July at 24.9 percent.
The Colorado, Nebraska, and Texas estimates were
higher than their historical averages, Kansas was
slightly high, and Oklahoma remained slightly low.
The final LACIE estimate was 29.5 million acres.
The Colorado and Nebraska estimates remained high
and the Oklahoma estimate low. The average per segment
percentage wheat was 24.3 percent. The final
USDA area estimate was 28.8 million acres (a 2.5-percent
relative difference).

The initial production estimate for the five states
was 472.9 million bushels (table B-II). In May, the
estimate was 662.1 million bushels. The increase was
due to the area estimate increases since the average
yield declined from February to May. The production
estimate peak occurred in July at 785.1 million
bushels. Again, area estimate increases were primarily
responsible for the production increase,
although some individual state yields did increase.
The final LACIE estimate was 752.0 million bushels,
with the decline attributed to the decreased area estimate.
The final USDA estimate was 797.1 million
bushels (a -6.0-percent relative difference).

The derived yield estimate was 25.5 bushels per
acre in February's report (table B-III). The estimate
dropped to 25.1 bushels per acre during May and
June and increased to 25.5 bushels per acre in July.
The final USDA estimate was 27.7 bushels per acre
(a -8.6-percent relative difference).

U.S. Great Plains winter wheat.— The first estimates
for the seven winter wheat states were made in
February (table B-I). Figure 7 shows the revised
LACIE area, production, and yield estimates with
comparisons to the USDA estimates. The initial area
estimate was 21.45 million acres. All state estimates
except for those from Nebraska and South Dakota
were underestimated, with the Kansas, Oklahoma,
and Texas estimates being the largest underestimates.
The area estimate increased to 30.8 million
acres in May. The Colorado, South Dakota,
Nebraska, and Montana estimates were over their
respective historical averages. The Kansas,
Oklahoma, and Texas estimates remained lower than
normal. The area estimate reached its peak in July at
35.4 million acres. The area estimates for South
Dakota, Nebraska, Colorado, Kansas, and Texas
were all higher than normal, with the South Dakota
estimate three times the actual USDA estimate. The
final LACIE estimate was 33.8 million acres compared
to the final USDA estimate of 32.3 million
acres (a 4.6-percent relative difference).

The initial seven-state production estimate was
551.5 million bushels (table B-II). This estimate increased
to 787.1 in May due primarily to the area estimate
increase. The production estimate reached a
peak in July, again due primarily to the area estimate
changes. The final LACIE estimate was 865.9
million bushels; the decrease was due to area estimate
changes. The final USDA production estimate
was 895.4 million bushels (a -3.4-percent relative
difference).
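
The relative differences quoted in this section can be reproduced with a one-line formula. The sketch below assumes that the difference is normalized by the LACIE estimate rather than by the USDA estimate; the text does not define the term here, but this normalization matches the quoted production figures, whereas dividing by the USDA value does not. The function name is illustrative.

```python
def relative_difference(lacie, usda):
    """Relative difference in percent, taken relative to the LACIE estimate."""
    return 100.0 * (lacie - usda) / lacie

# Final production estimates in millions of bushels, as quoted in the text.
cases = {
    "winter wheat, seven states": (865.9, 895.4),
    "spring wheat, four states": (366.4, 460.6),
    "total wheat, nine states": (1232.3, 1356.0),
}
for name, (lacie, usda) in cases.items():
    print(f"{name}: {relative_difference(lacie, usda):+.1f} percent")
```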

The initial yield estimate for the seven states was
25.7 bushels per acre (table B-III). The estimate
declined to 25.5 bushels per acre in May and June.
The final LACIE estimate was 25.6 bushels per acre,
compared to 27.7 bushels per acre for the final
USDA estimate (a relative difference of -8.5
percent).

FIGURE 7.— Monthly comparison of LACIE and USDA estimates; Phase III; winter wheat, seven states (Colorado, Kansas, Nebraska,
Oklahoma, Texas, Montana, and South Dakota). [Panels: area, millions of acres; yield, bushels/acre; production, millions of bushels.]

U.S. Great Plains spring wheat.— The first spring
wheat estimates of area, production, and yield were
made in July (fig. 8). The initial area estimate was
14.7 million acres (table B-I). The area estimates for
Minnesota and South Dakota were below their actual
levels. The North Dakota estimate was within 1 percent
of the actual harvested area. The August estimate
increased to 16.0 million acres due primarily to
the area estimate increase in South Dakota. The Minnesota
estimate remained lower than the USDA estimate.
The final LACIE estimate dropped to 15.64
million acres due to decreases in Minnesota and
South Dakota. The final USDA estimate was 16.97
million acres (a -8.5-percent relative difference).

The initial spring wheat production estimate was
363.7 million bushels (table B-II). This estimate was
low due to both area and yield underestimation. The
August estimate was highest at 374.5 million bushels.
This increase from July was due to area estimate increases
since the yield estimates declined. The final
LACIE estimate was 366.4 million bushels. Again,
the change was due to the area estimates. The final
USDA estimate was 460.6 million bushels (a -25.7-percent
relative difference).

The derived yield for the four states was 24.8
bushels per acre in July (table B-III). The estimate
was 23.4 bushels per acre in August, 23.6 bushels per
acre in September, and 23.4 bushels per acre in October
and for the final estimate. The final USDA yield
estimate for the four states was 27.1 bushels per acre
(a -15.8-percent relative difference).

U.S. Great Plains total wheat.— The initial area
estimate for the total wheat in the nine states was
50.0 million acres (fig. 9 and table B-I). The winter
wheat portion was overestimated by 3.0 million acres
(8.7 percent) while the spring wheat was underestimated
by 2.3 million acres (-15.8 percent). The
August estimate was 50.9 million acres, resulting
from the winter wheat overestimate of 2.6 million
acres (7.5 percent) and the spring wheat underestimate
of 0.9 million acres (-5.9 percent). The final
LACIE estimate was 49.46 million acres and resulted
from the winter wheat overestimate of 1.5 million
acres (4.6 percent) and the spring wheat underestimate
of 1.3 million acres (-8.5 percent). The final
USDA estimate was 49.2 million acres (a 0.4-percent
relative difference).

The July production estimate for all wheat was
1270.0 million bushels, 6.8 percent below the final
USDA estimate of 1356.0 million bushels (table B-II).
The winter wheat production estimate was 10.9
million bushels (1.2 percent) above the final USDA
winter wheat estimate, and the spring wheat estimate
was 96.6 million bushels (26.6 percent) below the
USDA final spring wheat estimate. The August production
estimate was 1269.0 million bushels, 6.9 percent
below the final USDA estimate. The August
winter wheat production estimate was 1.0 million
bushels (0.1 percent) below the USDA estimate, and
the spring wheat estimate was 85.7 million bushels
(22.9 percent) below the USDA estimate. The final
LACIE production estimate was 1232.3 million
bushels, 10.0 percent below the final USDA estimate.
The LACIE winter wheat estimate was 29.5 million
bushels (3.4 percent) below the final USDA estimate,
and the spring wheat estimate was 94.3 million
bushels (25.7 percent) below the final USDA estimate.


The nine-state yield estimate was 25.4 bushels per
acre in July and 24.9 bushels per acre in all other
months (table B-III). The final USDA estimate was
27.5 bushels per acre (a relative difference of -10.4
percent).
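
The nine-state yields quoted above are consistent with dividing aggregated production by aggregated area. The sketch below assumes that this simple quotient is what "derived yield" means for the aggregated estimates; the function name is illustrative, and the inputs are the production and area figures quoted in this subsection.

```python
def derived_yield(production_million_bu, area_million_acres):
    """Yield in bushels per acre implied by aggregated production and area."""
    return production_million_bu / area_million_acres

july = derived_yield(1270.0, 50.0)      # July total wheat estimates
august = derived_yield(1269.0, 50.9)    # August total wheat estimates
final = derived_yield(1232.3, 49.46)    # final LACIE total wheat estimates

print(f"July {july:.1f}, August {august:.1f}, final {final:.1f} bushels per acre")
```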


Accuracy of the Estimates

In Phase III, LACIE estimates were made for
area, yield, and production. The final USDA total
wheat production estimate for the USGP at 1.36
billion bushels was 10.0 percent above the final
LACIE estimate of 1.23 billion bushels. This relative
difference of -10.0 percent indicates the presence of
a negative bias in the LACIE production estimator.
The coefficient of variation of the LACIE production
estimator was calculated to be 4.8 percent. This
would allow a relative bias between -4.2 percent and
3.4 percent and still satisfy the 90/90 criterion. If the
true relative bias of the LACIE production estimator
is -4.2 percent, then the probability of observing a
relative difference less than or equal to -10.0 percent
is only about 11 percent. Thus, it is concluded
that there is probably a bias in the LACIE estimator
large enough to cause more than 1 of 10 estimates to
fall outside the ±10-percent accuracy bounds required
by the 90/90 criterion. However, even with a
relative bias as large as -10 percent, the variability
of 4.8 percent is small enough to produce estimates
within ±15 percent in 9 of 10 years; i.e., a 90/85
estimator. Thus, it would appear from these analyses
that, while the LACIE estimator of USGP total
wheat production did not satisfy the 90/90 criterion,
it only marginally missed doing so.
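
The figures in this paragraph can be approximately reproduced under a normal model. The sketch below is an assumption, not the report's stated derivation: it treats the estimate as normally distributed with mean mu and standard deviation equal to the coefficient of variation times mu, and it defines the relative bias as (mu - true) / mu. For the 11-percent tail probability it uses a simpler assumption, namely that the relative difference itself is normal with mean -4.2 percent and standard deviation 4.8 percent. All function names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_within(tolerance, cv, rel_bias):
    """Probability that an estimate falls within +/- tolerance of the true value.

    Assumed model (not stated in the report): the estimate is normal with
    mean mu and standard deviation cv * mu, and rel_bias = (mu - true) / mu.
    All arguments are fractions; the true value is normalized to 1.
    """
    mu = 1.0 / (1.0 - rel_bias)
    sigma = cv * mu
    upper = (1.0 + tolerance - mu) / sigma
    lower = (1.0 - tolerance - mu) / sigma
    return normal_cdf(upper) - normal_cdf(lower)

cv = 0.048
for bias in (-0.042, 0.034):
    print(f"90/90 check, bias {bias:+.3f}: {prob_within(0.10, cv, bias):.3f}")
print(f"90/85 check, bias -0.100: {prob_within(0.15, cv, -0.10):.3f}")

# Simpler assumption for the tail probability: the relative difference is
# normal with mean -4.2 percent and standard deviation 4.8 percent.
tail = normal_cdf((-10.0 - (-4.2)) / 4.8)
print(f"P(relative difference <= -10 percent): {tail:.3f}")
```

Under these assumptions each coverage check comes out at or just above 0.90, and the tail probability is close to the 11 percent quoted in the text.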

For  winter  wheat  production  in  the  USGP,  the 
LACIE  estimate  was  not  significantly  different  from 
the  final  USDA  estimate.  The  LACIE  winter  wheat 
acreage  estimate  was  in  close  agreement  with  USDA 
figures;  however,  for  the  first  time  in  LACIE,  the 
yield  model  predictions  were  consistently  below  the 
USDA  estimates.  The  biggest  contributors  to  the 
yield  underestimate  for  both  the  five-  and  the  seven- 
state  regions  were  the  Oklahoma  and  Texas  yield 
models.  Investigations  indicated  two  primary  factors 
contributing  to  these  underestimates. 

In both states, the technology trend term was
selected such that no average increase in yield occurred
due to technology since 1960. On the contrary,
ancillary data show that irrigated winter wheat
now accounts for almost 25 percent of the
total winter wheat acreage in Texas. Nearly all of
this  additional  irrigated  acreage  has  been  introduced 
since  1960.  The  weather  terms  in  the  Texas  model 
did  not  alter  the  yield  estimate  significantly  from 
trend.  Therefore,  it  is  likely  that  the  constant  trend 
since  1960  is  a major  contributor  to  the  underesti- 
mate for  Texas  winter  wheat  yield. 

In  Oklahoma,  both  the  weather  terms  and  the  con- 
stant trend  term  were  factors  in  the  underestimate. 
The  model  underestimate  in  Oklahoma  resulted 
mainly  from  below-normal  precipitation  between 
August  and  February  (over  the  winter  period),  a 
March precipitation deficit relative to potential
evapotranspiration, and an above-average May precipitation.
The weather factors which most likely
contributed to the improved Oklahoma yields and
which were overlooked by the LACIE yield model
were the above-normal April temperatures and precipitation
and the temporal distribution of the May
precipitation in Oklahoma.

The  April  temperatures  were  about  5°  F above 
normal  (upper  60's)  in  Oklahoma,  which  would 
make  them  nearly  ideal  for  wheat.  Three  inches  or 
more  of  well-distributed  precipitation  occurred  in 
April  and  4 inches  fell  in  May.  Good  April  rainfall 
amounts  following  moisture-deficit  periods,  such  as 
those  which  occurred  during  the  previous  winter 
months  and  even  the  previous  season,  typically  give 
an  extra  stimulus  to  yield  by  encouraging  more  ex- 
tensive crop  rooting.  This  results  in  improved  utiliza- 
tion of  nutrients  when  moisture  becomes  available. 

The monthly averaging of precipitation in the Oklahoma model also created
an unrealistic response to the rather well-distributed May rainfall, which
nearly doubled the average May precipitation. Since wheat in Oklahoma is
harvested at the end of May and the first of June, large amounts of
rainfall near the end of May tend to reduce yields. However, a majority of
the 1977 May precipitation came in mid-May, with lesser amounts in late
May. The mid-May precipitation came during the heading-to-ripening period
for Oklahoma winter wheat and thus contributed to increased yields, as
opposed to the decrease predicted by the LACIE models.
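The effect of monthly averaging can be illustrated with a small sketch. The daily rainfall amounts below are hypothetical (they are not LACIE data), and the helper names are invented for illustration; the point is only that two Mays with the same monthly total can differ sharply in how much rain falls near harvest, which a monthly-total term cannot see.

```python
# Hypothetical daily May rainfall (inches); illustrative values only.
def may_profile(events):
    """Build a 31-day rainfall list from {day_of_month: inches}."""
    days = [0.0] * 31
    for day, inches in events.items():
        days[day - 1] = inches
    return days


def monthly_total(days):
    """Total rainfall for the month, which is all a monthly model sees."""
    return sum(days)


def late_fraction(days, start_day=22):
    """Share of the monthly rainfall falling on or after start_day."""
    total = sum(days)
    return sum(days[start_day - 1:]) / total if total else 0.0


mid_may = may_profile({12: 1.5, 15: 1.5, 18: 0.5, 28: 0.5})
late_may = may_profile({12: 0.5, 24: 1.0, 27: 1.5, 30: 1.0})

for name, days in (("mid-May rain", mid_may), ("late-May rain", late_may)):
    print(name, monthly_total(days), late_fraction(days))
```

Both profiles total 4 inches, so a model driven by the monthly total treats them identically, although only the second concentrates rain in the harvest window.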

For spring wheat production in the USGP, the LACIE estimate was
significantly smaller than the USDA estimates and the 90/90 accuracy goal
was not supported. The underestimation of spring wheat production was
primarily due to yield underestimation, although area was significantly
underestimated also. However, significant improvement was realized in the
spring wheat area estimate (as compared to USDA) over Phase I and II
results.

The LACIE spring wheat yield estimates were significantly smaller than the
corresponding USDA estimates throughout Phase III, primarily due to
underestimates of Minnesota and Montana spring wheat yields. These errors
were due, in part, to trend terms which failed to account for new
varieties of wheat in Minnesota and for increased fertilizer usage in
Montana during the past 5 years.


Technical Issues

At the completion of Phase III processing, the
following technical issues were apparent.

Wheat/small grains separation.— As has been a problem throughout LACIE,
CAMS analysis has not reliably, nor consistently, been able to
differentiate between wheat and small grains. However, for the first time
during LACIE, a technique for separating wheat and small grains was tested
in North Dakota. The results appear encouraging thus far, pending
completion of a more thorough evaluation.

Improved yield models.— The first-generation yield model estimates were
noticeably below the USDA estimates of yield. Although the 10-year tests
and the 3 years of LACIE operations indicate that the yield models
performed adequately (in terms of satisfying the 90/90 criterion),
investigations of model performance at the subregional levels have
indicated that the models could and should be improved. In a year with
extended episodic conditions, the yield models are not adequately
responsive to extreme climatic conditions, and, during such years,
considerable yield estimation error can result.

Sampling in mixed wheat regions.— During Phases I and II, segments were
allocated on the basis of total wheat statistics, and areas containing
both spring and winter wheat were designated either winter or spring
depending on the historical predominance of winter or spring wheat in the
county. Once designated, each segment was analyzed for one crop type only,
and data were collected only during the appropriate growing season. This
strategy created problems for those segments which contained significant
amounts of both winter and spring wheat.

In Phase III, Landsat data were collected in the mixed wheat areas for the
total growing season of both winter and spring wheat. This collection
scheme acquired satellite data to estimate both spring and winter wheat
grown in all segments.

However, in Phase III a problem occurred in South Dakota, a mixed wheat
state. The real-time LACIE estimates for South Dakota winter wheat were
significantly larger than one would expect on the basis of historical
data. An investigation disclosed the fact that many South Dakota segments
contained almost no winter wheat. In low-density segments, the CAMS errors
tend to be overestimates. While the absolute overestimate was not large,
the relative overestimate was quite large. This relative overestimate
resulted in the large South Dakota overestimate.
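The distinction between absolute and relative overestimates can be made concrete with a short sketch. The segment percentages below are hypothetical, chosen only to show that the same absolute error of a few percentage points is negligible in a dense wheat segment but dominates the estimate in a segment with almost no wheat.

```python
def estimation_errors(true_pct, estimated_pct):
    """Return (absolute error in percentage points, relative error as a fraction)."""
    absolute = estimated_pct - true_pct
    relative = absolute / true_pct
    return absolute, relative


# Hypothetical segments: wheat as a percentage of segment area.
low_density = estimation_errors(true_pct=1.0, estimated_pct=3.0)
high_density = estimation_errors(true_pct=40.0, estimated_pct=42.0)

print("low-density segment:", low_density)
print("high-density segment:", high_density)
```

In this sketch both segments are overestimated by 2 percentage points, but the low-density segment's estimate is three times the true value, which is how small classification errors can inflate a state total built from many sparse segments.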

The problem was not caused by the mixed wheat sampling plan but resulted
from (1) the sampling strategy based on total small grains which resulted
in sample segments being placed in low wheat density/high small grains
density areas, (2) the CAMS procedures indicating that both winter and
spring wheat must be found in all segments, and (3) the CAMS inability to
accurately identify wheat in low wheat density regions.

Ratio modeling.— The CAMS inability to identify wheat resulted in the use
of ratios of wheat to small grains to estimate wheat from satellite-based
small grains estimates. During Phase II, the ratios were based on data for
the most recent year for which statistics were available at the county
level. However, the recent predominance of wheat was probably
significantly different from the current year. This resulted in the
introduction of another error into the estimation process. At the end of
Phase II, an effort was initiated to econometrically model the
current-year ratios of wheat to small grains for the four Northern Plains
states. The accuracy of the models in predicting current-year ratios is
unknown at present, but one point is certain: ratio modeling is no
substitute for the ability to identify crops.
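A minimal sketch of the ratio step shows how an out-of-date ratio carries straight through to the wheat estimate. The acreage and ratios below are hypothetical, and `wheat_from_small_grains` is an illustrative name rather than a LACIE procedure.

```python
def wheat_from_small_grains(small_grains_acres, wheat_ratio):
    """Estimate wheat area as a fixed share of the small grains area."""
    return small_grains_acres * wheat_ratio


# Hypothetical county: 1000 thousand acres of small grains from satellite data.
small_grains = 1000.0
stale_ratio = 0.60     # wheat share in the most recent year with statistics
current_ratio = 0.50   # wheat share actually planted this year (unknown in practice)

estimate = wheat_from_small_grains(small_grains, stale_ratio)
actual = wheat_from_small_grains(small_grains, current_ratio)
relative_error = (estimate - actual) / actual

print(estimate, actual, relative_error)
```

Even with a perfect small grains estimate, the wheat figure in this sketch is 20 percent too high, because the error in the ratio is applied multiplicatively.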

Sampling based on total small grains.— The Phase III sampling strategy was
based on the incidence of total small grains. The assumption was that the
distribution of wheat was proportional to the distribution of small
grains. This assumption was in error in large areas of the U.S. Great
Plains and resulted in the placement of segments in areas of low wheat
density, high small grains density. The low wheat density segments then
resulted in area estimation problems as discussed previously.

Data editing techniques.— During Phase III, two data editing procedures
were applied in the U.S. Great Plains. Thresholding was intended to
eliminate segment data that were suspected of being biased low because of
an early acquisition date. (An early acquisition date was synonymous with
low ground cover and therefore marginal signatures on the imagery.) The
screening procedure was applied to determine whether the CAMS
classification result was significantly different from the estimates of
segments in the same group stratified by historical data.

Neither procedure was subject to rigorous evaluation, and both should have
further testing and verification before being used again. In any event,
such procedures should be considered as data analysis procedures to flag
anomalies, which should then be sent back to the CAMS for further
investigation.
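The recommendation to flag rather than discard can be sketched as follows. This is not the LACIE implementation: the report does not give the cutoff date or the significance test, so the date cutoff, the fixed tolerance around the stratum median, the `SegmentResult` fields, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class SegmentResult:
    segment_id: str
    stratum: str        # group of historically similar segments (hypothetical label)
    acquired: date      # Landsat acquisition date
    wheat_pct: float    # classified wheat proportion, percent of segment area


def flag_early_acquisitions(results, cutoff):
    """Thresholding as a flag: list segments acquired before the cutoff date."""
    return [r.segment_id for r in results if r.acquired < cutoff]


def flag_stratum_outliers(results, tolerance_pct):
    """Screening as a flag: list segments far from their stratum median."""
    by_stratum = {}
    for r in results:
        by_stratum.setdefault(r.stratum, []).append(r)
    flagged = []
    for members in by_stratum.values():
        centre = median(m.wheat_pct for m in members)
        flagged.extend(
            m.segment_id for m in members
            if abs(m.wheat_pct - centre) > tolerance_pct
        )
    return flagged


# Hypothetical classification results for one stratum.
results = [
    SegmentResult("S1", "A", date(1977, 3, 1), 20.0),
    SegmentResult("S2", "A", date(1977, 5, 1), 22.0),
    SegmentResult("S3", "A", date(1977, 5, 10), 45.0),
    SegmentResult("S4", "A", date(1977, 5, 12), 21.0),
]
early = flag_early_acquisitions(results, cutoff=date(1977, 4, 1))
outliers = flag_stratum_outliers(results, tolerance_pct=10.0)
print(early, outliers)
```

Both functions return identifiers for review and leave the data untouched, so a flagged segment can be returned to the analyst instead of silently dropping out of the aggregation.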
RECOMMENDATIONS


At the conclusion of LACIE we should ask ourselves one question: can we
separate wheat, consistently and accurately, from other crops without the
use of ancillary data or ratioing techniques? The answer is NO. As we
proceed to other crops, it should be obvious that the most important goal
must be true crop separation/identification.


A second priority item should be the development of more accurate yield
models. These models must be able to estimate yields accurately through
extended episodic events. The use of Landsat data in the models is a
worthy goal.

Finally, as LACIE moves on to other crops, it becomes more important to
develop a sampling strategy that is both accurate for the particular crops
of interest and efficient in terms of multiple-crop uses of each segment.
Appendix A

LACIE  Phase  II  Estimates 


Table A-I.— LACIE Area Estimates for Phase II, by Crop Type, by State, and by Monthly Report,
as Revised in CAS Annual Report 03, December 15, 1976, U.S. Great Plains

[Thousands of acres]

                  CAS monthly report
State                  Feb.     Mar.     Apr.      May   June 8  June 29     July     Aug.    Sept.     Oct.  End of season

Winter wheat
Colorado               3539     2768     2768     2807     2995     2960     2867     2830     2704     2704           2704
Kansas                 8013     8536     8536     9392    10535    10744    10795    10932    10989    10989          11125
Nebraska               4500     3632     3583     3653     4104     4178     4133     4086     3399     3399           3399
Oklahoma               3499     3450     3450     3897     4148     4182     4025     4305     4261     4261           4261
Texas                  3170     3725     3479     4810     4556     4619     4314     4310     4344     4344           4344
5-state total*        22721    22111    21816    24559    26338    26683    26134    26463    25697    25697          25833
Montana                  --       --       --       --      488      885     1044     1911     2103     2131           2079
South Dakota             --       --       --       --     1159     1210     1482     1482     1452     1452           1452
2-state total*           --       --       --       --     1647     2095     2526     3393     3555     3583           3531
7-state total*           --       --       --       --    27986    28778    28660    29856    29252    29280          29363

Spring wheat
Minnesota                --       --       --       --       --       --       --     1741     2551     2198           2198
Montana                  --       --       --       --       --       --       --     1127     1291     1487           1516
North Dakota             --       --       --       --       --       --       --     8161     9650     9735           9856
South Dakota             --       --       --       --       --       --       --     2169     2095     2079           2079
4-state total*           --       --       --       --       --       --       --    13198    15586    15499          15650

Total wheat
9-state total*           --       --       --       --       --       --       --    43054    44838    44779          45013

*Totals may not add correctly because of rounding.
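The rounding footnote can be checked directly against the table. The sketch below takes the winter wheat 5-state, 2-state, and 7-state totals for the June 8 and August reports from Table A-I and prints how far each reported 7-state total is from the sum of its parts; the helper name is illustrative.

```python
# Winter wheat area totals from Table A-I, thousands of acres.
five_state = {"June 8": 26338, "Aug.": 26463}
two_state = {"June 8": 1647, "Aug.": 3393}
seven_state = {"June 8": 27986, "Aug.": 29856}


def total_discrepancy(parts, reported):
    """Reported total minus the sum of its component subtotals."""
    return reported - sum(parts)


discrepancies = {
    report: total_discrepancy([five_state[report], two_state[report]], seven_state[report])
    for report in seven_state
}
print(discrepancies)
```

The June 8 total differs from the sum of its parts by one thousand acres and the August total agrees exactly, which is the size of discrepancy that rounding of the component estimates would produce.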
Table A-II.— LACIE Production Estimates for Phase II, by Crop Type, by State, and by Monthly Report,
as Revised in CAS Annual Report 03, December 15, 1976, U.S. Great Plains

[Thousands of bushels]

                  CAS monthly report
State                  Feb.     Mar.     Apr.      May   June 8  June 29     July     Aug.    Sept.     Oct.  End of season

Winter wheat
Colorado              76418    60759    56089    55285    61191    60500    51492    50024    52924    52924          52924
Kansas               258074   269638   255147   283124   326677   333644   334107   338078   339974   339974         344472
Nebraska             151762   124342   118458   110496   128692   131019   132118   130547   110972   110972         110972
Oklahoma              80264    76041    74823    84699    94975    95667    92052    98156    96491    96491          96491
Texas                 59550    66676    59559    86910    84094    85324    80797    80637    81312    81312          81312
5-state total        626068   597456   564076   620514   695629   706154   690566   697442   681673   681673         686171
Montana                  --       --       --       --    13527    24808    30082    55788    62877    63758          62167
South Dakota             --       --       --       --    31553    32931    45096    45096    45904    45904          45904
2-state total            --       --       --       --    45080    57739    75178   100884   108781   109662         108071
7-state total            --       --       --       --   740709   763893   765744   798326   790454   791335         794242

Spring wheat
Minnesota                --       --       --       --       --       --       --    55490    77230    66589          66589
Montana                  --       --       --       --       --       --       --    29188    35064    40240          41058
North Dakota             --       --       --       --       --       --       --   226034   261197   263703         266529
South Dakota             --       --       --       --       --       --       --    36719    35908    35675          35675
4-state total            --       --       --       --       --       --       --   347431   409399   406207         409851

Total wheat
9-state total            --       --       --       --       --       --       --  1145757  1199853  1197542        1204093


Table A-III.— LACIE Yield Estimates for Phase II, by Crop Type, by State, and by Monthly Report,
as Revised in CAS Annual Report 03, December 15, 1976, U.S. Great Plains

[Bushels per acre]

                  CAS monthly report
State                  Feb.     Mar.     Apr.      May   June 8  June 29     July     Aug.    Sept.     Oct.  End of season

Winter wheat
Colorado               21.6     22.0     20.3     19.7     20.4     20.4     18.0     17.7     19.6     19.6           19.6
Kansas                 32.2     31.6     29.9     30.1     31.0     31.1     30.9     30.9     30.9     30.9           31.0
Nebraska               33.7     34.2     33.1     30.2     31.4     31.4     32.0     32.0     32.7     32.7           32.7
Oklahoma               22.9     22.0     21.7     21.7     22.9     22.9     22.9     22.8     22.6     22.6           22.6
Texas                  18.8     17.9     17.1     18.1     18.5     18.5     18.7     18.7     18.7     18.7           18.7
5-state average        27.6     27.0     25.9     25.3     26.4     26.5     26.4     26.4     26.5     26.5           26.6
Montana                  --       --       --       --     27.7     28.0     28.8     29.2     29.9     29.9           29.9
South Dakota             --       --       --       --     27.2     27.2     30.4     30.4     31.6     31.6           31.6
2-state average          --       --       --       --     27.4     27.6     29.8     29.7     30.6     30.6           30.6
7-state average          --       --       --       --     26.5     26.5     26.7     26.7     27.0     27.0           27.0

Spring wheat
Minnesota                --       --       --       --       --       --       --     31.9     30.3     30.3           30.3
Montana                  --       --       --       --       --       --       --     25.9     27.2     27.1           27.1
North Dakota             --       --       --       --       --       --       --     27.7     27.1     27.1           27.0
South Dakota             --       --       --       --       --       --       --     16.9     17.1     17.2           17.2
4-state average          --       --       --       --       --       --       --     26.3     26.3     26.2           26.2

Total wheat
9-state average          --       --       --       --       --       --       --     26.6     26.8     26.7           26.7
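The three Phase II tables are tied together by yield = production / area. As a quick consistency sketch, the February winter wheat figures from Tables A-I and A-II can be divided and rounded to one decimal place for comparison with the February column of Table A-III; the variable and function names are illustrative.

```python
# February CAS report, winter wheat.
# Area in thousands of acres (Table A-I), production in thousands of bushels (Table A-II).
area = {"Colorado": 3539, "Kansas": 8013, "Nebraska": 4500, "Oklahoma": 3499, "Texas": 3170}
production = {"Colorado": 76418, "Kansas": 258074, "Nebraska": 151762,
              "Oklahoma": 80264, "Texas": 59550}


def implied_yield(production_bushels, area_acres):
    """Yield in bushels per acre, rounded to one decimal as in Table A-III."""
    return round(production_bushels / area_acres, 1)


implied = {state: implied_yield(production[state], area[state]) for state in area}
five_state = implied_yield(sum(production.values()), sum(area.values()))

print(implied)
print("5-state average:", five_state)
```

The 5-state figure is computed from the summed production and summed area, that is, as an area-weighted average rather than a simple mean of the state yields.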
Table A-IV.— Coefficient of Variation of the LACIE Revised Area Estimates for Phase II,
by Crop Type, by State, and by Monthly Report, CAS Annual Report 03, December 15, 1976,
U.S. Great Plains

[Percent]

                  CAS monthly report
State                  Feb.     Mar.     Apr.      May   June 8  June 29     July     Aug.    Sept.     Oct.  End of season

Winter wheat
Colorado                 26       25       25       24       23       22       25       24       24       24             24
Kansas                   12        8        8        6        6        6        6        5        5        5              5
Nebraska                 18       13       13       13       12       12       11       11       11       11             11
Oklahoma                 24       18       18       16       14       14       15       15       14       14             14
Texas                    25       30       20       14       15       14       15       16       16       16             16
5-state CV                9        8        7        6        5        5        5        5        5        5              5
Montana                  --       --       --       --      193       56       52       35       29       28             28
South Dakota             --       --       --       --       44       34       23       23       23       23             23
2-state CV               --       --       --       --       65       31       25       22       20       19             19
7-state CV               --       --       --       --        6        5        5        5        5        5              5

Spring wheat
Minnesota                --       --       --       --       --       --       --       40       27       30             30
Montana                  --       --       --       --       --       --       --       28       23       24             22
North Dakota             --       --       --       --       --       --       --       14        5        5              5
South Dakota             --       --       --       --       --       --       --       12       13       13             13
4-state CV               --       --       --       --       --       --       --       10        6        6              6

Total wheat
9-state CV               --       --       --       --       --       --       --        5        4        4              4
Table A-V.— Coefficient of Variation of the LACIE Revised Production Estimates for Phase II,
by Crop Type, by State, and by Monthly Report, CAS Annual Report 03, December 15, 1976,
U.S. Great Plains

[Percent]

                  CAS monthly report
State                  Feb.     Mar.     Apr.      May   June 8  June 29     July     Aug.    Sept.     Oct.  End of season

Winter wheat
Colorado        [illegible]       32       32       31       28       28       30       29       29       29             29
Kansas                   17       14       13       12       11       11       11       10       10       10             10
Nebraska                 23       19       19       19       17       17       16       16       16       16             16
Oklahoma                 29       25       22       21       17       17       18       18       18       18             18
Texas                    28       32       22       17       17       17       17       18       18       18             18
5-state CV               11       10        8        8        7        7        7        7        7        7              7
Montana                  --       --       --       --      192       57       53       36       30       29             30
South Dakota             --       --       --       --       46       37       27       26       26       26             26
2-state CV               --       --       --       --       63       32       27       23       21       20             20
7-state CV               --       --       --       --        8        7        7        7        7        7              7

Spring wheat
Minnesota                --       --       --       --       --       --       --       42       29       32             32
Montana                  --       --       --       --       --       --       --       29       25       25             24
North Dakota             --       --       --       --       --       --       --       17       12       12             12
South Dakota             --       --       --       --       --       --       --       18       19       18             18
4-state CV               --       --       --       --       --       --       --       13       10       10             10

Total wheat
9-state CV               --       --       --       --       --       --       --        6        5        5              5
Table A-VI.— Coefficient of Variation of the LACIE Revised Yield Estimates for Phase II, by Crop
Type, by State, and by Monthly Report, CAS Annual Report 03, December 15, 1976, U.S. Great Plains

[Percent]

                  CAS monthly report
State                  Feb.     Mar.     Apr.      May   June 8  June 29     July     Aug.    Sept.     Oct.  End of season

Winter wheat
Colorado                 21       21       21       20       17       17       17       17       17       17             17
Kansas                   12       12       10       10        9        9        9        9        9        9              9
Nebraska                 14       14       14       14       13       13       12       12       12       12             12
Oklahoma                 17       17       14       14       10       10       10       10       10       10             10
Texas                    19       15       14       13       12       12       12       12       12       12             12
5-state CV                7        7        6        6        5        5        5        5        5        5              5
Montana                  --       --       --       --       12       12        9        9        9        9              9
South Dakota             --       --       --       --       15       15       15       14       14       14             14
2-state CV               --       --       --       --        9       10        9        8        8        8              8
7-state CV               --       --       --       --        5        5        5        5        5        5              5

Spring wheat
Minnesota                --       --       --       --       --       --       --       11       11       11             11
Montana                  --       --       --       --       --       --       --        9        9        9              9
North Dakota             --       --       --       --       --       --       --       11       11       11             11
South Dakota             --       --       --       --       --       --       --       14       13       13             13
4-state CV               --       --       --       --       --       --       --        7        7        7              7

Total wheat
9-state CV               --       --       --       --       --       --       --        4        4        4              4
Table A-VII.— Real-Time LACIE Area Estimates for Phase II, by Crop Type, by State, and by Monthly
Report, U.S. Great Plains

[Thousands of acres]

                  CAS monthly report
State                  Feb.     Mar.     Apr.      May   June 8  June 29     July     Aug.    Sept.     Oct.  End of season

Winter wheat
Colorado               3900     2755     2755     2795     2969     3023     2856     2851     2735     2735           2704
Kansas                 8413     8468     8499     9463    10623    10855    10937    10956    10969    10989          11125
Nebraska               5385     3750     3609     3679     4111     4184     4140     4092     3399     3399           3399
Oklahoma               3498     3433     3449     3917     4148     4181     4031     4311     4267     4268           4261
Texas                  3208     3947     3602     5644     4578     4642     4266     4313     4344     4344           4344
5-state total*        24404    22353    21914    25498    26429    26885    26230    26523    25714    25735          25833
Montana                  --       --       --       --      511      836      918     1448     1783     2128           2079
South Dakota             --       --       --       --      573      613      783     1305     1263     1415           1452
2-state total*           --       --       --       --     1084     1449     1701     2753     3046     3543           3531
7-state total*           --       --       --       --    27513    28334    27931    29276    28760    29278          29363

Spring wheat
Minnesota                --       --       --       --       --       --       --     1300     2583     2192           2198
Montana                  --       --       --       --       --       --       --     1205     1382     1487           1516
North Dakota             --       --       --       --       --       --       --     6835     9598     9600           9856
South Dakota             --       --       --       --       --       --       --     1837     2063     2140           2079
4-state total*           --       --       --       --       --       --       --    11177    15626    15419          15650

Total wheat
9-state total*           --       --       --       --       --       --       --    40453    44386    44697          45013

*Totals may not add correctly because of rounding.
Table A-VIII.— Real-Time LACIE Production Estimates for Phase II, by Crop Type,
by State, and by Monthly Report, U.S. Great Plains

[Thousands of bushels]

                  CAS monthly report
State                  Feb.     Mar.     Apr.      May   June 8  June 29     July     Aug.    Sept.     Oct.  End of season

Winter wheat
Colorado              84516    60500    55847    54973    60634    61676    51290    55697    53535    53534          52924
Kansas               270415   267199   253948   285572   329607   337214   338940   340092   339728   339974         344472
Nebraska             168944   128366   119359   111280   128890   131216   132322   134040   110970   110972         110972
Oklahoma              80385    75765    74808    85027    94962    95645    92214    97663    96645    96670          96491
Texas                 60286    70637    61642   101308    84470    85723    79817    80798    81325    81312          81312
5-state total        664546   602467   565604   638160   698563   711474   694583   708290   682203   682462         686171
Montana                  --       --       --       --    14154    23341    26512    43528    53260    63666          62167
South Dakota             --       --       --       --    17755    18941    24360    41858    39117    44722          45904
2-state total            --       --       --       --    31909    42282    50872    85386    92377   108388         108071
7-state total            --       --       --       --   730472   753756   745455   793676   774580   790850         794242

Spring wheat
Minnesota                --       --       --       --       --       --       --    39361    78200    66404          66589
Montana                  --       --       --       --       --       --       --    33411    37406    40240          41058
North Dakota             --       --       --       --       --       --       --   188085   259815   260198         266529
South Dakota             --       --       --       --       --       --       --    31236    35417    36765          35675
4-state total            --       --       --       --       --       --       --   292093   410838   403607         409851

Total wheat
9-state total            --       --       --       --       --       --       --  1085769  1185418  1194457        1204093
Table A-IX.— Real-Time LACIE Yield Estimates for Phase II,
by Crop Type, by State, and by Monthly Report, U.S. Great Plains

[Bushels per acre]

                  CAS monthly report
State                  Feb.     Mar.     Apr.      May   June 8  June 29     July     Aug.    Sept.     Oct.  End of season

Winter wheat
Colorado               21.7     22.0     20.3     19.7     20.4     20.4     18.0     19.5     19.6     19.6           19.6
Kansas                 32.1     31.6     29.9     30.2     31.0     31.1     31.0     31.0     31.0     30.9           31.0
Nebraska               31.4     34.2     33.1     30.2     31.4     31.4     32.0     32.8     32.7     32.7           32.7
Oklahoma               23.0     22.1     21.7     21.7     22.9     22.9     22.9     22.7     22.7     22.7           22.6
Texas                  18.8     17.9     17.1     17.9     18.4     18.5     18.7     18.7     18.7     18.7           18.7
5-state average        27.2     27.0     25.8     25.0     26.4     26.5     26.5     26.7     26.5     26.5           26.6
Montana                  --       --       --       --     27.7     27.9     28.9     30.1     29.9     29.9           29.9
South Dakota             --       --       --       --     31.0     30.9     31.1     32.1     31.0     31.6           31.6
2-state average          --       --       --       --     29.4     29.2     29.9     31.0     30.3     30.6           30.6
7-state average          --       --       --       --     26.6     26.6     26.7     27.1     26.9     27.0           27.0

Spring wheat
Minnesota                --       --       --       --       --       --       --     30.3     30.3     30.3           30.3
Montana                  --       --       --       --       --       --       --     27.7     27.1     27.1           27.1
North Dakota             --       --       --       --       --       --       --     27.5     27.1     27.1           27.0
South Dakota             --       --       --       --       --       --       --     17.0     17.2     17.2           17.2
4-state average          --       --       --       --       --       --       --     26.1     26.3     26.2           26.2

Total wheat
9-state average          --       --       --       --       --       --       --     26.8     26.7     26.7           26.7
Appendix B

LACIE  Phase  III  Estimates 


Table B-I.— Revised LACIE Area Estimates by State for Each CAS Report in Phase III,
U.S. Great Plains, CAS Annual Report 05, December 22, 1977

[Thousands of acres]


State  CAS monthly report

Feb.  May  June  July  Aug.  Sept.  Oct.  End of season


Winter  wheat 


Colorado 

1997 

2600 

3601 

3 261 

32S3 

3059 

3 39$ 

3459 

Kansas

b*ti 

10439 

II  OSS 

12  919 

I2S79 

12  461 

12669 

12494 

Nebraska 

3 Obi 

3 27» 

3139 

3144 

3558 

3130 

3 375 

3433 

Oklahoma 

3 206 

4172 

S 221 

S7SS 

$963 

6013 

S6S8 

$675 

Texas 

3 36$ 

4196 

4462 

SOU 

4600 

4613 

4476 

4476 

5-state total*

IS  523 

26  34$ 

28  192 

30  797 

299S3 

29  3S3 

29  573 

29  537 

Montana 

2127 

3 369 

3704 

2 626 

3 3SS 

3628 

3 314 

3371 

South  Dakota 

WO 

1 107 

1401 

1943 

I $94 

989 

883 

912 

2-state total*

2927 

4 476 

S 109 

4 S69 

4949 

4617 

4197 

4 283 

7-state total*

21 450 

30121 

33297 

3$  366 

34902 

33969 

33  771 

33120 

Spring  wheat 


Minnesota 

2 420 

2 $$3 

2 474 

2 289 

2 344 

Montana 

119$ 

1942 

2117 

2 ISO 

2174 

North  Dakota 

9071 

9 220 

1*73 

9 ITS 

9183 

South  Dakota 

1269 

2309 

1 9SI 

1909 

1936 

4-state total*

14  65$ 

16024 

IS  142 

1$  $22 

1$  638 

Total  wheat 

9-state total*

*0021 

$0926 

49  111 

49  293 

49  351 

*Totals may not add correctly because of rounding.
Table B-II.— Revised LACIE Production Estimates by State for Each CAS Report in Phase III,
U.S. Great Plains, CAS Annual Report 05, December 22, 1977

[Thousands of bushels]

                  CAS monthly report
State                  Feb.      May     June     July     Aug.    Sept.     Oct.  End of season

Winter wheat
Colorado              45520    81898    85314    73383    73031    68675    76226          77666
Kansas               199125   293385   312339   372688   362866   359652   365465         360410
Nebraska              93931   102497   115745   122819   114134   100106   107830         109823
Oklahoma              69688   102554   103413   114725   119208   121845   113064         113387
Texas                 64623    81789    90667   101510    93261    93510    90695          90695
5-state total*       472885   662123   707478   785125   762500   743788   753280         751981
Montana               56803    96173   104087    69502    88789    96021    87712          89224
South Dakota          21849    28809    36457    51718    43143    26760    23907          24682
2-state total*        78652   124982   140544   121220   131932   122781   111619         113906
7-state total*       551536   787105   848022   906345   894432   866570   864900         865888

Spring wheat
Minnesota                --       --       --    78481    80840    79043    73213          74955
Montana                  --       --       --    34939    34939    39357    38683          39112
North Dakota             --       --       --   223257   210668   197503   211253         211990
South Dakota             --       --       --    26977    48075    40759    39748          40309
4-state total*           --       --       --   363654   374522   356662   362896         366367

Total wheat
9-state total*           --       --       --  1269999  1268954  1223233  1227796        1232255

*Totals may not add correctly because of rounding.


Table B-III.— Revised LACIE Yield Estimates by State for Each CAS Report in Phase III,
U.S. Great Plains, CAS Annual Report 05, December 22, 1977

[Bushels per acre]

                  CAS monthly report
State                  Feb.      May     June     July     Aug.    Sept.     Oct.  End of season

Winter wheat
Colorado               22.8     22.8     23.6     22.5     22.5     22.6     22.6           22.5
Kansas                 28.9     28.1     28.3     28.8     28.8     28.8     28.8           28.9
Nebraska               30.6     31.3     30.2     31.9     32.1     32.0     32.0           32.0
Oklahoma               21.7     21.2     19.8     19.9     20.0     20.0     20.0           20.0
Texas                  19.2     19.5     20.3     20.3     20.3     20.3     20.3           20.3
5-state average        25.6     25.1     25.1     25.6     25.5     25.3     25.5           25.6
Montana                26.7     28.5     28.1     26.5     26.5     26.5     26.5           26.5
South Dakota           27.3     26.0     26.0     26.6     27.1     27.1     27.1           27.1
2-state average        26.9     27.9     27.5     26.6     26.7     26.6     26.6           26.6
7-state average        25.7     25.5     25.6     25.6     25.6     25.5     25.6           25.6

Spring wheat
Minnesota                --       --       --     32.4     31.7     31.9     32.0           32.0
Montana                  --       --       --     18.4     18.0     18.0     18.0           18.0
North Dakota             --       --       --     24.6     22.8     23.2     23.0           23.1
South Dakota             --       --       --     21.3     20.8     20.8     20.8           20.8
4-state average          --       --       --     24.8     23.4     23.6     23.4           23.4

Total wheat
9-state average          --       --       --     25.4     24.9     24.9     24.9           24.9
Table B-IV.— Coefficient of Variation for the Revised LACIE Area Estimates for Phase III, by State and by
Monthly Report, U.S. Great Plains, CAS Annual Report 05, December 22, 1977

[Percent]

                  CAS monthly report
State                  Feb.      May     June     July     Aug.    Sept.     Oct.  End of season

Winter wheat
Colorado               21.0     14.2     13.6     13.4     11.3     10.3      9.9            9.8
Kansas                 13.9      6.2      5.8      4.5      4.8      4.5      4.2            4.0
Nebraska               14.9     11.4      9.5     11.6     10.2      9.2      9.6            9.2
Oklahoma                9.6     10.0      9.0      7.1      6.7      7.2      7.7            7.6
Texas                  16.7     14.2     12.5     11.6     12.8     12.7     13.7           13.7
5-state CV              7.1      4.5      4.1      3.6      3.6      3.5      3.5            3.4
Montana                21.1     18.8     17.8      9.8      7.9      6.9      7.8            7.9
South Dakota           60.0     43.1     25.0     40.3     38.1     26.5     25.7           25.0
2-state CV             22.4     17.7     14.6     18.1     13.4      7.8      8.2            8.2
7-state CV              6.8      4.6      4.1      3.9      3.6      3.2      3.2            3.2

Spring wheat
Minnesota                --       --       --     12.2     13.0     11.6      9.9            9.5
Montana                  --       --       --     37.2     18.0     12.2     10.3           10.2
North Dakota             --       --       --     10.7      5.7      5.0      4.4            4.4
South Dakota             --       --       --     40.4     13.4     13.1     11.6            9.6
4-state CV               --       --       --      9.2      4.8      4.2      3.6            3.5

Total wheat
9-state CV               --       --       --      3.4      2.6      2.5      2.4            2.4
Table B-V.— Coefficient of Variation for the Revised LACIE Production Estimates for Phase III,
by State and by Monthly Report, U.S. Great Plains, CAS Annual Report 05, December 22, 1977

[Percent]

                  CAS monthly report
State                  Feb.      May     June     July     Aug.    Sept.     Oct.  End of season

Winter wheat
Colorado               28.0     22.3     20.3     19.8     18.6     17.9     17.7           17.7
Kansas                 18.4     12.5     11.5     10.7     10.8     10.7     10.5           10.5
Nebraska               18.1     15.4     14.3     15.0     13.9     13.1     13.4           13.1
Oklahoma               16.7     15.9     14.1     12.7     12.3     12.4     12.9           12.9
Texas                  20.2     16.5     14.7     14.0     14.9     14.9     15.7           15.7
5-state CV              9.7      7.4      6.8      6.5      6.4      6.4      6.4            6.3
Montana                30.4     23.1     22.0     15.5     14.4     13.9     14.4           14.4
South Dakota           61.6     46.2     30.8     43.9     41.8     31.9     31.3           30.7
2-state CV             27.9     20.8     18.2     20.7     16.8     12.9     13.1           13.1
7-state CV              9.3      7.1      6.5      6.4      6.1      5.8      5.9            5.8

Spring wheat
Minnesota                --       --       --     16.1     16.3     15.1     13.9           13.6
Montana                  --       --       --     40.0     22.7     18.6     17.4           17.3
North Dakota             --       --       --     16.1     13.7     13.1     13.1           13.0
South Dakota             --       --       --     41.9     17.7     17.5     16.4           15.0
4-state CV               --       --       --     12.1      9.6      9.1      9.1            9.0

Total wheat
9-state CV               --       --       --      5.0      4.9      4.8      4.9            4.8

Table B-VI.— Coefficient of Variation for the Revised LACIE Yield Estimates for Phase III, by State and by
Monthly Report, U.S. Great Plains, CAS Annual Report 05, December 22, 1977

[Percent]

                  CAS monthly report
State                  Feb.      May     June     July     Aug.    Sept.     Oct.  End of season

Winter wheat
Colorado               18.9     17.4     15.?     14.8     14.8     14.8     14.8           14.8
Kansas                 12.1     10.8     10.0      9.7      9.7      9.7      9.7            9.7
Nebraska               10.5     10.8     10.7      9.7      9.5      9.3      9.3            9.3
Oklahoma               13.8     12.5     11.0     10.7     10.3     10.2     10.4           10.4
Texas                  16.5     11.6     10.6     10.8     11.3     11.3     11.7           11.7
5-state CV              6.7      6.1      5.6      5.6      5.6      5.6      5.6            5.6
Montana                22.5     13.7     13.2     12.1     12.1     12.1     12.1           12.1
South Dakota           17.5     18.6     18.6     18.9     18.5     18.5     18.5           18.5
2-state CV             16.3     11.1     10.7     10.1      9.9     10.2     10.2           10.2
7-state CV              6.3      5.5      5.1      5.1      5.1      5.1      5.1            5.1

Spring wheat
Minnesota                --       --       --     12.8     11.6     11.2     10.8           10.7
Montana                  --       --       --     14.9     14.0     14.0     14.0           14.0
North Dakota             --       --       --     15.1     12.8     12.3     12.4           12.4
South Dakota             --       --       --     12.1     11.6     11.6     11.6           11.6
4-state CV               --       --       --     10.5      8.6      8.3      8.5            8.4

Total wheat
9-state CV               --       --       --      3.9      3.9      4.2      4.3            4.3
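One way to read the three Phase III coefficient-of-variation tables together is sketched below. It uses a textbook approximation that the report itself does not state: if the area and yield estimates were independent, the production CV would be roughly the root-sum-of-squares of the area and yield CVs. The end-of-season values are taken from Tables B-IV, B-V, and B-VI; the function name and the independence assumption are illustrative only.

```python
import math


def approx_production_cv(area_cv, yield_cv):
    """Root-sum-of-squares approximation, assuming independent area and yield errors."""
    return math.hypot(area_cv, yield_cv)


# End-of-season winter wheat CVs in percent: (area, yield, reported production).
end_of_season = {
    "Colorado": (9.8, 14.8, 17.7),
    "Kansas": (4.0, 9.7, 10.5),
    "Texas": (13.7, 11.7, 15.7),
}

for state, (area_cv, yield_cv, reported) in end_of_season.items():
    approx = approx_production_cv(area_cv, yield_cv)
    print(f"{state}: approximate {approx:.1f}, reported {reported}")
```

Under this assumption Colorado and Kansas come out close to the tabulated production CVs while Texas does not, so the approximation should be read only as a rough check and not as the method used to produce Table B-V.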
Table B-VII.— Real-Time LACIE Area Estimates for Phase III, by Crop Type, by State,
and by Monthly Report, U.S. Great Plains

[Thousands of acres]

                    CAS monthly report
State                 Feb. (a)  Feb. (b)   Apr. 22       May      June      July  Aug. (c) Sept. (d)  Oct. (e)  End of season (f)

Winter wheat
Colorado                  2183      2135      2189      3093      3065      2962      3059      3059      3432               3459
Kansas                    6719      6491      6794     10190     10915     11764     12385     12501     12669              12494
Nebraska                  2977      2892      3072      3169      3610      3475      3423      3105      3325               3433
Oklahoma                  2953      2943      3061      4506      4875      5264      5543      6074      5950               5675
Texas                     2954      3294      3517      4262      4529      4511      4311      4513      4581               4476
5-state total (g)        17786     17755     18633     25220     26994     27976     28721     29252     29957              29537
Montana                   2763      2274      2274      2973      3253      3097      2746      3597      3416               3371
South Dakota              1044      1721      1721      2261      2601      4629      1353      1039       963                912
2-state total (g)         3807      3995      3995      5234      5854      7726      4099      4636      4379               4283
7-state total (g)        21594     21750     22627     30453     32848     35701     32819     33888     34336              33820

Spring wheat
Minnesota                   --        --        --        --        --        --      2238      2461      2289               2344
Montana                     --        --        --        --        --        --      1369      2187      2150               2174
North Dakota                --        --        --        --        --        --      6761      8678      9173               9183
South Dakota                --        --        --        --        --        --      2167      2160      1909               1936
4-state total (g)           --        --        --        --        --        --     12535     15487     15522              15638

Total wheat
9-state total (g)           --        --        --        --        --        --     45355     49375     49857              49458

(a) CAS monthly report released February 8, 1977; results based on Phase II sampling strategy with 431
sample segments allocated; used data through December 1976.

(b) CAS monthly report released April 6, 1977; results based on 601-sample-segment Phase III strategy;
used data through December 1976; duplicated the February CMR using Phase III sampling strategy.

(c) The results contained in the August 1977 CMR were obtained by redesignating the Montana and South
Dakota segments as spring only, winter only, and mixed. In addition, the Landsat data were "thresholded"
to eliminate early-season data. See CMR-29, August 10, 1977, for more details.

(d) The results in the September 1977 CMR were obtained by "thresholding" and "screening" the Landsat
data to eliminate early-season data and data that were significantly different from historically similar
data. See CMR-31, September 9, 1977, for more details.

(e) The results in the October 1977 CMR were obtained by thresholding and screening plus a reallocation
of the Phase III segments based on wheat rather than small grains. Thirty-eight segments were dropped.
See CMR-33, October 11, 1977, for more details.

(f) The end-of-season results were obtained by screening, thresholding, segment redesignation, and
reallocation as described in footnotes c, d, and e above.

(g) Totals may not add correctly because of rounding.
Table B-VIII. Real-Time LACIE Production Estimates for Phase III, by Crop Type, by State,
and by Monthly Report, U.S. Great Plains

[Thousands of bushels]

State             CAS monthly report
                  Feb.(a)    Feb.(b)    Apr. 22    May        June       July       Aug.(c)    Sept.(d)   Oct.(e)    End of season(f)

Winter wheat
Colorado          49 772     48 659     49 037     70 357     72 456     66 516     68 682     68 675     77 070     77 666
Kansas            194 220    187 644    190 941    286 373    308 387    339 348    357 263    360 616    365 465    360 410
Nebraska          90 058     88 444     96 579     99 038     108 793    111 903    109 960    99 264     106 120    109 823
Oklahoma          64 391     63 918     64 413     95 560     96 550     104 907    110 463    121 671    119 208    113 387
Texas             56 726     63 305     63 516     83 068     91 965     91 691     87 579     91 594     92 885     90 695
5-state total(g)  455 167    451 970    464 486    634 396    678 151    714 365    733 947    741 820    760 748    751 981
Montana           73 799     60 723     65 712     85 751     91 417     81 983     72 678     95 206     90 411     89 224
South Dakota      28 513     46 978     46 057     58 836     67 685     123 196    36 621     28 130     26 072     24 682
2-state total(g)  102 312    107 701    111 769    144 587    159 102    205 179    109 299    123 336    116 483    113 906
7-state total(g)  557 479    559 672    576 255    778 982    837 254    919 544    843 247    865 156    877 231    865 888

Spring wheat
Minnesota                                                                           71 199     78 744     73 213     74 955
Montana                                                                             24 634     39 357     38 683     39 112
North Dakota                                                                        157 751    200 529    211 247    211 990
South Dakota                                                                        45 103     44 969     39 748     40 309
4-state total(g)                                                                    298 686    363 599    362 890    366 367

Total wheat
9-state total(g)                                                                    1 141 933  1 228 754  1 240 121  1 232 255

(a) CAS monthly report released February 8, 1977; results based on Phase II sampling strategy with 431 sample segments allocated; used data through December 1976.
(b) CAS monthly report released April 6, 1977; results based on 601-sample-segment Phase III strategy; used data through December 1976; duplicated the February CMR using Phase III sampling strategy.
(c) The results contained in the August 1977 CMR were obtained by redesignating the Montana and South Dakota segments as spring only, winter only, and mixed. In addition, the Landsat data were "thresholded" to eliminate early-season data. See CMR-29, August 10, 1977, for more details.
(d) The results in the September 1977 CMR were obtained by "thresholding" and "screening" the Landsat data to eliminate early-season data and data that were significantly different from historically similar data. See CMR-31, September 9, 1977, for more details.
(e) The results in the October 1977 CMR were obtained by thresholding and screening plus a reallocation of the Phase III segments based on wheat rather than small grains. Thirty-eight segments were dropped. See CMR-33, October 11, 1977, for more details.
(f) The end-of-season results were obtained by screening, thresholding, segment redesignation, and reallocation as described in footnotes c, d, and e above.
(g) Totals may not add correctly because of rounding.
Table B-IX. Real-Time LACIE Yield Estimates for Phase III, by Crop Type, by State,
and by Monthly Report, U.S. Great Plains

[Bushels per acre]

State             CAS monthly report
                  Feb.(a)   Feb.(b)   Apr. 22   May       June      July      Aug.(c)   Sept.(d)  Oct.(e)   End of season(f)

Winter wheat
Colorado          22.8      22.8      22.4      22.8      23.6      22.5      22.5      22.5      22.5      22.5
Kansas            28.9      28.9      28.1      28.1      28.3      28.8      28.8      28.8      28.8      28.8
Nebraska          30.2      30.6      31.4      31.1      30.1      32.2      32.1      32.0      31.9      32.0
Oklahoma          21.8      21.7      21.0      21.2      19.8      19.9      19.9      20.0      20.0      20.0
Texas             19.2      19.2      18.1      19.5      20.3      20.3      20.3      20.3      20.3      20.3
5-state average   25.6      25.5      24.9      25.2      25.1      25.5      25.6      25.4      25.4      25.5
Montana           26.7      26.7      28.9      28.8      28.1      26.5      26.5      26.5      26.5      26.5
South Dakota      27.3      27.3      26.8      26.0      26.0      26.6      27.1      27.1      27.1      27.1
2-state average   26.9      27.0      28.0      27.6      27.2      26.6      26.7      26.6      26.6      26.6
7-state average   25.8      25.7      25.5      25.6      25.5      25.8      25.7      25.5      25.5      25.6

Spring wheat
Minnesota                                                                     31.8      32.0      32.0      32.0
Montana                                                                       18.0      18.0      18.0      18.0
North Dakota                                                                  23.3      23.1      23.0      23.1
South Dakota                                                                  20.8      20.8      20.8      20.8
4-state average                                                               23.8      23.5      23.4      23.4

Total wheat
9-state average                                                               25.2      24.9      24.9      24.9

(a) CAS monthly report released February 8, 1977; results based on Phase II sampling strategy with 431 sample segments allocated; used data through December 1976.
(b) CAS monthly report released April 6, 1977; results based on 601-sample-segment Phase III strategy; used data through December 1976; duplicated the February CMR using Phase III sampling strategy.
(c) The results contained in the August 1977 CMR were obtained by redesignating the Montana and South Dakota segments as spring only, winter only, and mixed. In addition, the Landsat data were "thresholded" to eliminate early-season data. See CMR-29, August 10, 1977, for more details.
(d) The results in the September 1977 CMR were obtained by "thresholding" and "screening" the Landsat data to eliminate early-season data and data that were significantly different from historically similar data. See CMR-31, September 9, 1977, for more details.
(e) The results in the October 1977 CMR were obtained by thresholding and screening plus a reallocation of the Phase III segments based on wheat rather than small grains. Thirty-eight segments were dropped. See CMR-33, October 11, 1977, for more details.
(f) The end-of-season results were obtained by screening, thresholding, segment redesignation, and reallocation as described in footnotes c, d, and e above.



LACIE Area, Yield, and Production Estimate
Characteristics: U.S.S.R.

J.R.  Hickman* 


OVERVIEW 

The U.S.S.R. is one of the major food producers of the world; however,
because of the climatic differences and fluctuations in the vast area
involved (20° to 140° E longitude and 40° to 60° N latitude), the
environmental limitations often render the country incapable of
maintaining self-sufficiency on a yearly basis. Although the U.S.S.R. is
the world's largest producer of wheat (the principal food grain used for
human consumption) and caloric intake supplied by grains is declining,
the U.S.S.R. is a net importer of this commodity in almost as many years
as it has a surplus.

The Soviet position regarding wheat supply and demand is not unique, but
the magnitude of U.S.S.R. import requirements in years of
non-self-sufficiency makes the wheat position of this country a dominant
factor in the world market for grains. Four aspects of the supply
picture, all covered by the umbrella of nonavailability of information,
are involved: (1) stock position at any given time, (2) crop condition
during the growing season, (3) total production for the year, and (4)
magnitude of the total import requirement in years of short supply. The
sensitivity of the market to the U.S.S.R. wheat situation and the varied
climatic, topographical, agronomic, and cultural features encountered in
the wheat-growing area made the U.S.S.R. a natural choice for LACIE's
first attempt at crop estimation outside North America.

The objective of this paper is to discuss production, area, and yield
estimates in the U.S.S.R. during Phases II and III of LACIE (no estimates
were generated in Phase I). For Phases II and III, the following topics
are discussed.

1. Scope

2. Sampling strategy

3. Data base

4. Landsat data

5. Yield analysis for winter wheat and spring wheat

6. Area and production estimates for winter wheat and spring wheat

7. Technical issues and problems

*USDA Foreign Agricultural Service, Houston, Texas.

In addition, the methods of selecting data (area estimates) in Phase III
are discussed, and winter wheat, spring wheat, and total wheat estimates
are compared. The accuracy of the winter wheat, spring wheat, and total
wheat production, area, and yield estimates for Phase III is discussed.


PHASE I: CROP YEAR 1974-75

The U.S.S.R. work in Phase I centered around constructing an initial
historical-statistical data base, locating sample segments within the
country, and acquiring multispectral scanner (MSS) data for a subset of
the sample for study by the image analyst. No estimates were generated
for the U.S.S.R. during this phase.


PHASE II: CROP YEAR 1975-76


Scope

The  LACIE  Phase  II  effort  in  the  U.S.S.R.  was 
limited  to  two  indicator  regions,  one  in  the  winter 
wheat  area  and  one  in  the  spring  wheat  area  (fig.  1). 

The winter wheat indicator region includes the Baltics, Belorussia, the
Ukraine, Moldavia, the northern Caucasus, central non-Chernozem, and the
oblasts of Belgorod and Kalmyk. This area can be expected to give a
fairly clear indication of the entire winter wheat situation because it
contributes 70.3 percent to total area and 82.3 percent to total
production, according to 1971 historical data. The percent contribution
of area versus production indicates the inclusion of a preponderance of
the higher yielding areas in the indicator region. This does not appear
to be significant because an examination of additional years of data
indicates this is the norm rather than the exception.

The spring wheat indicator region is composed of 10 oblasts: Orenburg,
Chelyabinsk, Kurgan, Kustanay, Turgay, Tselinograd, Kokchetav,
Severo-Kazakhstan, Pavlodar, and Novosibirsk. These oblasts (except
Orenburg) were good testing sites for the LACIE technology because such
factors as the short growing season, marginal precipitation for wheat
production, the potential for the occurrence of early frost, and other
potential wheat problems made them high-risk areas. According to 1971
statistics, the spring wheat indicator region contributed 39.7 percent to
the total spring wheat area and 37.2 percent to the total production. The
percent contribution of area versus production indicates a slight
preponderance of the more marginal yielding areas in this region.
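
The inference drawn from area share versus production share can be made explicit: a region's yield relative to the national average equals its production share divided by its area share. The following Python sketch applies that identity to the 1971 percentages quoted for the two indicator regions; the function name is illustrative.

```python
def relative_yield(production_share_pct: float, area_share_pct: float) -> float:
    """Regional yield divided by national yield.

    (P_r / A_r) / (P / A) is algebraically equal to (P_r / P) / (A_r / A),
    so only the two percentage shares are needed.
    """
    if area_share_pct <= 0:
        raise ValueError("area share must be positive")
    return production_share_pct / area_share_pct


winter = relative_yield(82.3, 70.3)  # winter wheat indicator region
spring = relative_yield(37.2, 39.7)  # spring wheat indicator region
print(f"winter: {winter:.2f}  spring: {spring:.2f}")
```

A value above 1 indicates higher-than-average yields (the winter wheat region, about 1.17), and a value below 1 indicates more marginal yields (the spring wheat region, about 0.94).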


Sampling Strategy

The sampling strategy used in LACIE, based on total area and wheat
density, allocated a total of 1947 sample segments (5- by 6-nautical-mile
samples) for the U.S.S.R. These segments were randomly located in the
wheat-producing areas. The winter wheat indicator region contained 385
sample segments; the spring wheat region had 362 segments.


Data Base

The  data  base  can  be  divided  into  four  principal 
categories — allocation  data,  historical-statistical  data 
(crop  production,  area,  and  yield),  yield  data,  and 
spectral  data. 

Allocation data: Allocation data include hierarchical identifiers; i.e.,
codes for country (UR), region (economic region), zone (group of
oblasts), and stratum (oblast). This part of the data base also
identifies each sample segment (winter wheat or spring wheat) and its
position in the hierarchy or its geographical location.

Historical statistical data: Historical statistical data include
historical production, area, and yield statistics for both spring and
winter wheat at the stratum (oblast) level and for each higher level of
the hierarchy. Also included in this category are data on derived ratios
of wheat to other small grains. The need for this portion of the data
base manifested itself early in Phase II (fall 1975) when it became
evident that the image analysts (photograph interpreters working with
hard-copy MSS data) were unable to separate wheat from rye or barley.
Therefore, the classification of the MSS data was for small grains rather
than for spring or winter wheat.

The ratios established were for (1) winter wheat to fall-sown small
grains (rye, barley, and winter wheat), (2) winter wheat to total small
grains (fall-sown rye and barley; spring-sown barley; and oats, spring
wheat, and winter wheat), (3) spring wheat to spring-sown small grains
(oats, barley, and spring wheat), and (4) spring wheat to total small
grains (winter wheat; winter-sown rye and barley; and spring-sown oats,
barley, and spring wheat).
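
How such a ratio converts a small-grains classification into a wheat figure can be sketched as follows. The paper states only that an averaging methodology was used and does not give the formula, so the simple mean of yearly ratios below is an assumption, and the function names and all numbers are hypothetical.

```python
from statistics import mean


def historical_ratio(wheat_area: list[float], small_grains_area: list[float]) -> float:
    """Mean of the yearly wheat-to-small-grains area ratios for one stratum.

    Assumed averaging scheme; the source does not specify the exact method.
    """
    if len(wheat_area) != len(small_grains_area) or not wheat_area:
        raise ValueError("need matching, non-empty yearly series")
    return mean(w / s for w, s in zip(wheat_area, small_grains_area))


def wheat_percent(small_grains_percent: float, ratio: float) -> float:
    """Convert a segment's classified small-grains percentage to wheat."""
    return small_grains_percent * ratio


# Hypothetical oblast: winter wheat vs. fall-sown small grains, thousand ha.
ratio = historical_ratio([800.0, 900.0, 850.0], [1000.0, 1200.0, 1000.0])
print(f"ratio = {ratio:.2f}")
print(f"wheat = {wheat_percent(30.0, ratio):.1f} percent of segment")
```

A ratio built from past years carries the implicit assumption that the current crop mix resembles the historical one, which is why abnormal years pose a problem for this approach.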

Yield data: The yield data were received from the National Oceanic and
Atmospheric Administration (NOAA) Center for Climatic and Environmental
Assessment (CCEA) at Columbia, Missouri. The spring and winter wheat
yield estimates were then input at the stratum level (oblast) as
appropriate; i.e., winter yield, spring yield, or both depending on the
class (or classes) of wheat historically produced within a given stratum.

Spectral data: For each usable acquisition (data of such quality that the
classification procedures would yield a reasonable and acceptable
estimate of Landsat data for a sample segment), the following information
was entered in the data base: segment number; crop year; percentage of
winter, spring, and total small grains; crop type; biological growth
stage; Landsat acquisition date; and classification date.


Landsat Data

The quantity of the usable MSS data acquired for the U.S.S.R. is very
poor compared to that of the U.S. data. The usable Landsat data acquired
for the two countries during Phase II are compared in table I.

Table I. Comparison of Usable MSS Data Acquired for the U.S.S.R. and the
U.S. Great Plains in Phase II

Item                          U.S.S.R.                                                  United States
                              Winter wheat        Spring wheat        Total wheat         Seven states,
                              indicator region    indicator region    indicator region    U.S. Great Plains

No. of segments allocated     385                 362                 747                 601
No. of usable acquisitions    690                 801                 1491                1599
Acquisition rate(a)           1.8                 2.2                 2.0                 2.7

(a) Number of acquisitions divided by number of segments allocated.

Although MSS data were acquired in the U.S.S.R. winter wheat area over a
much longer period of time (from August 1975 to August 1976), the
acquisition rate in the spring wheat area (data acquired from May to
August 1976) was significantly above that for the winter wheat. This
difference is due to the weather conditions in European U.S.S.R. The fall
is overcast much of the time, and spring and early summer rains with
their accompanying cloud cover limit the gathering of data by Landsat.
The acquisition rate for the United States is much improved over the
U.S.S.R. rate because (1) more usable data are acquired from fall
planting to harvest over the entire area, (2) cloud cover, haze, and
other atmospheric interferences are minor compared to similar influences
in European U.S.S.R., and (3) snow covers larger areas for longer time
periods in the U.S.S.R.
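
The acquisition rate used in this comparison is defined in the table footnote as the number of usable acquisitions divided by the number of segments allocated. A minimal Python sketch of that calculation, using the Phase II counts from table I (the dictionary keys and function name are illustrative), is:

```python
def acquisition_rate(usable_acquisitions: int, segments_allocated: int) -> float:
    """Usable Landsat acquisitions per allocated sample segment."""
    if segments_allocated <= 0:
        raise ValueError("segments_allocated must be positive")
    return usable_acquisitions / segments_allocated


# (usable acquisitions, segments allocated) for Phase II.
phase2 = {
    "U.S.S.R. winter wheat indicator region": (690, 385),
    "U.S.S.R. spring wheat indicator region": (801, 362),
    "U.S. Great Plains": (1599, 601),
}

for region, (acquisitions, segments) in phase2.items():
    print(f"{region}: {acquisition_rate(acquisitions, segments):.1f}")
```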

Five reports (crop estimates) were generated in Phase II for the winter
wheat indicator region and three for the spring wheat indicator region.
The spectral data (number of segments with data) used for each report are
summarized in table II.


Winter Wheat Indicator Region

General weather/crop situation: The 1975-76 winter wheat crop started
slowly because of subsoil and topsoil moisture shortages at the time of
planting. Emerged stands were thin, spotty, and vulnerable to the cold
weather that followed. Winterkill took its toll when a February cold snap
affected wheatfields lacking adequate snow cover protection. The most
damaged areas included the northern and eastern Ukraine and the lower
Volga region.

Precipitation amounts remained very light throughout the fall and winter
with the exception of a heavy snowfall in January. Spring rains were
timely and stimulated vigorous growth during April and May. Precipitation
amounts over the European U.S.S.R. remained generous throughout the
growing season, with occasionally threatening floods in the Dnieper River
Valley.

Temperatures were generally near normal during the growing season but
turned cooler than normal as the wheat crop neared maturity. Continued
shower activity over most of the winter wheat area caused concern about
the potential damage caused by lodging and sprouting and about the
difficulty in getting harvesting equipment into the fields.

Warm  and  sunny  weather  in  late  August  allowed 
harvesting  operations  to  near  completion  over  most 
of  the  winter  wheat  region,  only  slightly  behind  the 
normal  harvest  calendar  date. 

Ideal postdormancy growing conditions produced above-normal crop yields
from a near loss the previous fall. In contrast, soil moisture conditions
for the 1976 planting season (Phase III) were considered ideal and
prompted the planting of a record number of hectares to winter grain in
the fall of 1976.

Table II. Spectral Data Used in Phase II Reports

Month of report    No. of segments    No. of segment data    Percent of
                   allocated          used in report         allocations covered

Winter wheat indicator region
April              385                146                    38
May(a)             385                146                    38
June               385                197                    51
July               385                248                    64
August             385                (b)
September          385                (b)
October            385                285                    74

Spring wheat indicator region
August             362                265                    73
September          362                296                    82
October            362                314                    87

(a) This report updated the estimate for yield and production because no new spectral data were received after the May report.
(b) Winter wheat reports were not generated this month because of the effort involved in the setup of the spring wheat data base.

Yield analysis: As a result of the soil moisture situation during the
fall of 1975 and the follow-on winterkill problems, the winter wheat
yield in the U.S.S.R. began the 1976 season on a down note. April yield
model estimates based only on preseason precipitation and temperature
predicted yields to be normal or above normal in only 5 of 18 winter
wheat crop regions. Yields steadily improved, however, beginning in May
and continuing through to harvest. By the final yield model truncation in
July, 9 of the 18 winter wheat crop regions registered above-normal yield
indications. In contrast to the subnormal winter moisture conditions,
heavy May rainfall proved detrimental to yield in the more northerly
areas, notably in the Baltics (Estonia, Latvia, Lithuania, and
Kaliningrad) and Belorussia. Farther south along the Black Sea in the
lower portion of the Ukraine and Krasnodar, lack of April precipitation
held yields down. A combination of preseason moisture stress and
subnormal temperatures through March led to severe winterkill problems,
primarily in the eastern Ukraine and lower Volga regions, where subnormal
temperatures lowered prospective yields an average of 5 quintals per
hectare.

Winter wheat yields throughout the remainder of the central and western
European U.S.S.R. fared exceptionally well in 1976, especially in the
prime production regions of the Caucasus and middle Volga area, where
output per hectare ranged up to 5 quintals above normal. Yield model
calculations indicate that Moldavia, although small in terms of total
winter wheat production, ranked above all other Soviet regions in 1976:
38.0 quintals per hectare, or 10 percent above normal expectations.
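
Because the U.S. Great Plains yields elsewhere in this volume are given in bushels per acre while the Soviet yields are given in quintals per hectare, a unit conversion helps in comparing them. The Python sketch below assumes the standard 60-pound wheat bushel and a 100-kilogram quintal; the constant and function names are illustrative.

```python
KG_PER_QUINTAL = 100.0
KG_PER_WHEAT_BUSHEL = 60 * 0.45359237  # 60 lb per bushel of wheat (assumed)
ACRES_PER_HECTARE = 2.4710538


def quintals_per_ha_to_bu_per_acre(quintals_per_ha: float) -> float:
    """Convert a wheat yield from quintals/hectare to bushels/acre."""
    kg_per_ha = quintals_per_ha * KG_PER_QUINTAL
    bushels_per_ha = kg_per_ha / KG_PER_WHEAT_BUSHEL
    return bushels_per_ha / ACRES_PER_HECTARE


# Moldavia winter wheat yield quoted in the text.
print(f"{quintals_per_ha_to_bu_per_acre(38.0):.1f} bu/acre")
```

Under these assumptions, the 38.0-quintal-per-hectare Moldavian yield corresponds to roughly 56.5 bushels per acre.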


Spring Wheat Indicator Region

General weather/crop situation: Because of a lingering drought, the
central U.S.S.R. spring wheat region had, at harvest time in 1975, drawn
soil moisture reserves down to a critically low level. Much hope was
therefore given to a plentiful snowfall during the winter months. Winter
and early spring moisture, however, was spotty, with precipitation
amounts generally less than normal.

Rain and snowfall rates continued to be low during April, and although
May brought increased precipitation to most spring wheat areas, dry
pockets were widening in parts of the Ural Mountains, Kazakhstan, and
western Siberia. Timely early summer rains really turned spring crop
prospects around and replenished dwindling moisture supplies in most of
the New Lands.

Summer temperatures, averaging 1° to 2° C below normal north of
Kazakhstan, aided crop development. Cooler temperatures as harvest
approached slowed maturity in western Siberia, leaving crops vulnerable
to an early frost; however, no damage materialized. Harvest weather was
most favorable in late August and September, although seasonal showers
interfered with harvesting operations in the eastern portion of the
spring wheat area.

Yield analysis: Similar to the yield estimates for the 1976 winter wheat
crop, the spring wheat yield indications were below normal at the May
planting time because of adverse preseason precipitation conditions.
Yield estimates tended to support reported soil moisture conditions; all
seven yield models in the spring wheat indicator region indicated
below-normal prospects. Areas to the west along the Volga River in the
mixed spring and winter grain regions gave somewhat higher yield
predictions throughout the year, although unseasonably abundant rainfall
detracted from yields in the later stages as harvest approached. At the
end of the season, about one-third of the spring wheat model strata
estimated above-normal yields, all but one stratum lying outside the
indicator region.

The overall seasonal impact of weather on predicted Soviet yields was
somewhat mixed. Within the indicator area surrounding the New Lands,
yield model estimates were off as much as 5 or 6 quintals per hectare
from normal trends in western Siberia primarily because of severe
moisture problems occurring through April. Conversely, the yields in
Kokchetav and Severo-Kazakhstan oblasts were nearly 2 quintals (or 16
percent) above trend because cumulative precipitation from May through
July exceeded normal precipitation by 60 millimeters. This region posted
the highest 1976 spring wheat yield in the Soviet New Lands, 12.4
quintals per hectare, followed by the near-normal
10.5-quintal-per-hectare yield in the northeastern Urals sector.


Estimates

Winter wheat indicator region: Five reports were generated for the winter
wheat indicator region in the 1975-76 crop year. The area and production
estimates showed consistent significant increases with each succeeding
report, except for the May report (fig. 2). No additional spectral data
were available from those used in the April report; therefore, the May
area estimate remained the same as the previous estimate. However, yield
estimates were reduced, resulting in a decrease in the May production
estimate from the April level. The decreases in the yield estimates were
not uniform for the region; therefore, the reduction in the production
estimate was not directly proportional to the decrease in the average
yield between the 2 months.

FIGURE 2. Phase II production, area, and yield estimates of the U.S.S.R.
winter wheat indicator region. The U.S.S.R. actual figures are derived
from multiple Soviet publications because the U.S.S.R. does not publish
production, area, and yield statistics for an area coincident with the
indicator regions.

The continued increases in area estimates resulted from improved sample
segment coverage and improved classifications of previously worked
segments using later acquisitions; however, much of the increase was
directly attributable to an inability to separate wheat from other small
grains and indirectly to the ratioing technique used. LACIE did not
envision the necessity of using a ratio, but, as the experiment
progressed, it became apparent that the capability of distinguishing
wheat from other small grains in foreign areas was not developed within
LACIE. The software for the aggregation was designed specifically for
wheat so that a ratio seemed the logical solution to the problem. A
ratioing scheme was developed, using an averaging methodology, which
appears to function satisfactorily in normal years but which must be
modified to perform well in years such as 1975-76 for the U.S.S.R. winter
wheat area. Several intervening factors that year adversely affected the
use of ratios in developing percentage of wheat from percentage of small
grains; e.g., changes in sown area and abnormal amount of winterkill. In
retrospect, it is evident that two major classification errors were made.
First, fall-sown small grains were classified when in reality spring-sown
small grains made up at least part of the confusion crops. Second,
spring-sown or total small grains were classified when crops other than
small grains made up a part of the confusion crops. The principal
agronomic abnormalities for that year were (1) the expansion of the area
devoted to small grains in the winter wheat area and (2) an abnormally
large area usually occupied by winter wheat replanted to other small
grains when losses from poor germination resulted because of a dry fall
and severe winterkill in certain areas of the country.

Spring wheat indicator region: Three reports were generated for the
spring wheat indicator region for the 1975-76 crop year. The area and
production estimates showed consistent and expected increases with each
succeeding report (fig. 3). Apparently, the ratioing technique was more
nearly suited to the overall spring wheat area since no significant
across-the-board abnormalities occurred during this crop year. This was
largely due to the preponderance of spring wheat to potential confusion
crops (i.e., other small grains) in the spring wheat indicator region.
The principal problem with the estimate for the spring wheat indicator
region for this year was the timing of the first report. It was
anticipated that this report would be released no later than mid-July.
However, because of a late season and the amount of time between
acquisition and receipt of data by the Crop Assessment Subsystem (CAS),
it was mid-August before sufficient data were available for substantive
results.

FIGURE 3. Phase II production, area, and yield estimates of the U.S.S.R.
spring wheat indicator region. The U.S.S.R. actual figures are derived
from multiple Soviet publications because the U.S.S.R. does not publish
production, area, and yield statistics for an area coincident with the
indicator regions.

Accuracy  of  Estimates 

Because  of  the  reporting  system  used  by  the 
U.S.S.R.,  estimation  of  the  bias  of  estimates  at  the  in- 
dicator region  level  is  not  possible.  However,  esti- 
mates of  precision — coefficient  of  variation  (CV) — 
can  be  projected  to  the  country  level  to  determine 
whether  the  precision  is  sufficient  to  support  ac- 


curacy goals.  The  CV  of  the  production  estimate  at 
the  country  level  must  be  less  than  6.1  percent  to 
support  the  LACIE  90/90  accuracy  goal  (see  the 
paper  by  Houston  et  al.  entitled  “Accuracy  Assess- 
ment: The  Statistical  Approach  to  Performance 
Evaluation  in  LACIE").  Table  III  gives  the  LACIE 
production  estimates  for  the  two  indicator  regions 
with  estimated  CV's  and  projections  to  the  country 
level  for  those  months  for  which  standard  statistics 
were  available  (ref.  1).  These  projections,  which  treat 
the  country-level  estimates  as  having  the  same 
characteristics  as  the  indicator  region  estimates,  indi- 
cate that  the  precision  at  the  indicator  region  levels  is 
more  than  adequate  to  support  the  90/90  accuracy 
goal.  In  fact,  these  projections  indicate  that  a relative 
bias  in  the  country-level  LACIE  production  estima- 
tor as  large  as  4 percent  can  be  tolerated  and  still  sup- 
port the  accuracy  goal. 
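
The link between the CV threshold and the 90/90 goal can be sketched numerically. The block below is a minimal illustration, assuming the relative error of the production estimate is normally distributed with mean equal to the relative bias and standard deviation equal to the CV; that distributional assumption and the function names are ours, not taken from the paper by Houston et al., whose formulation may differ in detail.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_within_tolerance(cv, relative_bias=0.0, tolerance=0.10):
    """Probability that the estimate falls within +/- tolerance of the truth.

    Assumes (our simplification) relative error ~ Normal(relative_bias, cv**2).
    """
    upper = (tolerance - relative_bias) / cv
    lower = (-tolerance - relative_bias) / cv
    return normal_cdf(upper) - normal_cdf(lower)


# Unbiased estimator at the 6.1 percent CV threshold quoted in the text.
print(f"CV 6.1%, no bias: {prob_within_tolerance(0.061):.3f}")

# Country-level projected CV values from table III with a 4 percent relative bias.
for projected_cv in (0.031, 0.036, 0.039, 0.047):
    p = prob_within_tolerance(projected_cv, relative_bias=0.04)
    print(f"CV {projected_cv:.1%}, bias 4%: {p:.3f}")
```

Under this simplified model, an unbiased estimator with a CV of 6.1 percent lands within 10 percent of the true value about 90 percent of the time, which is one way to read the threshold quoted above.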

The  tendency  to  underestimate  winter  wheat  area 
at  harvest,  as  was  observed  in  the  U.S.  Great  Plains 
(Texas,  Oklahoma,  Kansas,  Colorado,  Nebraska,  and
South  Dakota)  in  Phase  II,  is  not  indicated  in  the 
U.S.S.R.,  even  though  the  winter  wheat  signatures  in 

Table III.— LACIE Production Estimates for the
Winter and Spring Wheat Indicator Regions

Month         LACIE production     CV,        CV,(b)
              estimate, MMT(a)     percent    percent

Winter wheat indicator region

June               27.8              7.0        3.1
July               30.0              8.0        3.6
October            34.9              7.0        3.1

Spring wheat indicator region

August             14.3             11.0        4.7
September          17.4              9.0        3.9
October            20.1              9.0        3.9

(a) Million metric tons.
(b) Country-level projection.


the  U.S.S.R.  were  much  like  those  observed  in  the 
United  States.  If  anything,  a tendency  to  overesti- 
mate is  indicated.  For  some  of  the  drier  areas  in  the 
U.S.S.R.  where  the  wheat  signatures  were  very  weak, 
late-fall  and  early-season  interpretations  of  winter 
grains  were  difficult  because  it  was  hard  to  determine 
whether  anything  was  actually  growing  in  many  of 
the  fields.  This  probably  led  the  analysts  to  confuse 
natural  vegetation  and  wheat  in  these  areas  and 
hence  to  tend  to  overestimate.  Spring  wheat  sig- 
natures for  the  U.S.S.R.  did  not  appear  as  strong  as 
those  for  the  United  States;  however,  the  spring 
wheat  fields  in  the  U.S.S.R.  were  much  larger.  Indica-
tions are  that  the  spring  wheat  area  estimates  at  the
segment  level  were  much  better  than  those  in  the 
United  States  because  of  fewer  confusion  crops, 
minimal  strip-fallow  cropping  practices,  larger  fields, 
and  more  stable  year-to-year  ratios  of  spring  wheat  to 
small  grains  (resulting  from  more  stringent  govern- 
ment controls). 

Although  the  LACIE  yield  estimates  did  not  ap- 
pear to  vary  much  at  the  indicator  region  levels,  pre- 
vious discussion  indicates  that  the  yield  varied  con- 
siderably from  crop  region  to  crop  region  in  both  the 
winter  and  spring  wheat  regions  because  of  weather 
conditions.  The  LACIE  yield  models  apparently 
tracked  this  variability  reasonably  well  but  tended  to 
underestimate.  This  variability  illustrates  the  need  to 
track  both  area  and  yield  at  the  crop  region  level  in 
order  to  obtain  reliable  indicator  region  and  higher 
level  production  estimates. 


Technical  Issues  and  Problems

Selection  of  the  indicator  regions. — Because  of 
resource  limitations,  it  was  impossible  to  work  the 
U.S.S.R.  wheat-producing  area  in  its  entirety  during 
Phase  II.  Therefore,  it  was  decided  to  work  indicator 
regions  in  the  winter  and  spring  wheat  areas  (see  fig. 
1)  that  would  be  representative  of  the  respective 
areas.  The  selection  of  regions  was  indeed  represent- 
ative. The  winter  wheat  indicator  region  contained 
approximately  70  percent  of  the  area  producing 
winter  wheat,  which  accounts  for  approximately  82 
percent  of  the  production.  The  spring  wheat  indica- 
tor region  contained  only  40  percent  of  the  total  area 
devoted  to  spring  wheat,  which  accounts  for  approx- 
imately 37  percent  of  the  total  production  but  covers 
the  major  part  of  the  high-risk  or  “swing”  area  pro- 
ducing spring  wheat.  The  unfortunate  aspect  of  this 
selection  is  that  the  U.S.S.R.  does  not  release  produc- 
tion, area,  and  yield  data  in  a manner  that  allows 
comparison  with  the  LACIE  estimate.  From  the 
standpoint  of  accuracy  assessment,  a better  winter 
wheat  indicator  region  might  have  included  the 
Baltics,  Belorussia,  and  the  Ukraine  to  coincide  with 
U.S.S.R.  releases. 

Sampling. — The  U.S.S.R.  wheat  sampling  density 
was  based  on  total  area  and  wheat  density  in  order  to 
achieve  the  LACIE  90/90  goal  (to  be  within  10  per- 
cent of  the  actual  crop  90  percent  of  the  time  on  the 
average).  However,  the  sampling  strategy  was  based 
on  1971  data  that  were  not  fully  representative  of  the 
1975  situation;  i.e.,  shifts  from  wheat  to  barley  and 
increases  in  cultivated  areas  were  not  reflected.  This 
resulted  in  an  oversample  of  some  areas  and  an  un- 
dersample of  other  areas.  The  sample  consisted  of  5- 
by  6-nautical-mile  sample  segments  randomly  placed 
in  agricultural  areas  as  determined  by  interpretation 
of  Landsat  images.  Landsat  data,  however,  were  not 
available  over  the  entire  area.  Where  spectral  data 
were  missing,  historical  data  were  used;  however,  the 
historical  data  again  were  not  truly  representative  of 
the  current  situation  and  resulted  in  approximately
35  percent  of  the  sample  segments  being  placed  in 
nonagricultural  locations. 

Historical  data. — Much  of  the  initial  planning  for
the  LACIE  procedures  was  based  on  the  United 
States,  and  the  inconsistencies  or  lack  of  needed 
historical-statistical  data  in  foreign  areas  necessitated 
numerous  “workarounds”  in  the  initial  stages  of 
Phase  II  for  the  U.S.S.R.  Although  the  U.S.S.R.  re-
ports of  statistical  data  on  agriculture  are  massive,  an
attempt  to  develop  specific  data  for  the  entire  coun-
try  or  to  establish  relatively  long-term  trends  (5  to  10
years)  for  specific  values  usually  leads  to  frustration 
and  to  the  derivation  of  data.  Across-the-board 
statistical  data  at  the  oblast  level  are  normally  not 
available,  and  if  available,  are  usually  inconsistent. 
Thus,  data  for  some  oblasts  may  be  reported  as 
winter  grains  and  spring  grains  or  total  grains,
whereas  others  may  define  wheat  (winter  and 
spring),  rye,  and  barley  (winter  and  spring). 

Another  constraint  in  working  U.S.S.R.  data  is
that  most  official  releases  below  the  country  level  are 
at  least  2 years  old.  As  described  previously,  the  sub- 
country data  are  inconsistent,  and  considerable  work 
had  to  be  done  to  derive  representative  data  for  the 
needed  political  level,  the  oblast  (no  attempt  was 
made  to  work  at  the  county  (rayon)  level).  It  is  im- 
possible to  analyze  the  results  in  the  U.S.S.R.  in  a 
timely  manner  because  complete  data  (production, 
area,  and  yield  estimates)  are  not  released  by  the 
U.S.S.R.  during  the  crop  year  and  because  produc- 
tion, area,  and  yield  data  at  the  country  level  are  not 
released  until  the  spring  following  the  crop  year  of 
interest. 

Landsat  collection.— The  acquisition  rate  of  usable 
spectral  data  for  the  European  U.S.S.R.  was  26  per-
cent lower  than  the  rate  for  the  United  States  (see  ta-
ble I)  and  Canada  because  of  haze  and  cloud  cover  in
the  fall,  snow  cover  in  the  winter,  and  rainy  weather 
in  the  spring.  The  weather  is  a major  constraint  in  an 
accurate  inventory  in  that  it  is  quite  possible  that 
only  one  acquisition  will  be  obtained  during  an  entire
crop  year  for  a given  sample  segment.  If,  for  exam- 
ple, the  acquisition  is  obtained  early  in  the  crop  year 
(in  the  September-October  time  frame)  and  the 
ground  cover  is  not  sufficient  for  accurate  evaluation 
and  classification,  the  resulting  estimate  will  only 
reflect  a part  of  the  actual  crop.  This  estimate  will  be
carried  for  the  entire  crop  year  and  will  result  in  a 
definite  downward  bias. 

The  spring  wheat  acquisition  rate  for  the  New 
Lands  is  better  except  for  July.  This  is  not  normally  a 
critical  time  for  data  collection  in  this  area  because 
the  key  biostages  (emergence,  tillering,  jointing, 
heading,  and  turning)  generally  occur  outside  this 
time  frame.  However,  if  the  crop  is  either  delayed  or 
accelerated  significantly,  acquisitions  for  this  time 
would  be  crucial  for  an  accurate  analysis. 

Classification  of  Landsat  data. — One  of  the  most
critical  issues  for  the  LACIE  was  the  ability  to  dis- 
tinguish between  crops  using  the  Landsat  data.  With 
key  acquisitions,  it  was  possible  to  separate  small 
grains  from  row  crops,  hay,  and  improved  pastures,


but  LACIE  never  achieved  the  capability  of  dis- 
tinguishing wheat  from  other  small  grains.  This  in- 
ability led  to  the  ratioing  technique  discussed  earlier 
in  this  paper.  The  ratioing  technique  leaves  much  to 
be  desired,  especially  in  the  U.S.S.R.  where  the  age 
and  completeness  of  available  data  make  it  difficult, 
if  not  impossible,  to  reflect  the  current  situation. 

In  retrospect,  and  from  a commodity  analyst's 
point  of  view,  it  might  have  been  a simpler  and  more 
accurate  operation  if  the  project  had  chosen  to 
classify  small  grains  and  if  the  crop  analyst  had  been 
free  to  develop  a ratioing  scheme  at  the  country  or 
regional  level.  This  approach  could  possibly  have 
allowed  more  time  for  research  and  development  on 
crop  separability  techniques  by  the  classification 
component. 

Early-season  estimates. — The  image  analysts’  pro- 
cedures during  Phase  II  dictated  that  only  an  area 
showing  emergence  or  growth  of  small  grains  be  so 
classified.  Therefore,  if  weather  conditions  inter- 
rupted the  seeding  of  large  areas  and  plant  growth 
was  observed  in  only  parts  of  fields  at  the  time  of 
Landsat  data  acquisition,  the  resulting  area  classifica- 
tion would  be  low.  If  no  subsequent  acquisitions  for 
that  sample  segment  were  obtained,  this  bias  would 
remain  in  the  system  throughout  the  crop  year. 

Yield  estimates. — Yield  estimates  were  obtained 
from  mathematical  regression  yield  models  operated 
by  NOAA  CCEA  at  Columbia,  Missouri.  These 
models  were  developed  using  monthly  climatic  and
historical  yield  data.  Deviations  from  normal  tem- 
perature and  normal  precipitation  (or  a combination 
of  the  two  as  expressed  in  terms  of  potential  and  ac- 
tual evapotranspiration)  result  in  additions  or 
subtractions  from  a predicted  trend  yield.  Models  re- 
quiring monthly  accumulations  of  meteorological 
data  are  limited  in  their  capability  to  respond  to 
abrupt  weather  extremes,  which  can  rapidly  alter  the 
condition  of  the  crop  over  short  periods  during  criti- 
cal stages  of  crop  development.  The  regression-type 
yield  model  is  also  somewhat  limited  in  its  capability 
to  respond  fully  to  the  impact  of  either  abnormally 
good  or  abnormally  bad  years.  Awareness  of  this 
limitation  makes  good  analyst  judgment  paramount 
when  using  yield  model  predictions.  Experience  dur- 
ing Phase  II  has  shown  that  regression  yield  models 
constantly  responded  in  the  proper  direction  to  the 
weather  phenomenon  experienced.  Trend-term  ad- 
justments and  some  reselection  of  weather  variables, 
it  was  later  shown,  improved  the  predictive 
capability  of  the  U.S.S.R.  yield  models  after  Phase  II. 
Since  more  sophisticated  phenological  modeling  type 
techniques  were  not  developed  for  operational  ap-
plication,  no  alternative  to  using  the  regression  yield
model  was  available  during  LACIE. 
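
The structure of such a regression yield model, and its limited response to short weather extremes, can be sketched as follows. This is a minimal illustration only: the trend yield, the coefficient, and the daily temperatures are hypothetical values chosen for the example, not parameters of the operational models.

```python
from statistics import mean


def regression_yield(trend_yield, deviations, coefficients):
    """Trend yield plus weather adjustments, in quintals per hectare.

    deviations   -- departure of each weather variable from its normal
    coefficients -- yield change per unit departure (hypothetical here)
    """
    adjustment = sum(coefficients[name] * deviations[name] for name in coefficients)
    return trend_yield + adjustment


# Hypothetical 30-day month: the normal temperature is 0 deg C, 28 days are
# exactly normal, and a 2-day cold snap drops to -20 deg C.
normal_temp_c = 0.0
daily_temps_c = [0.0] * 28 + [-20.0] * 2

# A monthly variable sees only the mean departure, not the 2-day extreme.
monthly_deviation = mean(daily_temps_c) - normal_temp_c

coefficients = {"temperature": 0.5}  # hypothetical q/ha per deg C
predicted = regression_yield(24.0, {"temperature": monthly_deviation}, coefficients)

print(f"coldest day: {min(daily_temps_c):.1f} deg C")
print(f"monthly deviation: {monthly_deviation:.2f} deg C")
print(f"predicted yield: {predicted:.2f} q/ha")
```

In this made-up case a potentially damaging cold snap moves the monthly variable by little more than 1 deg C, so the predicted yield barely changes. That is the kind of situation in which analyst judgment has to supplement the model.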

Cultural  practices.— Two  cropping  practices  used 
in  the  U.S.S.R.  complicated  the  analysis  of  its  wheat
situation.  The  more  significant  of  the  two  is  the  cut- 
ting of  small  grains  for  “green  chop”  (harvesting  at 
heading  or  the  soft  dough  growth  stage  for  animal 
feed).  Therefore,  if  the  last  spectral  data  received  for 
a given  area  were  at  the  jointing  stage  and  a third  of 
the  small  grains  crop  was  cut  for  “green  chop,”  the 
LACIE  final  estimate  would  be  biased  upward. 
Although  analytical  work  has  not  been  done  on  this 
section,  it  is  suspected  that  it  is  a major  factor  in  the 
LACIE  at-harvest  estimate  of  winter  wheat  area 
because  the  practice  is  more  prevalent  in  the  winter- 
wheat-producing  area. 

The  second  practice  is  overseeding.  Poor  stands  of 
winter-sown  grains  (resulting  from  poor  germina- 
tion, winterkill,  or  related  factors)  are  normally  over- 
seeded; i.e.,  another  grain  (normally  barley)  is  sown 
over  the  existing  crop.  Thus,  the  result  is  a wheat- 
barley  or  rye-barley  mix  at  harvest.  If  it  is  a wheat- 
barley  mix  and  the  farm  is  short  in  delivering  its 
wheat  quota,  the  mix  can  be  delivered  as  wheat  at  a 
discount  depending  on  the  percentage  of  barley  in 
the  wheat.  If  this  type  of  operation  is  relatively 
widespread,  it  will  bias  the  yield  downward  because 
barley  yield  is  normally  lower  than  wheat. 

U.S.S.R.  reporting  of  statistical  agricultural  data. — 
Besides  the  delay  and  inconsistency  in  reporting,  the 
U.S.S.R.’s  report  of  “bunker  weights”  is  also  a prob- 
lem. Bunker  weight  is  the  weight  straight  from  the 
field  with  no  corrections  made  for  dockage  or 
moisture  content.  In  years  of  favorable  climatic  con- 
ditions, these  “bunker  weights”  could  be  fairly  repre- 
sentative of  the  nutritive  or  feeding  value  of  the  har- 
vest; however,  in  years  of  poor  weather  conditions, 
these  values  may  be  inflated  unrealistically  because 
of  high  moisture  content,  trash,  sprouting,  disease, 
etc. 


PHASE  III:  CROP  YEAR  1976-77 


Scope 

The  level  of  activity  for  the  U.S.S.R.  was  increased
from  indicator  regions  in  Phase  II  to  total  country
coverage  in  Phase  III.  This  coverage  automatically 
increased  the  segment  workload  from  747  to  1947 


segments,  a significant  impact  on  resources. 
However,  the  additional  workload  generated  an  esti- 
mate for  which  comparable  data  (official  U.S.S.R. 
estimates)  were  released  by  the  U.S.S.R.  after  har- 
vest. 


Sampling 

In  Phase  II,  the  sampling  strategy  was  based  on 
total  area  and  wheat  density.  The  Phase  III  sampling 
strategy  had  the  advantage  of  updated  and  more  ac- 
curate historical-statistical  data,  more  and  improved 
spectral  imagery  of  the  entire  U.S.S.R.,  and  the  ex- 
perience gained  in  Phase  II  to  use  as  a base.  The  sam- 
pling strategy  was  revised  for  Phase  III  and  was 
based  on  agricultural  (cropland)  area  and  wheat  den- 
sity. Additionally,  a significant  number  of  sample 
segments  in  nonagricultural  areas  during  Phase  II 
were  relocated  to  agricultural  areas  for  Phase  III. 


Data  Base 

The  data  base  for  Phase  III  was  not  changed  sig- 
nificantly from  the  Phase  II  edition,  with  the  excep- 
tion of  the  historical-statistical  section  where  updates 
and  corrections  were  made. 


Landsat  Data

During  LACIE  Phase  III,  the  Landsat  data  for  the 
U.S.S.R.  were  acquired  between  August  5, 1976,  and 
September  15,  1977.  During  the  period  from  August 
5, 1976,  to  November  1,  1977, 8838  acquisitions  (im- 
agery of  a given  sample  segment)  were  examined  by 
the  Classification  and  Mensuration  Subsystem 
(CAMS)  and  the  results  were  reported  to  the  CAS. 
This  number  of  acquisitions  equates  to  an  average  ac- 
quisition rate  (number  of  acquisitions  divided  by 
number  of  sample  segments)  of  4.54  as  the  project  at- 
tempted to  collect  usable  data  on  the  total  allocation 
of  1947  sample  segments.  Although  the  number  of 
acquisitions  (8838)  is  almost  overwhelming  in  mag-
nitude and  may  lead  one  to  believe  that  usable  data
were  collected  on  100  percent  of  the  sample  seg- 
ments allocated  to  the  U.S.S.R.,  the  actual  coverage 
of  usable  imagery  amounted  to  only  78  percent  for 
the  entire  season.  In  other  words,  of  the  1947  sample 
segments  allocated  to  the  U.S.S.R.,  usable  data  were 
collected  on  1518  segments,  or  78  percent  of  the 
allocation.  The  disposition  of  the  remaining  7320  ac-
quisitions is  accounted  for  by  one  of  the  following.

1.  The  image  was  not  processed  because  of  clouds 
or  image  data  problems. 

2.  The  image  was  examined  but  not  classified 
because  of  small  grains  preemergence  or  dormancy. 

3.  Several  acquisitions  of  the  same  segment  were 
available  at  the  time  it  was  worked.  The  analyst  had 
the  option  of  selecting  either  the  best  or  the  most 
representative  acquisition  to  process;  he  would  then 
archive  the  remainder. 

4.  The  acquisition  was  processed,  but  the  results 
were  unsatisfactory. 

5.  Multiple  acquisitions  resulted  in  satisfactory 
classifications  for  the  same  sample  segment. 
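
The figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are ours.

```python
# Figures quoted in the text for Phase III Landsat data over the U.S.S.R.
acquisitions = 8838               # acquisitions examined by CAMS
allocated_segments = 1947         # total sample segment allocation
segments_with_usable_data = 1518  # segments for which usable data were collected

# Average acquisition rate: acquisitions divided by allocated sample segments.
acquisition_rate = acquisitions / allocated_segments

# Coverage: share of allocated segments with usable data, in percent.
coverage_percent = 100.0 * segments_with_usable_data / allocated_segments

# Acquisitions left over once one usable acquisition per covered segment is counted.
remaining_acquisitions = acquisitions - segments_with_usable_data

print(f"acquisition rate: {acquisition_rate:.2f}")
print(f"coverage: {coverage_percent:.0f} percent")
print(f"remaining acquisitions: {remaining_acquisitions}")
```

The point of the comparison is that a high acquisition rate does not imply full coverage: the acquisitions are unevenly spread, and many fall into the five categories listed above.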

Two  severe  unanticipated  data  collection  con- 
straints— loss  of  the  Pakistani  ground  station  for  data 
collection  and  failure  to  acquire  early  spring  spectral 
data— curtailed  the  LACIE  data  collection  effort  at 
critical  times  during  the  crop  year.  Undoubtedly, 
both  the  coverage  and  the  acquisition  rate  would 
have  improved  if  the  Landsat  data  had  been  collected 
as  originally  planned. 

The  Landsat  acquisition  history  for  the  U.S.S.R. 
in  Phase  III  (from  August  5,  1976,  to  November  1, 
1977)  is  summarized  in  table  IV.  The  early-season, 
midseason,  and  late-season  periods  have  been  ar- 
bitrarily designated  to  facilitate  the  preparation  of 
this  table.  Large  areas  of  the  wheat-producing  regions 
were  combined  by  predominant  wheat  class  to 
further  expedite  the  data  compilation.  The  inclusive 
dates  for  early-season,  midseason,  and  late-season  ac- 
quisitions as  they  relate  to  areas  defined  on  the  map 
in  figure  4 are  as  follows: 
The  exceptions  to  this  legend  are  Chimkent,  Dzham-
bul,  Alma  Ata,  and  Taldy  Kurgan  in  Area  I.  The  ac-
quisition seasons  (early,  mid,  and  late)  are  divided 
the  same  as  Area  III  because  of  the  similarity  not 
only  in  cultural  practices  but  also  in  acquisition 
dates. 

The  sample  segment  coverage  by  acquisitions  con- 
taining usable  spectral  data  was  the  same  for  all  three 


areas  (see  table  IV).  This,  of  course,  is  an  abnormal 
situation  because  spectral  data  are  acquired  for  Areas 
I and  II  more  than  twice  as  long  as  for  Area  III;  thus, 
the  possibility  of  acquiring  equal  coverage  in  Area  III 
under  normal  circumstances  is  remote.  The  factors 
driving  this  skewed  coverage  are  as  follows: 

1.  A wet  fall  and  an  early  winter  in  the  winter
wheat  areas  with  cloud  cover  much  of  the  time  and 
early  dormancy 

2.  Failure  to  acquire  spectral  data  over  European 
U.S.S.R.  at  the  prescribed  time  in  the  spring  (The 
first  acquisitions  were  requested  for  March  but  actual 
collection  was  not  begun  until  about  May  1.) 

3.  A wet  spring  and  summer  in  European 
U.S.S.R.  with  accompanying  clouds 

4.  Unusually  favorable  climatic  conditions  for 
spectral  acquisitions  over  the  spring  wheat  area;  i.e.,
insignificant  rainfall  in  June  and  most  of  July  and 
minimal  cloud  cover 

These  conditions  complicated  the  analysis  of  im- 
agery for  the  European  U.S.S.R.  because  most  of  the 
early-season  data  were  acquired  extremely  early  in 
the  season  (August/September  1976)  and  in  many 
cases  reflected  partial  estimates.  This  led  to  the 
assumption  that  the  very  early  acquisitions  did  not 
reflect  the  actual  extent  of  the  area  devoted  to  fall- 
sown  small  grains. 

The  spectral  data  coverage  of  the  New  Lands  was 
much  more  straightforward;  acquisitions  were  timely 
and  the  analysis  of  the  spectral  data  was  an  improve- 
ment over  that  of  the  early-season  winter  small 
grains.  (The  procedures  were  adjusted  to  avoid  the 
problems  encountered  in  the  analysis  of  early-season 
fall-sown  grains.) 

Spectral  coverage  during  the  season  differed  by 
wheat  class  (see  table  IV).  The  most  complete 
coverage  occurred  in  the  early  season  in  the  spring 
and  winter  areas,  the  next  best  coverage  occurred  in 
midseason,  and  the  late-season  coverage  decreased 
rather  sharply.  The  coverage  of  the  spring  and  winter 
wheat  or  mixed  area  showed  the  opposite  situation; 
coverage  in  the  early  season  and  midseason  was  ap- 
proximately equal  at  one-third  of  the  allocation  and 
improved  to  one-half  in  the  late  season. 

The  spectral  coverage  for  the  three  areas  (see  fig. 
4)  ranged  between  64  and  86  percent  for  the  entire 
year,  except  for  the  Transcaucasus  (12  percent), 
central  Asia  (43  percent),  and  the  Northwest  (0  per- 
cent). The  central  Asia  problem  was  due  to  a failure 
of  the  Pakistani  ground  station.  This  station  was  used 
to  relay  most  of  the  spectral  data  acquired  over 
central  Asia  in  an  effort  to  economize  on  use  of  the 

Table IV.— Phase III U.S.S.R. Landsat Data Review

[Percentage of allocated sample segments for which usable spectral data were received,
by crop season and wheat class]

                                Sample            Percent coverage(a)
Region                          segment      Entire         Season(b)
                                allocation   crop year   Early   Mid   Late

Winter wheat

 1. Baltics                         25          64         12     36     24
 2. Belorussia                      42          79         67     71     26
 3. Southwestern Ukraine           117          81         40     54     37
 4. Donets-Dnieper                 130          72         60     32     40
 5. Southern Ukraine                70          94         83     77     13
 6. Moldavia                        16          94         88     69     38
 7. Northern Caucasus              153          83         47     37     59
10. Transcaucasus                   26          12          6      7      2
11. Central Asia                    23          43         13     26     30
    Total                          602          78         52     47     38

Winter and spring wheat(c)

 8. Central Chernozem              101          69         30     12     47
 9. Central non-Chernozem          103          75         17     37     51
15. Volga Vyatsk                    34          74         29     24     47
16. Volga                          325          83         43     39     55
    Total                          563          78         35     33     52

Spring wheat

12. Urals                          141          91         70     48     42
13. Kazakhstan                     436          71         42     39     20
14. Western Siberia                158          86         59     42     20
17. Eastern Siberia                 36          72         19     67     31
18. Far East                        10          80         60     20     70
19. Northwest                        1           0          0      0      0
    Total                          782          78         50     42     25

Country total                     1947          78         46     41     37

(a) The figures for the early, mid, and late season coverages will not add to crop-year coverage because data were acquired
for the same segment in all three seasons in many cases.
(b) Inclusive dates for early, mid, and late seasons are, for winter wheat, from Aug. 1976 to May 31, 1977; for winter and
spring wheat (mixed), from Aug. 1976 to May 31, 1977; and for spring wheat, from Apr. 15 to July 31, 1977.
(c) Predominantly mixed area.



satellite  recorder.  When  the  station  malfunctioned, 
much  of  the  data  for  this  area  was  lost.  The 
Transcaucasus  problem  was  more  complicated.  In  a 
plot  of  the  areas  for  which  the  U.S.S.R.  spectral  data 
could  be  "dumped"  to  the  Pakistani  and/or  Italian 
station,  the  Transcaucasus  was  located  in  an  overlap 


area:  i.e.,  both  stations  would  receive  the  spectral 
data  for  this  area.  As  Phase  III  progressed,  it  was 
determined  that  the  strength  of  the  stations  was 
weaker  than  anticipated  and  that  no  data  or  at  best 
sporadic  bits  of  data  were  being  received  for  the 
Transcaucasus.  The  Northwest  coverage  was  yet 
FIGURE  4.— Principal  wheat-producing  areas  of  the  U.S.S.R.  by  predominant  wheat  class.


another  problem;  cloud  cover,  snow  cover,  and  the 
segment's  location  on  the  fringe  of  the  collection  area 
combined  to  negate  efforts  to  acquire  usable  data  for 
this  area. 

The  experience  in  Phases  II  and  III  led  the
U.S.S.R.  CAS  analyst  to  believe  that  a 30-percent 
coverage  of  the  allocation  (Phase  III  allocation  of 
1947  sample  segments)  of  good  usable  data 
(classification)  is  sufficient  to  produce  a reliable  esti- 
mate. This  assumption  is  supported  in  part  by  project 
statisticians'  determination  that  the  U.S.S.R.  was 
oversampled  by  approximately  75  percent.  The 
revised  sampling  developed  for  the  U.S.S.R.  to  meet 
the  90/90  criterion  of  production  at  harvest  allocated 
approximately  1100  sample  segments.  The  1100-seg-
ment allocation  includes  a cloud-cover  factor  based
on  Phase  III  acquisition  history.  Thus,  the  revised 
sampling  strategy  sample  segment  allocation  exceeds 
the  commodity  analyst's  spectral  data  acquisition  re- 
quirements almost  twofold. 
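
The sampling comparison in this paragraph reduces to simple ratios. The sketch below works them through under one reading of the text: the "30-percent coverage" is taken as 30 percent of the 1947-segment Phase III allocation, and "oversampled by approximately 75 percent" as the excess of the Phase III allocation over the revised one. Both readings are our assumptions.

```python
# Figures quoted in the text.
phase3_allocation = 1947   # Phase III sample segment allocation
revised_allocation = 1100  # approximate revised allocation meeting the 90/90 criterion
analyst_fraction = 0.30    # coverage the CAS analyst considered sufficient

# Segments of good usable data the analyst considered sufficient.
analyst_requirement = analyst_fraction * phase3_allocation

# Excess of the Phase III allocation over the revised allocation, in percent.
oversample_percent = 100.0 * (phase3_allocation / revised_allocation - 1.0)

# How far the revised allocation exceeds the analyst's requirement.
revised_to_requirement = revised_allocation / analyst_requirement

print(f"analyst requirement: about {analyst_requirement:.0f} segments")
print(f"Phase III oversample: about {oversample_percent:.0f} percent")
print(f"revised allocation / requirement: {revised_to_requirement:.2f}")
```

On this reading the Phase III allocation exceeds the revised one by roughly three-quarters, and the revised allocation is a little under twice the analyst's requirement, in line with the "almost twofold" statement.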


Yield  Analysis 

Winter  wheat.— The  final  1976-77  LACIE  winter 
wheat  yield  prediction  for  the  U.S.S.R.  was  25.6  quin- 
tals per  hectare,  an  increase  of  1 quintal  over  the 
LACIE  Phase  II  (1975-76)  crop  year  estimate  of  24.6 
quintals  covering  a reduced  indicator  region.  This  is 
especially  significant  when  one  considers  that  the 
Phase  II  coverage  consisted  mostly  of  the  higher 
yielding  portion  of  the  winter  wheat  area.  The  offi- 
cial U.S.S.R.  countrywide  yield  figure  for  the  1975-76 
crop  year  was  25.9  quintals  per  hectare  for  winter 
wheat.  The  highest  previous  Soviet  yield  was  27.0 
quintals,  a record  set  in  1973;  the  winter  wheat  yield 
average  since  1970  had  been  23.0  quintals. 

Individual  yield  predictions  from  the  21  crop 
regions  in  the  U.S.S.R.  (fig.  5)  are  aggregated  to  ob- 
tain the  countrywide  estimate.  Winter  wheat  yield 
predictions  from  the  agrometeorological  regression 
yield  models  used  in  these  crop  regions  terminate 
with  the  July  truncation,  and  any  revisions  to  the 
combined  country  estimate  since  that  time  are  due  to 
area  adjustment  only. 

The  improvement  in  the  overall  country  yield 
estimate  for  the  1976-77  crop  year  over  the  previous 
year  is  reflected  in  the  individual  yield  stratum 
results.  More  than  three-fourths  of  the  crop  regions 
with  operating  yield  models  in  1976  registered  higher 
yields  in  1977.  The  greatest  impact  from  this  increase 
was  in  the  Ukraine,  where  45  percent  of  the  U.S.S.R. 


winter  wheat  crop  is  produced.  Yields  in  the 
southern  and  eastern  Ukraine  were  more  than  6 
quintals  higher  than  in  1976— the  result  of  milder-
than-average  winter  temperatures  and  abundant  rain-
fall through  the  critical  spring  and  early  summer
periods.  The  only  setback  in  1977,  compared  with  a 
year  earlier,  occurred  along  the  middle  and  upper 
Volga.  Here,  above-average  April  temperatures  and 
scanty  May  rainfall  kept  yields  as  much  as  7 quintals 
below  the  1976  figure.  Winter  wheat  yields  in  1977, 
compared  to  trend,  also  indicated  a bumper  year. 
Two-thirds  of  the  21  crop  regions  predicted  yields 
above  the  normal,  whereas,  in  1976,  only  slightly 
more  than  half  the  strata  were  forecasting  yields  to 
exceed  trend  predictions. 

Country-level  winter  wheat  yield  estimates  for  the 
U.S.S.R.  appeared  to  be  close  to  other  predictions,  of-
ficial and  unofficial,  during  Phases  II  and  III.  No  ex-
act check  can  be  performed,  however,  on  stratum-
level  accuracy  because  of  a 2-  to  3-year  lag  in  official 
publication  of  regional  data.  Individually  or  on  a 
model-by-model  basis,  the  Caucasus-Volga  winter
wheat  covariance  model,  covering  adjacent  Crop 
Regions  10  and  17,  may  be  somewhat  suspect.  For 
the  1975-76  and  1976-77  crop  years,  the  model's 
yields  from  the  lower  Volga  (Crop  Region  17)  were 
consistently  below  trend  throughout  the  season, 
whereas  predictions  for  the  northeastern  Caucasus 
(Crop  Region  10)  were  above  the  trend  with  equal 
frequency.  This  could,  of  course,  be  entirely  possible 
given  the  proper  meteorological  conditions. 

Building  for  a covariance  model  requires  pooling 
of  yield  and  climatic  data  over  both  regions  for  the 
data  base  period  from  1958  to  1971.  In  the  operation 
of  the  yield  model,  current  area-specific  weather  is 
then  applied  to  the  individual  crop  regions.  In  pool- 
ing data  for  Crop  Regions  10  and  17  for  the  Novem- 
ber to  March  and  the  April  temperature  variables, 
average  normal  temperatures  of  —2.98°  and  9.17°  C, 
respectively,  are  indicated.  Actual  temperature  data 
for  the  November  to  March  period  show  the  more 
southerly  Crop  Region  10  to  be  above  the  pooled 
average,  whereas  Crop  Region  17,  lying  to  the  north,
deviates  below  the  pooled  average.  This  is  not 
surprising  because  the  north-south  span  encompass- 
ing the  two  strata  covers  nearly  500  miles  and 
because  temperature  averages  would  normally  be  ex- 
pected to  be  colder  inland  and  farther  north  (table 
V).  Yields  within  the  two  strata  are  affected  accord- 
ingly: Crop  Region  10  upward  because  of  the  above- 
normal temperatures  and  Crop  Region  17  down- 
ward. The  April  temperature  variable  reaction  in  the 

Table V.— Temperature Variables of the Caucasus-Volga Winter Wheat Yield Model

                              November-March temperature               April temperature             Total
                           --------------------------------    --------------------------------   temperature
Crop region                Norm,  Actual,  Deviation  Yield    Norm,  Actual,  Deviation  Yield     variable
                            °C      °C     from norm  impact,   °C      °C     from norm  impact,   impact,
                                                       q/ha                                q/ha      q/ha

1975-76

10. Northeastern Caucasus  -2.80   -2.0      +0.8     +1.1     +9.40   +11.7     +2.3      +0.8      +1.9
17. Lower Volga            -2.80   -5.4      -2.4     -5.6     +9.40    +9.8      +.4       +.2      -5.4

1976-77

10. Northeastern Caucasus  -2.98   -0.3      +2.7     +1.3     +9.17   +11.7     +2.5      +1.1      +2.4
17. Lower Volga            -2.98   -4.2      -1.2     -1.1     +9.17   +10.4     +1.2       +.6       -.5


model  shows  effects  similar  to  the  November  to 
March  period  although  yield  impact  is  less  for  the 
single-month  variable.  One  discernible  effect  of  the 
temperature  pooling  over  the  two  regions  is  that  it 
somewhat  destroys  the  use  of  trend  as  a meaningful 
analytical  tool  in  judging  model  performance.  A 
region-constant  correction  of  -2.5  quintals  a;  plied 
in  1977  to  Crop  Region  10  appears  to  bring  the  trend 
yield  into  comparable  range  with  a straight  average 
yield  over  the  1958-71  data  base  period  (table  VI). 
Crop  Region  17  yield,  though,  shows  a wide 
divergence  between  the  trend  term  value  as  shown  in 
the  model  and  the  actual  yield  average  over  the  same 
period.  Model  trend  deviations  would  indicate 
below-normal  yields  in  1976  and  1977;  however, 
yields  were  actually  above  the  14-year  data  base 
average  for  both  years. 
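The arithmetic behind table V can be checked with a short sketch. The Python below is illustrative only and is not part of the LACIE software; the names ROWS, deviation, and total_impact are assumptions introduced here. It takes the 1976-77 values as read from table V, computes each deviation as actual minus pooled norm, and sums the two temperature yield impacts into the total.

```python
# 1976-77 values as read from table V (temperatures in degrees C,
# yield impacts in ql/ha). The dictionary layout is an assumption.
ROWS = {
    "10. Northeastern Caucasus": {
        "nov_mar": {"norm": -2.98, "actual": -0.3, "impact": 1.3},
        "april": {"norm": 9.17, "actual": 11.7, "impact": 1.1},
    },
    "17. Lower Volga": {
        "nov_mar": {"norm": -2.98, "actual": -4.2, "impact": -1.1},
        "april": {"norm": 9.17, "actual": 10.4, "impact": 0.6},
    },
}


def deviation(variable):
    """Departure of the actual temperature from the pooled norm."""
    return round(variable["actual"] - variable["norm"], 1)


def total_impact(row):
    """Sum of the two temperature-variable yield impacts, in ql/ha."""
    return round(row["nov_mar"]["impact"] + row["april"]["impact"], 1)


for name, row in ROWS.items():
    print(name,
          deviation(row["nov_mar"]),
          deviation(row["april"]),
          total_impact(row))
```

Because both regions share one pooled norm, the more southerly region shows a positive departure and the more northerly region a negative one even in a season that is ordinary for each.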

In Phase III, one other slight discrepancy was also noted in Crop
Region 10, the northeastern Caucasus. Smoothing the November to
March temperature over the entire crop region produced a reading
2.7° C above normal. This subsequently contributed 1.3 quintals to
the yield estimate for that period. However, this region was
particularly vulnerable to winterkill on January 4 and 5 when
temperatures were sufficiently low and the area lacked protective
snow cover. Above-normal temperatures generally throughout the
5-month November to March period tended to obscure 2 days of
critically low temperatures. This probably underscores even more the
need for a good climatic alarm system rather than indicating a yield
model defect. Yield monitoring in the Caucasus, where potential
winterkill situations frequently arise, is particularly crucial
because the region is a prime wheat producer and contributes heavily
to overall production. The northeastern Caucasus is the single
largest winter-wheat-producing crop region, contributing nearly 15
percent to the total U.S.S.R. winter wheat crop over the past 5
years.

Table VI. Northeastern Caucasus and Lower Volga Historical Winter Wheat Yields

Parameter                                 Northeastern Caucasus   Lower Volga
                                          (Crop Region 10)        (Crop Region 17)

5-yr (1960-64) average, ql/ha                    15.2                 13.6
5-yr (1965-69) average, ql/ha                    15.0                 14.3
5-yr (1970-74) average, ql/ha                    19.9                 19.1
10-yr (1965-74) average, ql/ha                   17.4                 17.4
15-yr (1960-74) average, ql/ha                   16.6                 16.0
1958-71 average,* ql/ha                          15.8                 14.8
LACIE trend yield,** ql/ha:
  For 1976                                       15.1                 20.1
  For 1977                                       15.6                 18.2
LACIE final yields, ql/ha:
  For 1976                                       17.6                 16.0
  For 1977                                       20.2                 17.0
LACIE deviation from trend, percent:
  For 1976                                       16.6                -20.4
  For 1977                                       29.5                 -6.7
LACIE deviation from 1958-71 average, percent:
  For 1976                                       11.4                  8.1
  For 1977                                       27.8                 14.9

*[First words illegible] the first 14 years of yield data used in
model development.
**Final truncation. Model developers assumed trend to be zero over
the data base period; however, a region constant correction of
[illegible] ql/ha in [illegible] and [illegible] ql/ha in 1977 is
applied to Crop Region 10.
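The percent deviations in table VI follow from a single ratio, which the sketch below reproduces. It is an illustration only; the function name pct_deviation and the dictionary layout are assumptions made here, and the input values are the LACIE final yields and 1958-71 averages as read from table VI.

```python
def pct_deviation(estimate, reference):
    """Percent departure of an estimate from a reference value."""
    return round(100.0 * (estimate - reference) / reference, 1)


# LACIE final yields and 1958-71 average yields from table VI, in ql/ha.
FINAL_YIELD = {
    ("Crop Region 10", 1976): 17.6,
    ("Crop Region 10", 1977): 20.2,
    ("Crop Region 17", 1976): 16.0,
    ("Crop Region 17", 1977): 17.0,
}
BASE_AVERAGE = {"Crop Region 10": 15.8, "Crop Region 17": 14.8}

for (region, year), final in FINAL_YIELD.items():
    print(region, year, pct_deviation(final, BASE_AVERAGE[region]))
```

The same function applied to the trend yields shows the divergence discussed in the text: Crop Region 17 is above its data base average in both years yet below its model trend.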

Spring wheat. The final LACIE yield prediction for 1977 U.S.S.R.
spring wheat leveled at 8.[illegible] quintals per hectare. This
estimate was considerably below the 1976 official near-record yield
of 12.4 quintals. LACIE in that year predicted 10.5 quintals, nearly
2 quintals less than the final U.S.S.R. figure. In 1976, LACIE
covered only a reduced spring wheat indicator region lying primarily
east of the Ural Mountains. On a strictly comparative basis, the
1977 estimate would be even lower relative to the previous season
because the normally higher yielding regions in the west along the
Volga River are included in the 1977 estimate. Extremes in U.S.S.R.
spring wheat yields in this decade ranged from a high of 13.5
quintals in 1973 down to 7.0 quintals in 1975. The average since
1970 was 11.5 quintals. The 1977 LACIE estimate was lower than for
any recent year except the disastrous 1975 Soviet spring crop.

The 16 U.S.S.R. yield models covering the 21 spring wheat crop
regions (fig. 6) showed a wide range of predictions for the 1977
season. Yields exceeding 20 quintals per hectare were noted along
the western fringes of the spring wheat area in the Black Soil
Region. By comparison, a yield of less than 1 quintal was predicted
in the more arid central Asia section, where a deficit of more than
150 millimeters in precipitation minus potential evapotranspiration
(PET) during the spring dropped the forecast yield more than 7
quintals below normal expectations. This crop region is relatively
insignificant by comparison, producing on an average less than 1
percent of the total U.S.S.R. spring wheat output.

In contrast to the improvement shown in 1977 winter wheat yields
compared with those of 1976, the U.S.S.R. spring crop regions
predicted a reduction when judged against the previous year's
above-average crop. Most of the reduction occurred in Kazakhstan and
nearby oblasts in the southern Ural Mountains region (Orenburg and
Bashkir), where yield predictions on the average were off by more
than 4 quintals compared to the 1976 figures. Above-normal April
temperatures were particularly harsh in Crop Regions 22 to 25,
centered in northeastern Kazakhstan. Yields in that one month alone
were discounted an even 2 quintals across the board. Precipitation
minus PET deficits in May and June prevented any subsequent
recovery. This four-region sector alone accounts for 20 percent of
the U.S.S.R. spring wheat annual production.

Reduced crop prospects in the central New Lands were somewhat offset
by slightly higher predicted yields in the northeastern Urals and
western Siberia. The Altai Kray 1977 yield equaled its year-earlier
mark of 7.3 quintals. With the exception of the Volga Valley, other
peripheral spring-wheat-producing areas to the west and north fared
better in 1977. The Volga Valley spring wheat yield declined
slightly from its 1976 level because of a preseason moisture
deficit; however, the impact was generally less on spring wheat than
was noted for winter wheat yields.

Comparisons  of  1977  spring  wheat  yields  to  trend 
also  tended  to  support  indications  of  a deterioration 
in  yield  compared  with  1976.  Exactly  half  of  the  1976 
crop  regions  predicted  yields  below  the  normal, 
whereas,  in  1977,  more  than  60  percent  of  the  23 
spring  wheat  yield  strata  were  forecasting  yields  to  be 
less  than  trend  predictions. 

No direct check of yield model accuracy can be performed at this
time on individual yield models because official Soviet data at the
regional level for the 1977 season will not be available until 1979
or 1980. During Phase II, spring wheat models were predicting low:
the LACIE indicator region estimate was nearly 2 quintals under the
official Soviet figure. An official estimate by the U.S.S.R.
Government indicates that, at the country level, the LACIE Phase III
yield is 0.9 quintal per hectare low. One feature of the
northeastern Kazakhstan and Siberia-Altai spring wheat covariance
models might tend to bias the LACIE predictions on the low side. For
these two models, covering the all-important Crop Regions 22 to 25
and 26 and 27, respectively, the July precipitation-minus-PET
variable contains only the squared departure from normal term with a
negative coefficient. (Any combination of precipitation and
temperature that produces anything other than a normal
precipitation-minus-PET value will detract from yield.) The only way
to prevent a dropoff in July yield is for precipitation minus PET to
be exactly "normal," in which case its contribution to yield is
zero.
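The one-sided behavior of a squared-departure term can be seen in a minimal sketch. The function name, the coefficient value of -0.001, and the normal value of -60 millimeters are hypothetical and chosen only for illustration; the actual LACIE coefficients are not given in this paper.

```python
def july_moisture_contribution(p_minus_pet, normal, coeff=-0.001):
    """Yield contribution (ql/ha) of a squared-departure-only term.

    p_minus_pet and normal are precipitation minus PET in millimeters.
    coeff is a hypothetical negative coefficient, not a LACIE value.
    """
    if coeff >= 0:
        raise ValueError("the term described in the text has a negative coefficient")
    departure = p_minus_pet - normal
    return coeff * departure ** 2


normal = -60.0  # hypothetical July normal, in millimeters
for observed in (-80.0, -60.0, -40.0):
    print(observed, july_moisture_contribution(observed, normal))
```

Because the departure is squared, a wetter-than-normal July is penalized exactly as much as a drier-than-normal July of the same magnitude, and the term can never raise the predicted yield.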

One observed feature of the U.S.S.R. spring wheat yield models is
the seemingly large influence on monthly yield change caused by the
trend and weather coefficients themselves, notably with the July
truncation. As a result of July 1977 weather in particular, 14 yield
models showed a decrease in their yield prediction, whereas 5
registered gains. However, coefficient adjustments (both trend and
weather) between the June and July truncations allowed the July
weather variable to be overridden in 10 of the 14 instances. It is
particularly confusing to the user when analyzing the effects of
weather to find that, on the whole, yield-reducing weather
conditions produced overall gains in yield. Table VII provides a
breakdown of this type of occurrence by month through the 1977
season for both spring and winter wheat. The greatest frequency is
associated with the July spring wheat truncation. July is a
particularly crucial month for the Soviet spring wheat crop, and
conflicting yield information could possibly lead to questions of
yield model credibility. It would seem more desirable to hold all
internal adjustments in the model constant and let the weather
variables alone shift the yield predictions.
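The kind of tally reported for each truncation can be sketched as follows. This is an assumed reconstruction for illustration, not the procedure used by the analysts: direction_class and tally are names introduced here, the input records are hypothetical, and treating a zero change as positive is an arbitrary choice. Each record pairs the change in a model's yield prediction between two truncations with the part of that change attributable to the month's weather variables.

```python
from collections import Counter


def direction_class(yield_change, weather_effect):
    """Label one model by the sign of its yield change (CHG) and of its
    weather contribution (WX). Zero is treated as positive (assumption)."""
    chg = "+" if yield_change >= 0 else "-"
    wx = "+" if weather_effect >= 0 else "-"
    return f"CHG({chg})WX({wx})"


def tally(records):
    """Count models in each of the four CHG/WX direction classes."""
    return Counter(direction_class(chg, wx) for chg, wx in records)


# Hypothetical (yield change, weather effect) pairs in ql/ha.
records = [(0.4, -0.6), (-0.3, -0.3), (0.2, 0.5), (0.1, -0.2)]
print(tally(records))
```

A count in the CHG(+)WX(-) class marks the confusing case described above, in which coefficient adjustments outweighed yield-reducing weather.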


Table VII. Spring and Winter Wheat Yield Models: Direction of
Response of Yield Prediction to Weather Variables
[Months of truncation (1977) in the source heading: Mar., Apr., May,
June, July, Aug., Sept. Counts are listed in source order; their
alignment to the month columns is not recoverable.]

Factor*            Counts

Spring wheat
CHG(+)WX(+)        1, 7, 4, 1, 1
CHG(-)WX(-)        3, 14, 5, [illegible]
CHG(+)WX(-)        4, 1, 10, 1
CHG(-)WX(+)        1, 0, 1, 1

Winter wheat
CHG(+)WX(+)        7, 11, 7, 7, 1
CHG(-)WX(-)        1, 3, 1, 2, 1
CHG(+)WX(-)        0, 6, 2, 0
CHG(-)WX(+)        2, 0, 1, 3

*CHG = direction of change in the yield prediction; WX = direction
of the weather variable effect. [Footnote wording partly illegible
in source.]


Area Estimates

Although the LACIE area estimates for the U.S.S.R. spring wheat
followed predictable patterns and yielded acceptable results, the
winter wheat estimates for the most part were unrealistic to the
commodity analysts because they tended to be much higher than
historical information could substantiate. The principal components
of the area estimate are sampling strategy, aggregation formulation,
ratioing techniques, and spectral data classification. The sampling
strategy and aggregation formulation are considered adequate, and
the ratioing techniques and data are the best available, given the
currency and accuracy of Soviet agricultural statistics that are
required inputs to sampling and aggregation. By elimination, this
pointed to misclassification of spectral data as the likely source
of the overestimate. The following paragraphs address two objective
methods of data selection or editing and resultant estimates
compared to initial procedures (accepting all spectral
classifications at face value).

Methods  of  data  selection.— Three  methods  of  data 
selection  (initial,  revised,  and  final)  were  used  to  pro- 
duce separate  sets  of  country  estimates.  This  section 
explains  the  background  and  development  of  each 
method. 

Initial method: The initial method of aggregation used the entire
population of CAMS aggregatable estimates; i.e., the pure CAMS data
were used and no data selection procedure was applied. It should
also be pointed out that no "dummy" estimates were input.¹

Revised  method:  The  initial  early-season  esti- 
mates tended  to  be  unrealistically  low  because  a por- 
tion of  the  early-season  imagery  was  acquired  before 
all  the  wheat  had  emerged.  To  reduce  the  number  of 
partial-emergence  area  estimates  that  were  gener- 
ated, the  CAS  analysts  determined  the  tillering  date 
for  each  oblast  from  information  obtained  through 
Soviet  newspapers  and  available  meteorological  data. 
Area  estimates  made  before  the  specified  tillering 
dates  were  thresholded  since  these  data  were  ac- 
quired before  full  emergence  and  the  data  were  not 
representative  of  the  total  planted  area.  Historical 
data  for  1971  were  input  into  the  system  to  cover 
those  areas  for  which  Landsat  data  were  not  im- 
mediately available.  The  1971  data  were  replaced  as 
Landsat  data  became  available  later  in  the  season. 

The LACIE winter wheat area estimates in Phase III followed the
pattern set in Phase II; i.e., the best estimate was obtained in the
June-July time frame, but the estimates continued to escalate with
the receipt of additional spectral data. This escalation occurred
primarily because the first acquisition of Landsat data for many
segments was obtained during biostage 4 or 5 (jointing/heading and
soft dough). The escalation is shown in the June, July, and August
estimates (fig. 7). The difference between the initial and revised
estimates in figure 7 is the result of using biostage 4 and 5
estimates without a previous classification to verify the
classification as small grains. At this time of the year, crops such
as hay, potatoes, sugar beets, and sunflowers were in direct
competition with small grains with respect to spectral detection. In
many (if not most) instances, it was impossible, given the current
state of the art, to differentiate between small grains and hay or
row crops under these circumstances.

¹The Phase III CAS software was designed so that at least one
stratum (oblast) within a zone (group of oblasts) had to have an
aggregatable classification for a minimum of three sample segments
before an estimate could be generated for the zone. To resolve this
problem, analysts used historical data as dummy CAMS estimates to
generate an estimate for the zone in the revised and final
estimating methodology. The zone estimate was the actual historical
area figure for the 1971 base year.

The ratioing technique used by LACIE was developed to reduce the
winter small grains estimate passed by CAMS to a winter wheat
estimate for use in the aggregation. Although these ratios were
satisfactory when winter small grains were identifiable on the
imagery, this process presented an almost insurmountable problem
during the spring and summer when hay and row crops had been
planted. If no previous fall or early spring acquisition had been
passed for temporal separation of late spring and summer data, the
current ratioing technique could not compensate for the increased
confusion due to hay or row crops and therefore led to an upward
spiral in the succeeding estimates. To correct for this inability to
differentiate between small grains and other crops using single
acquisitions from specific time periods during the growing season,
the commodity analysts used the revised method of data selection.
The following criteria were used.

1. If, for a given segment, no acquisition was obtained for biostage
3 (jointing), then acquisitions obtained for biostages 4 (heading)
and 5 (soft dough) would not be used for aggregation until they
could be processed multitemporally with an acquisition obtained for
either biostage 6 (hard dough) or 7 (harvest).

2. If, for a given segment, an acquisition was obtained only for
biostage 6 (hard dough) or 7 (harvest) (i.e., no acquisitions were
obtained for biostages 3, 4, or 5), then these data would be
excluded from the estimate.
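The two criteria above can be expressed as a small decision function. The sketch below is an interpretation for illustration, not the CAS software: the function name is an assumption, a segment is represented only by the set of biostages for which it has acquisitions, and a segment with a biostage 3 acquisition is assumed to be usable because neither criterion restricts it.

```python
def usable_under_revised_method(biostages):
    """Return True if a segment's acquisitions may enter the aggregation.

    biostages: iterable of biostage numbers (3 through 7) for which the
    segment has acquisitions.
    """
    stages = set(biostages)
    if 3 in stages:
        # Neither criterion applies when a biostage 3 acquisition exists
        # (assumed here to mean the segment is usable).
        return True
    has_mid = bool(stages & {4, 5})
    has_late = bool(stages & {6, 7})
    if has_mid:
        # Criterion 1: biostage 4 or 5 data wait for a biostage 6 or 7
        # acquisition so they can be processed multitemporally.
        return has_late
    # Criterion 2: biostage 6 or 7 data alone are excluded.
    return False


for case in ({3, 4}, {4}, {4, 6}, {6, 7}):
    print(sorted(case), usable_under_revised_method(case))
```

The second criterion is what removes data outright, which is the loss of segment data that the final method was later designed to avoid.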

Final  method:  The  final  aggregation  technique 
was  developed  in  November  and  December  of  1977 
as  the  annual  report  for  the  U.S.S.R.  was  being  pro- 
duced. Close  study  of  the  revised  approach  revealed 
that,  although  it  resulted  in  a reasonable  country  esti- 
mate, it  distorted  some  regional  estimates  to 


unreasonable  levels  and  did  not  use  all  segment  data 
that  were  available. 

To  understand  the  winter  wheat  overestimation 
problems  encountered  by  LACIE,  one  must  under- 
stand the  plant  growth  cycle  in  the  U.S.S.R.  and  the 
type  and  amount  of  image  data  collected.  In  the 
U.S.S.R.,  winter  grain  growth  in  the  fall  is  usually 
such  that  significant  tillering  does  not  occur  in  all 
growing  areas  before  dormancy.  The  grains  are 
difficult  to  detect  on  satellite  imagery  without  suffi- 
cient ground  cover,  which  usually  occurs  at  the  tiller- 
ing stage.  Therefore,  imagery  of  the  spring  green-up 
period  is  essential  for  an  accurate  winter  grain  esti- 
mate because  confusion  with  other  crops  is  minimal. 

Multitemporal  acquisitions  are  required  to  iden- 
tify and  separate  crops  in  the  Landsat  data.  The 
analyst  interpreter  uses  events  in  the  plant  growth 
cycle  such  as  emergence,  heading,  and  turning  corre- 
lated with  crop  calendars  to  separate  winter  and 
spring  crops  in  the  imagery.  When  imagery  of  the 
critical  events  is  not  acquired  or  is  not  available  to 
the  analyst,  it  is  not  possible  to  separate  winter  and 
spring  crops.  The  separation  becomes  more  exact  as 
more  of  these  events  are  available  to  the  analyst. 

An  analysis  of  Landsat  data  and  the  results  sub- 
mitted to  CAS  by  the  image  analysts  revealed  that 
the  analysts  experienced  difficulty  in  identifying 
winter  grains  in  areas  for  which  limited  Landsat  im- 
agery was  available.  Imagery  was  not  acquired 
because  of  clouds,  processing  problems,  and  collec- 
tion problems.  In  these  areas,  if  a significant  amount 
of  spring  grains  was  present,  the  analyst  had  little  op- 
portunity to  separate  winter  and  spring  small  grains. 
If  the  affected  areas  were  designated  as  winter  wheat 
regions  because  of  a preponderance  of  winter  wheat, 
the  image  analysts  usually  submitted  winter  grain 
estimates.  When  no  acquisitions  were  available  be- 
tween fall  tillering  for  winter  grains  and  jointing  for 
spring  grains,  LACIE  winter  grain  estimates  likely 
contained  both  winter  and  spring  small  grains. 

FIGURE 7. Phase III winter wheat production, area, and yield
estimates for initial, revised, and final aggregations.

The sample segment estimates representing the winter wheat were
reviewed and the following criteria were applied to resolve the
uncertainties created by inadequate acquisition histories.

1. Sample segment winter grain estimates based on Landsat data
acquired before winter wheat tillering were not used for
aggregation.

2. Sample segment winter grain estimates based on Landsat data
acquired between winter wheat tillering and the historical average
beginning of spring small grains jointing were used for aggregation.
These estimates were considered reasonably accurate because winter
grain emergence was complete and confusion with spring-planted crops
was minimal. No estimates based on other critical acquisitions were
considered necessary to verify the accuracy of these estimates.

3. Sample segment winter grain estimates based on acquisitions after
the historical beginning of spring grain jointing required a second
estimate based on acquisitions between winter wheat tillering and
spring grain jointing to verify the winter small grains
classification before it was used in the aggregation.

4. If a winter grain sample segment estimate satisfied criteria 2
and 3 above but exceeded the previous estimate by more than 5
absolute percentage points, it was converted to a total grain
estimate.

For example, if for a given segment, a usable acquisition (after
tillering) dated October 30, 1976, was given a winter small grains
estimate of 11.0 percent and a subsequent aggregatable acquisition
received on June 3, 1977, was given a winter small grains estimate
of 20 percent, the 20-percent winter-sown small grains estimate
would not be included in the aggregation. The 20 percent was
converted to a total small grains estimate and then included in the
aggregation. The June 3, 1977, estimate of winter-sown small grains
would have been included in the aggregation if it had not been
greater than 16 percent, since it would have been not more than 5
absolute percentage points above the earlier estimate of 11 percent.
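The four criteria and the worked example can be followed in a short sketch. It is one reading of the text, offered for illustration only: the function name, the returned labels, the inclusive window boundaries, and the tillering and jointing dates are all assumptions, and criterion 4 is applied here only to estimates made after spring grain jointing.

```python
from datetime import date

MAX_INCREASE_POINTS = 5.0  # criterion 4, absolute percentage points


def classify_estimate(acquired, estimate_pct, tillering, spring_jointing,
                      verified_pct=None):
    """Decide how one winter grain segment estimate enters the aggregation.

    verified_pct is an earlier estimate from the window between winter
    wheat tillering and spring grain jointing, if one exists.
    """
    if acquired < tillering:
        return "excluded"                  # criterion 1
    if acquired <= spring_jointing:
        return "winter grains"             # criterion 2
    if verified_pct is None:
        return "held for verification"     # criterion 3
    if estimate_pct - verified_pct > MAX_INCREASE_POINTS:
        return "total grains"              # criterion 4
    return "winter grains"


# Hypothetical crop calendar dates for one oblast.
TILLERING = date(1976, 10, 15)
SPRING_JOINTING = date(1977, 5, 20)

print(classify_estimate(date(1976, 10, 30), 11.0, TILLERING, SPRING_JOINTING))
print(classify_estimate(date(1977, 6, 3), 20.0, TILLERING, SPRING_JOINTING,
                        verified_pct=11.0))
print(classify_estimate(date(1977, 6, 3), 16.0, TILLERING, SPRING_JOINTING,
                        verified_pct=11.0))
```

With the figures from the example, the 20-percent June estimate is reclassified as total grains, whereas a 16-percent estimate would have remained a winter grains estimate.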

When the criteria were applied to the CAMS estimates, many of the
estimates that most likely contained both spring and winter grains
were converted to total grains estimates. As a result, the winter
wheat to small grains (WW/SG) ratio was applied rather than the
winter wheat to winter small grains (WW/WSG) ratio. The WW/SG ratio
was less than or equal to the WW/WSG ratio; therefore, the winter
wheat ratioed estimate was lower, as were the final aggregated
results. The final approach was desirable because it avoided data
elimination and hence improved the precision of the aggregated
regional estimates.
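The effect of switching ratios can be illustrated with a brief sketch. The ratio values of 0.8 and 0.5 are hypothetical, as are the function name and the two labels; the only property taken from the text is that the WW/SG ratio does not exceed the WW/WSG ratio.

```python
def winter_wheat_pct(estimate_pct, kind, ww_wsg, ww_sg):
    """Reduce a segment small grains estimate to a winter wheat estimate.

    kind is "winter grains" (use the WW/WSG ratio) or "total grains"
    (use the WW/SG ratio). Both ratios are historical fractions.
    """
    if ww_sg > ww_wsg:
        raise ValueError("WW/SG is expected not to exceed WW/WSG")
    if kind == "winter grains":
        return estimate_pct * ww_wsg
    if kind == "total grains":
        return estimate_pct * ww_sg
    raise ValueError(f"unknown estimate kind: {kind}")


# Hypothetical ratios applied to the 20-percent estimate from the example.
print(winter_wheat_pct(20.0, "winter grains", ww_wsg=0.8, ww_sg=0.5))
print(winter_wheat_pct(20.0, "total grains", ww_wsg=0.8, ww_sg=0.5))
```

The segment is retained in the aggregation in either case; only the ratio, and therefore the size of its winter wheat contribution, changes.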

Comparison  of  estimates  involving  initial,  revised, 
and  final  methods  of  data  selection. — Winter  wheat, 
spring  wheat,  and  total  wheat  estimates  as  selected 
by  the  three  methods  of  editing  sample  segment  data 
for  aggregation  are  compared  as  follows. 

Winter  wheat:  During  Phase  III,  winter  wheat 
estimates  were  submitted  to  the  commodity  analyst 
for  aggregation  as  soon  as  winter  grains  were 
detected  on  the  Landsat  imagery.  The  early  acquisi- 
tions usually  showed  only  partial  emergence,  and  the 
estimates  did  not  accurately  represent  the  amount  of 
winter  grains  planted.  In  the  U.S.S.R.,  winter  grains 
may  not  complete  tillering  in  all  regions  before  dor- 
mancy occurs.  Tillering  appears  to  be  the  earliest 
stage  in  the  growth  cycle  of  wheat  for  positive  detec- 
tion by  the  LACIE  system  since  this  stage  provides 
sufficient  vegetative  ground  cover  for  identification 
of  plant  life. 

Because  these  early  estimates  were  potentially 
only  partial  estimates  of  the  emerging  winter  wheat, 
aggregated  country  estimates  were  biased  downward 
to  unsatisfactorily  low  values.  To  eliminate  these  low 
estimates,  an  early-season  data  editing  technique  was 
implemented  so  that  segment  estimates  made  before 
tillering  would  not  be  used  for  aggregation.  Figure  7 
shows  the  low  April  production  and  area  estimates 
associated  with  the  initial  aggregations  compared 
with  the  higher  estimates  associated  with  the  revised 
and  final  aggregations  that  use  early-season  data  edit- 
ing. 

As  the  season  progressed,  the  low  early-season 
estimates  were  replaced  by  complete  estimates.  By 
the  time  of  the  July  report,  which  primarily  used 
spectral  data  through  May  15,  the  best  initial  produc- 
tion and  area  estimates  had  been  derived.  The  area 
for  the  revised  and  final  methods  stayed  relatively 
constant  from  April  to  July.  The  increase  in  produc- 
tion for  these  estimates  was  due  to  increases  in  the 
LACIE  predicted  yields  and  was  not  related  to  shifts 
in  area  from  low-  to  high-yielding  areas.  The  initial 
production  estimate  rose  between  April  and  July 
because  of  increases  in  both  area  and  yield. 

After  the  July  report,  the  initial  aggregated  area 
estimates  continued  to  rise  to  a maximum  of  23.8 
million  hectares,  and  the  production  reached  a max- 
imum of  62.1  million  metric  tons  at  the  end  of  the 
season.  These  figures  are  unrealistically  high.  As  the 
winter  wheat  estimates  rose,  CAS  developed  and  im- 
plemented the  revised  aggregation  technique  (de- 


scribed earlier  in  this  paper).  The  revised  method 
succeeded  in  reducing  the  high  initial  country  area 
estimate  to  a reasonable  figure;  however,  the  end-of- 
season  estimate  was  still  slightly  less  than  8 percent 
too  high.  After  July,  the  production  estimate  for  the 
revised  method  continued  to  rise  but  not  at  the  rate 
of  the  initial  estimate. 

With  data  for  the  entire  year  in-house,  the  CAS 
analysts  studied  the  revised  estimates  at  the  regional 
level.  The  regional  estimates  suggested  that,  although 
the  country  area  estimates  were  reasonable,  certain 
regions  that  were  greatly  overstated  in  the  initial  esti- 
mate remained  unrealistically  high,  whereas  certain 
regions  that  were  unreasonably  low  were  reduced 
even  further.  Therefore,  the  final  technique  was 
developed  to  solve  this  problem.  The  deviations  of 
the  end-of-season  results  for  the  final  estimates  of 
area  and  production  compared  with  the  official 
U.S.S.R.  release  were  3.9  and  —6.4  percent,  respec- 
tively. Figure  7 shows  a rise  and  fall  in  area  and  pro- 
duction between  July  and  the  end  of  the  season  at- 
tributable to  CAMS  processing  procedures.  CAMS 
was  required  to  backlog  winter  wheat  data  after  July 
so  that  spring  wheat  could  be  processed.  Not  until 
October  and  November  were  significant  amounts  of 
late-season  imagery  (biostages  6 and  7)  processed. 
The  significant  departure  in  production  between  the 
revised  and  final  approaches  is  partly  due  to  less  area 
in  the  revised  method,  but  it  is  primarily  due  to  the 
final  procedure's  correction  of  higher  yielding  areas. 

Spring wheat: The spring wheat revised approach is identical to the
initial approach, except that historical data (six segments) have
been added to Tyumen' and the Northwest, enabling estimates to be
generated for these two zones.² The spring wheat final approach used
the same data selection criteria as were used for the winter wheat
final approach. The additional historical data for Tyumen' and the
Northwest are included in the final estimate.

The August estimates for area and production are low because
sufficient spectral data had not been processed to adequately
estimate the entire spring wheat crop by the August cutoff date. The
August aggregation contained estimates of only 34 percent of the
1416 allocated spring wheat segments, and many of these estimates
were low because of incomplete emergence. As the season progressed,
these low estimates were replaced by later estimates reflecting the
actual area or extent of the small grains with additional segments
also being added to the final report. The end-of-season report
contained estimates on 69 percent of the 1416 segments and provided
a best estimate (final) of 41.4 million hectares and 36.3 million
metric tons (fig. 8). These estimates deviated from the official
U.S.S.R. announcement by 0.2 and -9.5 percent, respectively.

²Under Phase III software, a zone would receive estimates only if at
least one stratum within the zone contained a minimum of three
segments with aggregatable acquisitions.

Total wheat. The total wheat aggregation is simply a combined report
of the winter and spring estimates for August through the end of the
season. The total wheat initial area estimates (fig. 9) are about 5
million hectares less in August than at the end of the season. This
is due to low early-season spring wheat estimates, since the winter
wheat area only varied by 400 000 hectares during this same time
period. The spring wheat area gained 4.5 million hectares from
August until the end of the season. The initial production estimates
level off from September until the end of the season when the low
early-season spring wheat estimates were corrected. Production for
spring wheat and winter wheat varied only 700 000 and 400 000 metric
tons, respectively, after the September report.

The revised and final approaches give lower area and production
estimates than the initial approach because of reduced winter wheat
estimates during this time. These approaches and their effects on
winter wheat have been previously discussed. Again, the rise and
fall in the production figures by the final approach are due to the
backlog of winter wheat data during the season with late-season data
being processed only in October and November. This is discussed
further in the conclusions section. The best area and production
estimates for total wheat are the end-of-season final estimates of
62.9 million hectares and 91.4 million metric tons, which differ
from the actual estimates as announced by the U.S.S.R. (62.0 million
hectares and 92.0 million metric tons) by 1.5 and -0.7 percent,
respectively.
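The total wheat comparison can be verified directly from the four figures quoted above. The sketch below is illustrative; the function names are assumptions, and the only conversion used is that 1 metric ton equals 10 quintals, so that production in million metric tons divided by area in million hectares gives metric tons per hectare.

```python
QUINTALS_PER_METRIC_TON = 10


def implied_yield_ql_per_ha(production_mmt, area_mha):
    """Yield in ql/ha implied by production (million metric tons)
    and area (million hectares)."""
    return QUINTALS_PER_METRIC_TON * production_mmt / area_mha


def pct_difference(lacie, official):
    """Percent difference of the LACIE figure from the official figure."""
    return 100.0 * (lacie - official) / official


lacie_area, lacie_production = 62.9, 91.4
official_area, official_production = 62.0, 92.0

print(round(pct_difference(lacie_area, official_area), 1))
print(round(pct_difference(lacie_production, official_production), 1))
print(round(implied_yield_ql_per_ha(lacie_production, lacie_area), 1))
print(round(implied_yield_ql_per_ha(official_production, official_area), 1))
```

The small production difference results from a slightly high area estimate combined with a slightly low implied yield.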

FIGURE 8. Phase III spring wheat production, area, and yield
estimates for initial, revised, and final aggregations.

FIGURE 9. Phase III total wheat production, area, and yield
estimates for initial, revised, and final aggregations.

Estimates based on 30-day turnaround time. To illustrate the
importance of spectral data timeliness, all Phase III monthly
estimates were recomputed from a 30-day delay average rather than
from the real-time average; figures 10 to 12 show the results. The
30-day delay data provide a more logical estimate curve because (1)
the yield data are synchronized with the spectral data (both 30 days
before the report date) as opposed to the real-time situation where
the most current spectral data are normally at least 30 days older
than the yield data, and (2) spectral data were aggregated
chronologically as they were acquired (rather than backlogging
data).


Accuracy  of  Estimates 

The  LACIE  wheat  production  estimates  for  the 
U.S.S.R.  are  presented  in  table  VIII  with  statistics 
and  comparison  data.  Four  types  of  LACIE  esti- 
mates were  generated  in  Phase  III. 


1.  Initial  estimates  used  all  CAMS-processed 
acquisitions. 

2.  Revised  estimates  (first  used  in  the  April  esti- 
mates) employed  a thresholding  procedure  to  elimi- 
nate early-season  (preemergence)  acquisitions  and 
later  included  a thresholding  procedure  based  on  key 
acquisition  to  eliminate  suspect  data. 

3.  Final  estimates,  released  in  the  CAS  annual  re- 
port, were  recalculated  for  the  entire  season  using  the 
data  editing  procedure. 

4.  Final  estimates  with  a 30-day  delay — wherein 
Landsat  data  acquired  up  to  30  days  before  the  report 
date  were  aggregated  to  make  area  data  more  directly 
comparable  to  yield  data  and  to  normalize  the  proc- 
essing time— were  used.  The  final  estimates  with  a 
30-day  delay  were  released  as  the  official  LACIE  esti- 
mates for  Phase  III. 
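The 30-day delay rule in item 4 can be sketched as a simple date filter. This is only an illustration: the function name `acquisitions_for_report`, the tuple layout, and the sample dates are hypothetical and are not taken from the LACIE software. The sketch assumes that an acquisition is eligible for a report when it was acquired at least 30 days before the report date.

```python
from datetime import date, timedelta


def acquisitions_for_report(acquisitions, report_date, delay_days=30):
    """Keep acquisitions acquired on or before (report_date - delay_days).

    `acquisitions` is a list of (segment_id, acquisition_date) tuples.
    """
    cutoff = report_date - timedelta(days=delay_days)
    return [a for a in acquisitions if a[1] <= cutoff]


# Hypothetical (segment id, acquisition date) pairs, for illustration only.
sample = [
    ("seg-001", date(1977, 6, 20)),
    ("seg-002", date(1977, 7, 15)),
    ("seg-003", date(1977, 8, 1)),
]

# A report dated August 10 has a cutoff of July 11 under a 30-day delay.
eligible = acquisitions_for_report(sample, report_date=date(1977, 8, 10))
eligible_ids = [segment_id for segment_id, _ in eligible]
print(eligible_ids)
```

Under this rule the spectral data entering each report are the same age as the yield data, which is the synchronization the text describes.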

Table VIII.— Comparison of LACIE and U.S.S.R. Production Estimates

FIGURE 10.— Phase III winter wheat production and area estimates recomputed from a 30-day delay average.

FIGURE 11.— Phase III spring wheat production and area estimates recomputed from a 30-day delay average.

Comparison of the LACIE monthly total wheat
production  estimates  with  the  official  U.S.S.R. 
Government's  final  estimates  indicates  that  the 
LACIE  estimates  supported  the  90/90  accuracy  goal 
each  month  from  August  through  the  final  report. 
Table  IX  gives  the  statistics  used  to  evaluate  the 
90/90  criterion.  It  contains  the  estimated  relative 
difference  (RD),  the  CV  for  each  monthly  estimate, 
the  tolerable  relative  biases  given  for  the  observed 
CV,  and  the  significance  level.  For  example,  the  RD 
and  the  CV  for  the  final  LACIE  estimates  were  —0.7 
and  3.8  percent,  respectively.  With  a CV  of  that  mag- 
nitude, the  LACIE  total  wheat  production  estimate 
would  support  the  90/90  criterion  if  the  relative  bias 
was  between  the  limits  of  —5.6  and  4.6  percent. 
Since  the  estimated  relative  bias  was  within  this  in- 
terval, the  estimate  supported  the  90/90  criterion. 
The  last  column  gives  estimates  of  the  probability  of 
observing  the  RD  encountered,  given  a 90/90  pro- 
duction estimator.  It  is  inferred  that  if  the  signifi- 
cance level  is  greater  than  10  percent,  the  estimator 
supports  the  90/90  accuracy  goal. 
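The decision rule described above can be written out as a short check. This is a minimal sketch, not the LACIE accuracy assessment procedure: it takes the tolerable bias limits as given from table IX rather than deriving them from the CV, and it assumes that the RD is computed with the U.S.S.R. figure as the denominator. The function names are hypothetical.

```python
def relative_difference(lacie, reference):
    """Relative difference in percent (reference value as denominator, an assumption)."""
    return 100.0 * (lacie - reference) / reference


def within_tolerable_bias(rd_percent, lower_percent, upper_percent):
    """True if the RD falls inside the tolerable relative bias interval."""
    return lower_percent <= rd_percent <= upper_percent


def significance_supports_goal(significance_percent, threshold_percent=10.0):
    """True if the significance level exceeds the 10-percent threshold."""
    return significance_percent > threshold_percent


# Final total wheat production, million metric tons, as quoted in the text.
rd = relative_difference(91.4, 92.0)
print(round(rd, 1))

# Limits and significance level for the final estimate, as quoted for table IX.
print(within_tolerable_bias(rd, -5.6, 4.6))
print(significance_supports_goal(50.0))
```

Both tests agree for the final estimate: the RD lies inside the interval, and the significance level is above 10 percent.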

Table IX.— Evaluation Statistics

Estimate    RD, percent    CV, percent    Tolerable relative bias, percent    Significance level, percent
Final       -0.7           3.8            (-5.6, 4.6)                         50

The results of LACIE Phase III with its revised
approach  indicate  that  the  accuracy  goal  of  90/90  was 
achieved  in  the  U.S.S.R.,  where  the  shortfall  in  the 
spring  wheat  crop  was  identified  3 months  before 
completion  of  harvest  and  similar  results  were 
achieved  in  the  winter  wheat  regions.  The  initial 
LACIE  estimate  of  97.6  million  metric  tons  in 
August was within 6 percent of the U.S.S.R. January
28  figure  of  92  million  metric  tons,  and  the  LACIE 
final  estimate  released  on  January  23  for  total  wheat 
production was within 1 percent.

FIGURE 12.— Phase III total wheat production and area estimates recomputed from a 30-day delay average.

Throughout 1977,
implementation  problems  and  data  processing  back- 
logs were  encountered  that  resulted  in  estimation  er- 
ror beyond  that  which  would  be  encountered  in  a 
functioning  operational  system.  Faulty  data  acquisi- 
tion orders  and  inoperative  ground  receiving  stations 
led  to  the  loss  of  Landsat  acquisitions  over  a portion 
of  the  U.S.S.R.  winter  wheat  region.  These  lost  Land- 
sat  data  were  never  received  or  recorded  and  so  could 
not  be  recovered.  Data  already  processed  were 
reevaluated,  and  the  crop  analyst’s  procedures  were 
modified. In December 1977, these data interpretation
problems were circumvented, and the LACIE
estimates  were  recomputed  using  Landsat  data 
assuming  a 30-day  processing  delay  operationally. 
The  resulting  estimates  were  released  on  January  23 
before  the  final  Soviet  release.  In  a future  operation, 
such  results  could  be  produced  as  early  as  August  or 
September.  These  somewhat  improved  results  were 
within  3 percent  of  the  Soviet  figures  in  August, 
some  3 months  before  harvest. 

A detailed  examination  of  the  conditions  leading 
to  the  Soviet  shortfall  in  spring  wheat  production  and 
the  response  observed  in  the  LACIE  models  indi- 


cated that  the  LACIE  forecast  technology  did  re- 
spond in  a timely  fashion.  In  most  of  the  U.S.S.R. 
spring  wheat  regions,  the  growing  season  ex- 
perienced temperatures  warmer  than  average.  These 
elevated  temperatures  led  to  moisture  deficiencies 
through  increased  demand  on  available  precipitation. 
The  PET  data  indicated  that  the  above-normal  tem- 
peratures in  the  growing  season  seriously  depleted 
soil  moisture  supply  throughout  the  southern  por- 
tions of  the  U.S.S.R.  spring  wheat  area.  While  the 
northern  regions  had  normal  to  above-normal 
moisture  in  addition  to  these  impacts,  the  April  tem- 
perature was  nearly  4°  C above  normal,  which 
theoretically  at  least  would  deplete  the  preseason  soil 
moisture  supply. 

An  investigation  of  the  Landsat  data  and  the  yield 
model  response  at  subregional  levels  indicated  that 
the  drought  conditions  were  clearly  observable  in  the 
Landsat  data  and  that  the  yield  models  responded  by 
reducing  yield  estimates  in  the  affected  regions. 
Radiometric  measurements  by  Landsat  (green  index 
number),  which  were  known  to  be  related  to  crop 
vigor,  indicated  the  southern  portions  of  the  spring 
wheat  region  were  under  severe  drought  conditions. 
However,  in  the  northern  regions,  LACIE  was 
forecasting  above-normal  yields.  In  the  southern 
regions,  LACIE  yield  models  reduced  the  yield 
prospects  nearly  2 quintals  per  hectare  in  response  to 
the  high  April  temperature  before  the  growing 
season  had  commenced.  The  continuing  drought 
reduced  the  yield  nearly  2 more  quintals  per  hectare 
below  the  normal  11.5  quintals  per  hectare. 

Production.— For  total  wheat,  there  was  no  signifi- 
cant difference  at  the  10-percent  level  between  the 
final  official  LACIE  production  estimate  and  the 
final  estimate  released  by  the  U.S.S.R.  Government. 
In  fact,  the  final  or  official  LACIE  production  esti- 
mates for  Phase  III  were  consistently  between  89.1 
and  92.3  million  metric  tons  and  were  never  signifi- 
cantly different  from  the  official  Soviet  figure  of  92.0 
million  metric  tons.  The  final  RD  between  the  offi- 
cial LACIE  and  U.S.S.R.  Government  estimates  was 
—0.7  percent.  The  CV  for  the  official  LACIE  esti- 
mates dropped  steadily  from  4.3  percent  in  August  to 
3.8  percent  for  the  final  estimate. 

A comparison of the monthly LACIE winter wheat
production figures with the official U.S.S.R.
Government figures showed that only in August was
the difference significant at the 10-percent level.

The  official  LACIE  spring  wheat  estimates  for 
September,  October,  and  the  final  report  were  not 
significantly  different  at  the  10-percent  level  from 
the  official  Soviet  estimate,  but  the  difference  was 
highly  significant  in  August  because  of  LACIE 
underestimation. 

Area  estimates.— The  LACIE  wheat  area  estimates 
for  the  U.S.S.R.  and  associated  statistics  and  com- 
parison data  are  presented  in  table  X.  The  test 
statistics  for  total  wheat  showed  that  the  differences 
between  the  official  LACIE  estimates  and  the  offi- 
cial Soviet  estimate  were  not  significant  at  the  10-per- 
cent  level  except  in  August.  The  underestimate  in 
August  was  due  to  the  LACIE  underestimation  of 
spring  wheat  area  in  the  first  spring  wheat  area  ag- 
gregation for  the  U.S.S.R.  in  Phase  III. 

Although  a complete  set  of  test  statistics  was  not 
available  for  the  initial,  revised,  and  final  estimates, 
it  is  apparent  that  the  final  and  official  estimates 
were  closer  to  the  official  U.S.S.R.  Government 
figure  than  were  the  initial  or  revised  estimates. 

There  were  marginally  significant  differences  at 
the  10-percent  level  between  the  official  LACIE 
winter  wheat  area  estimates  and  the  official  Soviet 
estimates  for  August,  September,  and  October  in 
Phase  III.  The  final  official  LACIE  estimate  was  not 
significantly  different  from  the  official  figure 
released  by  the  U.S.S.R.  Government. 

The  spring  wheat  statistics  and  associated  com- 
parison data  in  table  X indicate  that  the  official 
LACIE estimates compared well with the Soviet estimate
after August. However, there is a large RD of
-12.8 percent for August due to underestimation by
LACIE.

Yield  estimates. — The  LACIE  wheat  yield  esti- 
mates for  the  U.S.S.R.  are  presented  in  table  XI  with 
the  associated  statistics  and  comparison  data.  The 
estimates  of  the  precision  (i.e.,  the  CV)  were  not 
available  for  the  LACIE  total  wheat  yield  estimates. 
However,  the  official  LACIE  estimates  were  quite 
close  to  the  official  U.S.S.R.  Government  estimate. 
The  difference  between  the  official  LACIE  estimates 
for  total  wheat  and  the  official  U.S.S.R.  Government 
figure  was  never  more  than  0.4  quintal  per  hectare. 

The  final  and  official  LACIE  winter  wheat  yield 
estimates  were  closer  to  the  U.S.S.R.  Government 
estimate  than  were  the  initial  or  revised  estimates. 
There  was  no  significant  difference  at  the  10-percent 
level between the official LACIE and the official
U.S.S.R.  Government  estimates.  The  absolute 


difference  between  the  monthly  LACIE  estimates 
and  the  official  U.S.S.R.  Government  estimate  never 
exceeded  1.1  quintals  per  hectare. 

None  of  the  differences  between  LACIE  spring 
wheat  estimates  and  the  official  Soviet  estimate  were 
significant  at  the  10-percent  level.  A tendency 
towards  underestimation  is  apparent,  however,  and 
has  been  addressed  previously. 

Technical  Issues  and  Problems 

The  technical  issues  and  problems  in  Phase  II 
were  also  major  constraints  in  Phase  III  with  the  ex- 
ception of  the  indicator  region  and  the  sampling 
problem. 

Indicator  region. — When  it  was  decided  to  work 
the  entire  wheat-producing  area  of  the  U.S.S.R.  in 
Phase  III,  the  issue  of  indicator  regions  and  the  asso- 
ciated problem  of  what  to  use  as  a comparison  for  the 
LACIE  estimates  were  eliminated.  Although  the 
U.S.S.R.  announcement  of  production,  area,  and 
yield  statistics  was  released  months  after  the  Phase 
III  crop  year  terminated,  these  data  were  irreplacea- 
ble in  calculating  the  accuracy  and  success  of  LACIE 
in  the  U.S.S.R. 

Sampling.— Approximately  800  of  the  total  alloca- 
tion of  1947  sample  segments  were  relocated  be- 
tween Phases  II  and  III.  Most  of  the  relocation  in- 
volved moving  sample  segments  from  non- 
agricultural  to  agricultural  areas;  however,  some  seg- 
ments were  moved  to  provide  more  efficient  sam- 
pling of  the  agricultural  area.  The  segments  were  allo- 
cated on  total  land  area  in  Phase  II  and  on 
agricultural  area  only  in  Phase  III. 

Software  problems. — During  Phase  II,  it  was 
decided  that  zones  (see  fig.  13)  which  did  not  have  at 
least  one  stratum  (oblast)  with  a minimum  of  three 
segments  for  which  usable  spectral  (Landsat)  data 
had been acquired would not be included in the aggregation
or LACIE estimate. The rationale was that
Landsat data for fewer than three segments would not
be representative of an area the size of a zone, and such
a zone therefore could not be estimated from remotely
sensed data. This software was in place at the beginning of
Phase  III.  It  became  immediately  apparent  that  the 
LACIE  estimates  for  the  U.S.S.R.  would  be  unac- 
ceptable (early  season  at  a minimum)  because  of  the 
large  number  of  zones  which  did  not  meet  this  cri- 
terion. It  was  decided  to  insert  historical  data  in  areas 
(zones)  for  which  no  estimate  was  generated  until 
sufficient  usable  Landsat  data  were  acquired  to  meet 
the  necessary  criterion. 
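The zone criterion and the historical fallback described above can be sketched as follows. The function names, the dictionary layout, and the sample numbers are hypothetical; only the threshold of three segments and the fallback to historical data come from the text.

```python
MIN_USABLE_SEGMENTS = 3  # threshold stated in the text


def zone_meets_criterion(usable_segments_by_stratum):
    """True if at least one stratum (oblast) in the zone has
    MIN_USABLE_SEGMENTS or more segments with usable Landsat data."""
    return any(
        count >= MIN_USABLE_SEGMENTS
        for count in usable_segments_by_stratum.values()
    )


def zone_area(usable_segments_by_stratum, landsat_estimate, historical_area):
    """Use the Landsat-based estimate when the zone qualifies; otherwise
    fall back to the historical figure."""
    if zone_meets_criterion(usable_segments_by_stratum):
        return landsat_estimate, "landsat"
    return historical_area, "historical"


# Hypothetical zones: stratum name -> number of segments with usable data.
zone_a = {"oblast_1": 4, "oblast_2": 1}
zone_b = {"oblast_3": 2, "oblast_4": 0}

print(zone_area(zone_a, landsat_estimate=1.9, historical_area=2.1))
print(zone_area(zone_b, landsat_estimate=0.4, historical_area=1.3))
```

In this sketch a zone switches from the historical figure to the Landsat-based estimate as soon as any one of its strata reaches three usable segments, which matches the behavior the text describes for the early-season reports.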
Table X.— Comparison of LACIE and U.S.S.R. Area Estimates

Table XI.— Comparison of LACIE and U.S.S.R. Yield Estimates

CONCLUSIONS

The major problems with area estimation that
have been discussed in this paper were overcome or
successfully circumvented in Phase III. This resulted
in  estimates  well  within  the  accuracy  tolerance  goals 
of  LACIE;  however,  there  are  refinements  and 
enhancements  that  should  be  implemented  to 
further  improve  the  estimates. 

In general, the yield models provided much better
estimates in Phase III than in Phase II because of the
updated data and the resultant coefficient changes.
The models never demonstrated the capability of
reflecting the degree of extremely high or low yields
but rather indicated the direction of deviation from
trend. These models should be upgraded or replaced
with models capable of more accurate yield estimates
to produce more accurate production estimates.

Availability  and  timeliness  of  Landsat  data  are 
very  important  to  accurate  area  estimates.  If  the  ap- 
propriate spectral  data  coverage  had  been  available  at 
the  right  time,  the  data  editing  procedures,  as  dis- 
cussed earlier,  could  have  been  avoided. 

The consensus of the commodity analysts participating
in the U.S.S.R. work is that Phase III U.S.S.R.
area results are repeatable given the same Landsat
classification and aggregation procedures.

The LACIE U.S.S.R. Phase III results evolved
from 3 years of concentrated effort by essentially the
same commodity analysts. Moving to a new foreign
area would require a minimum preparation time of 1
year before reliable results could be expected.

OVERVIEW

The  LACIE  project  originally  planned  to  generate 
area,  yield,  and  production  estimates  in  the  Canadian 
Prairie  Provinces — the  principal  wheat-growing 
region  of  that  country — in  both  Phases  II  and  III. 
Because  of  a change  in  scope  at  the  beginning  of 
Phase III, the investigation in Canada was reduced to
a moderate  number  of  sites  where  ground  truth  was 
collected.  In  order  to  support  accuracy  assessment,  a 
small  number  of  exploratory  and  intensive  test  sites 
were  analyzed  during  all  three  phases  of  LACIE,  but 
in-season  area,  yield,  and  production  estimates  were 
generated  only  in  Phase  II. 


Phase  I 

The work in Canada in Phase I was centered
around  building  a historical-statistical  data  base, 
locating  sample  segments  within  the  country,  and  ac- 
quiring multispectral  scanner  (MSS)  data  for  a subset 
of  the  sample  for  study  by  the  image  analyst.  There 
were  no  estimates  generated  for  Canada  during  this 
phase. 


Phase  II 

Scope. — The  LACIE  Canadian  spring  wheat 
region  includes  the  major  spring  wheat  producing 
region  in  Canada  (fig.  1).  This  area  is  comprised  of 
the  Provinces  of  Saskatchewan,  Alberta,  and 
Manitoba.  This  region  grows  predominantly  spring 
wheat and spring grains with some winter rye scattered
throughout the three provinces. Saskatchewan
accounts for the major proportion of the three-province
total spring wheat production, with approximately
65 percent, while Alberta accounts for 23
percent and Manitoba 12 percent.

a USDA Foreign Agricultural Service, Houston, Texas.
b NASA Johnson Space Center, Houston, Texas.
c USDA Economics, Statistics, and Cooperatives Service,

Sampling. — The  sampling  frame  used  in  Canada  is 
the  same  as  that  used  for  sampling  in  the  United 
States:  Group  I units  which  historically  account  for 
substantial  wheat  area  and  Group  II  units  where  seg- 
ments are  allocated  on  the  basis  of  probabilities  pro- 
portional to  size.  The  initial  sample  segment  alloca- 
tion was  based  on  the  1971  census  and  census  district 
boundaries.  The  1971  census  was  used  instead  of 
more current information since it was the only
publication  that  included  county-level  area  data.  This 
choice  of  a base  had  a major  impact  on  the  aggrega- 
tion results  and  caused  a number  of  problems  in  the 
analysis  and  evaluation  of  the  Canadian  data.  The 
final  section  of  this  paper  will  describe  in  detail  the 
major  problems  associated  with  the  Phase  II  Cana- 
dian reports. 

The  sample  segment  allocation  for  Canada  placed 
283  segments  within  the  three  provinces,  with 
Saskatchewan  allocated  170,  Alberta  75,  and 
Manitoba  38. 

Data  base. — The  Canadian  data  base  is  comprised 
of  five  data  sets  that  are  necessary  to  support  the  ag- 
gregation and  reporting  functions.  These  five  data 
sets  are  as  follows. 

1.  Allocation  data.  This  file  includes  the  political 
hierarchy  and  associated  segment  and  area  descrip- 
tors—agricultural  land,  political  area,  segment  loca- 
tion, and  political  hierarchy  identifiers. 

2.  Historical  data.  At  a minimum,  historical  area, 
yield,  and  production  data  were  input  for  the  1971 
base  year  that  was  used  to  generate  the  sample  seg- 
ment allocation. 

3. Ratio data. This file was used to ratio the small
grains estimates generated from the Landsat data to
the wheat estimates that were needed to support
wheat area aggregation. The ratios were based on
1971 data and were constructed for two different data
types — spring wheat to spring small grains and
spring wheat to total small grains.

4. Landsat data. All segment-level estimates that
were generated for Canada were stored in the data
base for use in generating aggregated area estimates.

5. Yield data. Yield estimates were generated for
each crop district by the National Oceanic and Atmospheric
Administration's (NOAA) Center for
Climatic and Environmental Assessment (CCEA).
These base estimates were stored and used in aggregating
yield and production estimates to the country
level.

FIGURE 1.— Map of Canadian Prairie Provinces with LACIE strata numbers.

Landsat data.— The final wheat area estimate for
the Canadian spring wheat region was based on
spectral coverage obtained between May 29 and September
23, 1976, with the majority of acquisitions
received between late June and mid-August. All
Canadian data were processed as spring small grains
or small grains. Ratioing was performed within the
Crop Assessment Subsystem (CAS) to determine
the percent wheat.
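The ratioing step can be illustrated with a minimal sketch. The function name and the sample values below are hypothetical and are not taken from CAS; the sketch only assumes that the wheat percentage of a segment is the classified small grains percentage multiplied by a historical wheat-to-small-grains ratio.

```python
def wheat_percent(small_grains_percent, wheat_to_small_grains_ratio):
    """Convert a small grains percentage into a wheat percentage.

    The ratio is the historical share of wheat within small grains
    and must lie between 0 and 1.
    """
    if not 0.0 <= wheat_to_small_grains_ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    return small_grains_percent * wheat_to_small_grains_ratio


# Hypothetical segment classified as 40 percent small grains.
older_ratio_estimate = wheat_percent(40.0, 0.5)
newer_ratio_estimate = wheat_percent(40.0, 0.75)
print(older_ratio_estimate, newer_ratio_estimate)
```

The example shows why the choice of base year matters: the same classification result yields a different wheat estimate when the ratio changes.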

Landsat  acquisition  coverage  throughout  the  1976 
crop  season  was  exceptionally  good  as  a result  of  the 
relatively  cloudfree  summer.  The  overall  average  ac- 
quisition rate  was  6 acquisitions  per  segment,  with  a 
usable  acquisition  rate  of  2.8.  This  netted  an  end-of- 
season  estimate  that  contained  usable  data  for  90  per- 
cent of  the  283  allocated  segments.  Both 
Saskatchewan  and  Manitoba  had  usable  acquisitions 
that  accounted  for  90  to  95  percent  of  the  segments 
for  each  of  the  three  reports  generated  during  the 
season.  Only  Alberta  had  consistently  low  coverage 
rates, with 27 percent usable for the first report, 57
percent  for  the  second,  and  79  percent  for  the  final. 

The project processed 1704 acquisitions during the
1976 Canadian crop season, mid-May to mid-September.
Of these 1704 acquisitions, 254 were used in
the final aggregation. The data dropout is accounted
for by 913 acquisitions being classified as not usable
for aggregation (cloud cover, mechanical problems,
etc.) and 537 acquisitions, which were classified as
satisfactory, being replaced by subsequent acquisitions
(fig. 2). The classifications for all acquisitions
for all reports are shown in table I. Of the 254 acquisitions
used in the aggregation, 1 segment was
classified as nonagricultural. In the first report, 28
segments were classified or ratioed to 0 percent
wheat. By the final report, 12 of these segments had
received updated information, leaving 16 segments
with a 0 percent wheat estimate.
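The acquisition bookkeeping above can be checked against the final-report column of table I. The sketch below uses only counts quoted in the text and in that table. The variable names are hypothetical, and the coverage figure assumes one acquisition per segment in the final aggregation.

```python
# Counts quoted in the text.
total_received = 1704
used_in_final_aggregation = 254
not_usable = 913
satisfactory_but_replaced = 537
allocated_segments = 283

# Final-report column of table I.
table_i_final = {
    "cloud cover, haze, etc.": 91,
    "mechanical difficulties": 43,
    "preemergence": 386,
    "multiple acquisitions": 388,
    "unsatisfactory": 5,
    "satisfactory": 791,
}

# Every category except "satisfactory" counts as not usable for aggregation.
not_usable_from_table = sum(
    count for name, count in table_i_final.items() if name != "satisfactory"
)

acquisitions_per_segment = total_received / allocated_segments
coverage_percent = 100.0 * used_in_final_aggregation / allocated_segments

print(sum(table_i_final.values()), not_usable_from_table)
print(round(acquisitions_per_segment, 1), round(coverage_percent))
```

The table categories sum to the 1704 acquisitions received, the non-satisfactory categories sum to the 913 quoted as not usable, and the 791 satisfactory acquisitions split into the 254 used and the 537 replaced.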

Eighty  percent  of  all  estimates  used  in  the  final  ag- 
gregation included  acquisitions  for  biostage  4 or  later. 
This  figure  indicates  that  most  of  the  Landsat 
analysis  included  data  for  the  end  of  the  crop's  full 
growth  cycle.  Throughout  the  season,  approximately 
35  days  were  required  to  process  the  data  from  ac- 
quisition to  receipt  by  CAS  for  use  in  the  aggregation. 

Yield  data. — The  CCEA  agrometeorological  yield 
estimates  for  the  1976  crop  year  generally  confirmed 
the known meteorological conditions prevalent
throughout  the  three  major  wheat  producing  prov- 
inces. Much  like  the  summer's  weather  pattern,  yield 
results  were  somewhat  mixed  across  the  prairies. 
The  drought  that  dominated  the  U.S.  upper  Midwest 
throughout  the  summer  affected  Canadian 
agriculture  as  well,  driving  yield  model  indications 


well  below  normal  along  the  eastern  fringes  of  the 
Canadian  wheat  belt. 

Manitoba spring wheat yields apparently experienced
the greatest setback, finishing 4 to 6
bushels per acre below average in the east, with all
districts predicting below-normal yields. This result
represented a gradually deteriorating situation
through the summer. Early in the season, crop districts
in Manitoba, as well as in most of the other
Canadian crop districts, were predicting above-average
crop prospects. In early July, stress from insufficient
moisture began to appear. From then until
the final truncation in August, yield estimates
declined steadily throughout the four Manitoba crop
districts.

Table I.— Classification of Acquisitions

Classification               No. (percent) of acquisitions in —
                             1st report    2nd report    3rd and final reports
Cloud cover, haze, etc.       38  (5)       68  (6)       91  (5)
Mechanical difficulties        6  (1)       18  (2)       43  (3)
Preemergence                 386 (47)      386 (31)      386 (23)
Multiple acquisitions (a)    129 (16)      275 (22)      388 (23)
Unsatisfactory                 1  (-)        4  (-)        5  (-)
Satisfactory                 254 (31)      489 (39)      791 (46)

(a) Several passes of one segment were reviewed at one time and only the best acquisition was used to determine the area estimate.

FIGURE 2.— Landsat data flow from CAMS to aggregation for spring wheat in the Canadian Prairie Provinces. (Total acquisitions received: 1704.)

In  the  Province  of  Saskatchewan,  to  the  west  of 
Manitoba,  yields  showed  a marked  contrast  to  those 
in  Manitoba,  with  crop  predictions  reflecting  the  pre- 
vailing ideal  weather  that  region  enjoyed  through 
most  of  the  summer.  In  the  central  portion  around 
Saskatoon  and  westward,  yield  indications  in  May 
were  up  to  2 bushels  per  acre  below  normal,  due  to 
preseason  soil  moisture  deficiency.  However,  abun- 
dant June  and  July  rains  rallied  crop  prospects,  and 
the  entire  province  finished  the  season  with  bumper 
yields — as  much  as  8 bushels  per  acre  above  normal 
expectations  in  the  north. 

The  seasonal  impact  of  weather  on  Alberta  wheat 
production  seemed  at  the  end  of  the  season  to  have 
been  favorable.  Two  of  the  three  Alberta  districts  in- 
dicated yields  slightly  above  trend  with  only  the 
northwest  Peace  River  region  falling  below  the 
norm.  Above-normal  September  temperatures,  asso- 
ciated with  mild  but  rainy  weather,  apparently 
delayed  harvesting  in  that  region,  reducing  some- 
what the  final  wheat  yield  and  quality.  In  the  more 
northerly  regions,  once  the  crop  has  passed  the  frost- 
susceptible  stage,  colder  temperatures  associated 
with  high  pressure  and  clear  weather  are  desirable  to 
speed  maturity,  freeze  down  weeds  that  hamper  har- 
vest operation,  and  thus  hasten  harvest  completion. 
The  eastern  and  southern  Alberta  wheat  crop,  like 
that  of  the  neighboring  portions  of  Saskatchewan, 
started  the  year  with  slightly  below-normal  yield  ex- 
pectations, due  to  preseason  moisture  problems,  but 
recovered  early  and  stayed  even  through  most  of  the 
summer.  Excellent  maturing  conditions  in  August 
and  September  pushed  yields  to  their  final  above- 
trend  mark  in  the  major  portion  of  Alberta's  wheat 
producing  region. 


Phase  III 

In  Phase  III,  Canada  was  not  worked  opera- 
tionally to  provide  aggregated  area,  yield,  and  pro- 
duction estimates  because  of  the  change  in  scope  at 
the  beginning  of  this  phase.  A small  number  of  ex- 
ploratory and  intensive  test  sites  were  analyzed  to 
support  accuracy  assessment  within  the  project.  This 
analysis  will  not  be  presented  in  this  paper. 


ESTIMATES 

Three  monthly  reports  and  one  final  report  were 
generated  for  the  Canadian  spring  wheat  region  dur- 
ing the  1976  crop  year.  The  area  and  production  esti- 
mates showed  significant  increases  with  each  suc- 
ceeding report.  The  final  end-of-season  area  estimate 
for  the  three  Canadian  Prairie  Provinces  was  20.8 
million  acres  (table  II).  This  is  6.0  million  acres  (22 
percent)  below  the  official  Canadian  Statistics  Octo- 
ber estimate  of  26.8  million  acres.  The  production 
estimate  was  also  low — 576.3  million  bushels  versus 
an  official  figure  of  833.0  million  bushels— a 31-per- 
cent  difference  (table  II). 
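The relation between the area and production shortfalls can be made explicit with a short calculation. The sketch below uses only the end-of-season totals quoted above. It assumes that the regional average yield is total production divided by total area; the function and variable names are hypothetical.

```python
def percent_below(estimate, official):
    """Percent by which an estimate falls below the official figure."""
    return 100.0 * (official - estimate) / official


lacie_area, official_area = 20.8, 26.8                # million acres
lacie_production, official_production = 576.3, 833.0  # million bushels

# Implied regional average yields, bushels per acre.
lacie_yield = lacie_production / lacie_area
official_yield = official_production / official_area

# Production ratio factors into an area ratio times a yield ratio.
area_ratio = lacie_area / official_area
yield_ratio = lacie_yield / official_yield
production_ratio = lacie_production / official_production

print(round(percent_below(lacie_area, official_area)))
print(round(percent_below(lacie_production, official_production)))
print(round(lacie_yield, 1), round(official_yield, 1))
```

The implied yields agree with the average yield row of table II. The factoring suggests that most of the production shortfall came from the area underestimate, with the lower yield accounting for the remainder.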

The  LACIE  estimates  were  based  on  spectral  data 
acquired  from  mid-June  to  mid-September  1976, 
with  the  bulk  of  the  data  acquired  during  July  and 
August.  For  the  Canadian  Prairie  Provinces  as  a 
whole,  usable  Landsat  data  were  acquired  for  90  per- 
cent of  the  segments. 

The  continual  increases  in  the  area  estimate  were 
due  primarily  to  improved  spectral  coverage  and 
upgrading  of  previous  acquisitions.  The  first  report 
was  delayed  until  August  because  of  the  lack  of  ade- 
quate spectral  coverage  to  generate  an  estimate.  In 
that  report,  usable  acquisitions  were  received  for  61 
percent  of  the  allocated  segments.  While  this  may 
seem  reasonable  for  a first  report,  early-season  data 
(i.e.,  classifications  indicating  5 percent  or  less  small 
grains  area)  reduced  the  effective  coverage  to  43  per- 
cent of  those  segments  allocated. 

In  the  second  and  third  reports,  spectral  coverage 
improved  dramatically  to  83  percent  and  90  percent, 
respectively,  with  a substantial  amount  of  mid-  and 
late-season  data  being  used  to  generate  the  estimate. 
Coverage  throughout  the  three  provinces  was  better 
than  anticipated  during  this  crop  season  since  cloud 
cover,  usually  so  prevalent  during  the  Canadian  crop 
season,  was  reduced  because  of  the  exceptionally 
favorable  weather  conditions. 

The  area  estimates  generated  by  LACIE  during 
Phase  II  were  well  below  official  Canadian  reports. 
In fact, every in-season estimate for the total region
during  1976  fell  below  the  1971  base  year  used  for  the 
aggregation  data  base  (table  II).  For  the  province- 
level  estimates,  only  the  third  report  estimate  for 
Alberta  was  over  the  1971  base  level  (approximately 
9 percent  over). 

Table II.— Comparison of August, September, October, and January LACIE Estimates for the Three Canadian Prairie Provinces (a)

Province        Historical   LACIE                                        Official (b)
                (1971)       Aug.       Sept.      Oct.       Jan. (c)

Area, thousands of acres
Saskatchewan     12 923       9 697     11 202     11 453     13 511       17 400
Alberta           3 443       2 099      3 433      3 767      4 535        5 600
Manitoba          2 519       1 751      2 100      2 125      2 756        3 800
Total            18 885      13 547     16 735     17 345     20 802       26 800

Yield, bu/acre
Saskatchewan       26.7        29.5       29.6       29.5       29.5         31.5
Alberta            26.4        23.5       24.6       24.6       25.0         32.5
Manitoba           29.4        23.1       23.3       23.4       23.4         27.1
Average            27.0        27.7       27.8       27.7       27.7         31.1

Production, thousands of bushels
Saskatchewan    345 000     285 750    331 793    338 354    398 722      548 000
Alberta          91 000      49 235     84 281     92 731    113 216      182 000
Manitoba         74 000      40 400     48 995     49 817     64 376      103 000
Total           510 000     375 385    465 069    480 902    576 314      833 000

(a) These figures are based on a rework of the data performed after the annual report was produced. A number of errors were found in the yield models that required this rework. Yield and production accuracy statistics were not generated during the crop season and were computed during this rework. The rework of the yield estimates did not significantly affect the LACIE results. Figures are not comparable to in-season reported estimates.

(b) Statistics Canada, Field Crop Reporting Series, No. 20, Dec. 3, 1976.

(c) Based on 1975 data for ratioing CAMS small grains estimates.

One of the most obvious causes of the underestimate
was the use of ratioing to derive wheat estimates
from the small grains estimates generated by
the Classification and Mensuration Subsystem
(CAMS). 1971 was designated as the base year for
Canada  because  statistics  were  available  at  the  coun- 
ty level  to  support  the  sampling  and  allocation. 
When  a ratioing  procedure  had  to  be  implemented, 
1971  data  were  also  used  to  determine  the  ratios  since 
no  other  data  existed  at  the  county  level  to  support 
this  procedure.  Data  obtained  after  1971  (not  at  the 
county  level)  show  a substantial  shift  in  the  amount 
of  area  planted  to  wheat  versus  other  grains.  Most  of 
the  increased  wheat  acreage  has  been  at  the  expense 
of  flax,  rapeseed,  and  fallowed  land.  The  government 
over  the  past  several  years  has  been  asking  farmers 
to  increase  their  wheat  and  feed  grain  acreage  and 
reduce  their  fallow  acreage,  since  current  research 
within  the  country  shows  that  the  prevalent  summer 
fallowing practices have tended to waste moisture
and soil resources. When the 1975 ratios were applied
to the segment estimates in the final report, the
total LACIE wheat acreage increased from 17.3
million acres to 20.8 million acres, an improvement
of 13 percent but still 22 percent below official
estimates. If the ratio obtained from the 1976 official
estimates of wheat and small grains acreage is applied
to the LACIE data, an additional improvement
of 4 percent is realized.

The use of more current data in the ratioing does,
as anticipated, improve the estimate generated by
LACIE, but it accounts for less than half the
difference between the LACIE estimate and the official
estimate. So, while ratioing is a major factor in
the underestimate, other factors contributed substantially.
All the factors, including ratioing, thought to
affect the estimate are reviewed in the section entitled
"Technical Issues." To date, sampling/
classification and Landsat resolution seem to be the
other principal causes of the area underestimate.

The  yield  estimates  generated  for  the  1976  crop 
season  were  under  the  official  yields  both  for  the 
region  and  for  the  individual  provinces.  The  regional 
yield  for  LACIE  was  27.7  bushels  per  acre  compared 
to  the  official  yield  of  31.1  bushels  per  acre  (table  II). 
For  the  country  as  a whole,  the  final  estimate 
reflected  the  gradual  upward  trend  that  the  region 
has  been  experiencing  over  the  past  10  years,  but  it 
did  not  account  for  the  above-normal  yields  observed 
throughout  the  area.  At  the  province  level,  the  yield 
models seemed to reflect more strongly the early-season
below-normal weather conditions and less
strongly  the  whole-season  favorable  weather.  The 
yield  estimates  were  above  trend  only  in 
Saskatchewan;  the  estimates  for  Alberta  and 
Manitoba  were  below  both  the  1975  estimates  and 
the  10-year  average. 

The final LACIE production estimate was 31 percent
below the official Canadian estimate, 576.3
million bushels versus 833 million bushels. The low
area estimate combined with the low yield estimate
produced a lower than anticipated production figure.
Only Alberta's production figure during the in-season
reporting was above the 1971 base year as a
result of the area estimate being above that level. The
principal cause of this low production estimate was
the area figure. Even though the LACIE yield was
lower than official projections, it was above the
long-term trend.


ACCURACY  OF  ESTIMATES 


Production  and  Area  Accuracy 

As mentioned previously, the Canadian Phase II
crop reports were regenerated and included yield and
production accuracy statistics (see LACIE Phase II,
Crop Assessment Subsystem Annual Report,
Canada, January 14, 1977, with Addendum dated
April 29, 1977). The revision of the production estimates
resulted in the following changes (table III):
August +3.7 percent, September +2.0 percent, October
+0.3 percent, and January +0.2 percent. Since the
changes were minor and since these estimates have
associated with them the statistics needed for accuracy
assessment, the following analysis is based on
these revised estimates. It should be pointed out that
the only difference between the October and January
estimates is that ratios of spring wheat to small
grains based on 1975 Canadian data were used for the
January spring wheat area estimates. The October
and previous spring wheat area estimates were obtained
using 1971 agricultural census data. The census
data allowed crop-district-level ratios.

The revised estimates and corresponding coefficients
of variation (CV's) for each month are presented
in table IV. The relative differences (RD's)
shown are with respect to the official Canadian estimates
released December 3, 1976, by Statistics
Canada. The test statistic indicates whether or not
the LACIE estimate is significantly different from
the corresponding official Canadian estimate. (For
further details on the statistical approach, see the
paper by Houston et al. entitled "Accuracy Assessment:
The Statistical Approach to Performance
Evaluation in LACIE.")

The precision (as measured by the CV) of the
LACIE spring wheat production estimates for September,
October, and January is sufficient to support
the LACIE 90/90 accuracy goal. However, comparisons
of each of the monthly production estimates
with the official Canadian estimate indicate
the presence of a negative bias that is too large to support
this goal. Treating the observed RD and CV for
the January production estimate as the true parameters
of the LACIE production estimator for Canada
indicates that a 90/65 accuracy goal is achievable at
harvest; i.e., the probability is 90 percent that the
at-harvest LACIE spring wheat production estimate is
within ±35 percent of the true Canadian production.
This result for Canada, of course, falls far short of
the 90/90 goal.
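
One way to arrive at a figure of this kind is sketched below in Python. The derivation is not given in the text, so every modeling choice is an assumption: the official estimate is taken as the truth, the estimator is treated as normally distributed with mean equal to the observed January LACIE value, and its standard deviation is the CV times that mean. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within(tolerance, lacie, official, cv_percent):
    """P(|estimate - truth| <= tolerance * truth) under a normal approximation.

    Assumptions: official = truth, mean of estimator = observed LACIE value,
    standard deviation = CV * mean.
    """
    mean = lacie / official               # estimator mean as a fraction of truth
    sd = (cv_percent / 100.0) * mean      # standard deviation on the same scale
    low = (1.0 - tolerance - mean) / sd
    high = (1.0 + tolerance - mean) / sd
    return normal_cdf(high) - normal_cdf(low)


# January production estimate from table IV.
p35 = prob_within(0.35, 576_314, 833_000, 4.9)
p10 = prob_within(0.10, 576_314, 833_000, 4.9)
print(round(p35, 2), p10)
```

With these assumptions the probability of falling within 35 percent comes out at roughly 0.89, while the probability of falling within 10 percent is essentially zero, which is consistent with the 90/65 statement above.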

Comparisons of the LACIE area and yield estimates
with the corresponding official Canadian estimates
indicate that both area and yield errors contributed
significantly to the production underestimation.
Both area and yield were significantly underestimated
each month. However, the area error contributed
more to the underestimation of production,
as is indicated by inspection of the area and yield
relative differences.
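
Because production is the product of area and yield, the relative contributions can also be put on a common scale. The Python sketch below uses a log-ratio decomposition, which is an illustrative choice made here and not the method used in the report (the report relies on inspection of the RD's). Inputs are the January LACIE and official values from table IV.

```python
import math


def error_shares(area_lacie, area_official, yield_lacie, yield_official):
    """Split the log of the production ratio into area and yield shares.

    Since production = area * yield, log(P_lacie / P_official) equals
    log(area ratio) + log(yield ratio); each term's share of the sum is returned.
    """
    log_area = math.log(area_lacie / area_official)
    log_yield = math.log(yield_lacie / yield_official)
    total = log_area + log_yield
    return log_area / total, log_yield / total


area_share, yield_share = error_shares(20_802, 26_800, 27.7, 31.1)
print(round(area_share, 2), round(yield_share, 2))
```

On this scale the area term accounts for roughly two-thirds of the January production shortfall and the yield term for the remaining third.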

Table III. Comparison of Previously Submitted Canada Reports
and Revised Estimates

                      Area,               Yield,            Production,
                thousands of acres        bu/acre       thousands of bushels
Province        Original   Revised   Original   Revised   Original   Revised

                         CAS Annual Report, January

Saskatchewan      13 546    13 511       29.1      29.5    394 274   398 722
Alberta            4 535     4 535       25.1      25.0    113 919   113 216
Manitoba           2 754     2 756       24.4      23.4     67 226    64 376
Total             20 835    20 802   (a) 27.6  (a) 27.7    575 419   576 314

                         CAS Monthly Report, October

Saskatchewan      11 456    11 453       29.1      29.5    333 912   338 354
Alberta            3 767     3 767       24.9      24.6     93 615    92 731
Manitoba           2 124     2 125       24.5      23.4     51 950    49 817
Total             17 347    17 345   (a) 27.6  (a) 27.7    479 477   480 902

                         CAS Monthly Report, September

Saskatchewan      11 205    11 202       28.9      29.6    324 068   331 793
Alberta            3 433     3 433       23.3      24.6     80 334    84 281
Manitoba           2 099     2 100       24.3      23.3     51 181    48 995
Total             16 736    16 735   (a) 27.2  (a) 27.8    455 583   465 069

                         CAS Monthly Report, August

Saskatchewan       9 437     9 697       28.8      29.5    271 418   285 750
Alberta            1 952     2 099       23.2      23.5     45 334    49 235
Manitoba           1 800     1 751       25.2      23.1     45 409    40 400
Total             13 188    13 547   (a) 27.5  (a) 27.7    362 162   375 385

(a) Average.


The tendency to underestimate spring wheat area
was also observed in the U.S. spring wheat region.
This underestimation, in both the United States and
Canada, is partially the result of the inability to
differentiate spring wheat from other small grains.
Consequently, historical ratios of spring wheat
acreage to small grains acreage were used to obtain
spring wheat acreage estimates. These ratios were
responsible for a significant amount of the underestimation
observed for Canada in the August, September,
and October estimates, since a majority of
the ratios were developed from 1971 data and the
planting of wheat in preference to nonwheat small
grains had greatly increased since that time. For example,
in the Province of Saskatchewan, which
historically produces about 65 percent of the Canadian
spring wheat, the ratio of spring wheat acreage
to spring small grains acreage increased from 60 percent
in 1971 to about 76 percent in 1976, an increase
of 16 percentage points.
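
The effect of an outdated ratio can be made concrete with a short Python sketch. It assumes, purely for illustration, that the small grains area itself is measured without error, so that the only error comes from the ratio. The function names are hypothetical; the 60 and 76 percent figures are the Saskatchewan values quoted above.

```python
def wheat_from_small_grains(small_grains_area, wheat_ratio):
    """Derive wheat area by applying a wheat-to-small-grains ratio."""
    return small_grains_area * wheat_ratio


def ratio_shortfall(stale_ratio, current_ratio):
    """Fractional underestimate of wheat area caused by the stale ratio alone."""
    return 1.0 - stale_ratio / current_ratio


# Saskatchewan: ratio of about 0.60 in 1971 versus about 0.76 in 1976.
shortfall = ratio_shortfall(0.60, 0.76)
print(round(shortfall * 100, 1))  # 21.1
```

Under that assumption, applying the 1971 ratio to a 1976 crop would by itself understate the Saskatchewan wheat area by about 21 percent.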

Incorporating the use of 1975 ratios of wheat to
small grains for the January area estimate made a significant
improvement over the October estimate, but
the January estimate was still significantly smaller
than the Canadian estimate. This fact indicates that,
as a result of more confusion crops, smaller fields,
and a relatively short growing season, the spring
small grains area, in Canada as in the United States,
is also significantly underestimated. The strip-fallow
cropping practice, which effectively creates smaller
fields, leads to underestimation, since some of the
strip-fallow fields are small compared to Landsat
resolution and hence are difficult to detect and
measure.

Another potential source of error in the LACIE
spring wheat area estimation process is sampling. To
date, this particular error has not been quantified for
Canada. The sampling plan was based on 1971
agricultural census data, and, for the same reason
that the 1971 ratios of spring wheat to small grains
were inappropriate, the sampling scheme employed
may also have shortcomings. For example, the increase
in the planting of wheat in preference to other
small grains probably resulted in counties with
sparse wheat acreage in 1971 having significantly
more area planted to wheat in 1976. This, of course,
would indicate the need for more sample segments in
these areas. This particular situation occurred in the
United States in Minnesota. The first allocation by
LACIE to estimate spring wheat production in the
United States for crop year 1974-75 was based on
1969 agricultural census data, and Minnesota was
allocated 13 segments. When the sample allocation
was redone for the 1976-77 crop year, the allocation
was based on the 1974 agricultural census, and Minnesota
was allocated 47 segments. This increase in
sample number for Minnesota was primarily due to
the increase in area planted to wheat from 1969 to
1974.
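
The sensitivity of an allocation to its base-year statistics can be sketched in Python. The sketch assumes a simple allocation proportional to wheat area with largest-remainder rounding; the actual LACIE allocation procedure is not described here and may differ. The strata names and areas are hypothetical and chosen only for illustration.

```python
def allocate_segments(wheat_area_by_stratum, total_segments):
    """Allocate segments in proportion to wheat area (largest-remainder rounding)."""
    total_area = sum(wheat_area_by_stratum.values())
    quotas = {k: total_segments * a / total_area
              for k, a in wheat_area_by_stratum.items()}
    allocation = {k: int(q) for k, q in quotas.items()}  # round every quota down
    leftover = total_segments - sum(allocation.values())
    # Hand the remaining segments to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - allocation[k],
                          reverse=True)
    for k in by_remainder[:leftover]:
        allocation[k] += 1
    return allocation


# Hypothetical wheat areas (thousands of acres) for an old and a new base year.
old_base = {"A": 500, "B": 300, "C": 200}
new_base = {"A": 500, "B": 300, "C": 700}
old_alloc = allocate_segments(old_base, 100)
new_alloc = allocate_segments(new_base, 100)
print(old_alloc, new_alloc)
```

In this made-up case, stratum C receives 20 segments under the old base year and 47 under the new one, the same kind of shift as the Minnesota example.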

Although spring wheat area appears to be the primary
contributor to the production underestimate,
the yield was also significantly underestimated.


Table IV. LACIE Revised Estimates of Spring
Wheat Production, Area, and Yield for Canada
Compared With Official Country Estimates

                            (a) Production

             Official, (a)     LACIE,
              thousands      thousands       CV,     RD, (b)      Test
Month         of bushels     of bushels    percent   percent  statistic (c)

August                         375 385       6.5     -121.9      -18.8
September                      465 069       5.2      -79.1      -15.2
October                        480 902       4.8      -73.2      -15.3
January         833 000        576 314       4.9      -44.5       -9.1

                               (b) Area

              Official,        LACIE,
              thousands      thousands       CV,       RD,        Test
Month          of acres       of acres     percent   percent    statistic

August                          13 547       5.8      -97.8      -16.9
September                       16 735       4.0      -60.1      -15.0
October                         17 345       3.1      -54.5      -17.6
January          26 800         20 802       3.2      -28.8       -9.0

                               (c) Yield

              Official,        LACIE,        CV,       RD,        Test
Month          bu/acre        bu/acre      percent   percent    statistic

August                            27.7       3.6      -12.3       -3.4
September                         27.8       3.6      -11.9       -3.3
October                           27.7       3.7      -12.3       -3.3
January            31.1           27.7       3.7      -12.3       -3.3

(a) Statistics Canada, Dec. 3, 1976.
(b) Relative difference = (LACIE - official) ÷ LACIE × 100.
(c) The LACIE estimate is significantly different from the official
Canadian estimate.


Yield  Accuracy 

Since the Canadian government does not publish
official yield and production figures until their September
report, it was not possible through most of
the growing season to pinpoint yield accuracy in the
three wheat-growing provinces. Yield predictions
from the CCEA agrometeorological models were
produced beginning in May for each of the four
Manitoba, nine Saskatchewan, and three Alberta
crop districts. Official Canadian wheat estimates,
when they did become available, reported only
province-level results. The CCEA yield models, on
the other hand, predict only at the crop district level.
Therefore, without access to current-year district
area data to calculate the province-level yields, it was
not possible to precisely track LACIE yields at the
province level unless certain assumptions were made
about 1976 crop district wheat area and distribution
ratios. This was done; namely, the historical 5-year
(1971-75) wheat area distribution by crop district
within a given province was assumed to be proportional
to the 1976 ratio. Under this assumption,
it was possible to track province yields through the
early phases of the season. After the initial LACIE
area data became available in August, province-level
yields were calculated using these figures, and the
results were then compared to the official Statistics
Canada crop releases for the months of September
and October (table V). Table VI gives similar data
using the revised CCEA yield predictions and final
end-of-season Canadian figures released in December
1976.
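
The aggregation step described above amounts to an area-weighted average of yields. The Python sketch below applies that weighting one level up, combining the revised January province figures from table II into a regional yield, because crop district values are not given in the text. The function name is hypothetical.

```python
def aggregate_yield(areas, yields):
    """Area-weighted yield: total production divided by total area."""
    production = sum(areas[name] * yields[name] for name in areas)
    return production / sum(areas.values())


# Revised January LACIE figures from table II (thousands of acres, bu/acre).
areas = {"Saskatchewan": 13_511, "Alberta": 4_535, "Manitoba": 2_756}
yields = {"Saskatchewan": 29.5, "Alberta": 25.0, "Manitoba": 23.4}
regional = aggregate_yield(areas, yields)
print(round(regional, 1))  # 27.7
```

The result matches the 27.7 bushels per acre average reported for the three provinces. The same function would apply to crop districts within a province, with either LACIE areas or the historical 5-year areas as weights.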

Table V. Canada: 1976 CCEA/LACIE Yield Estimates With Comparisons

[In bushels per acre]

                  May   June   July    August              September                        October
Province         CCEA   CCEA   CCEA   CCEA  LACIE   CCEA  LACIE  Official     RD   CCEA  LACIE  Official     RD
                  (a)                         (b)                     (c)    (d)                     (e)    (d)

Manitoba         28.0   27.6   27.5   25.5   25.2   24.3   24.4      27.4  -11.0   24.3   24.5      27.1   -9.6
Saskatchewan     24.9   24.5   28.8   28.7   28.8   28.9   28.9      30.2   -4.3   29.1   29.1      31.8   -8.5
Alberta          24.6   26.0   25.6   23.5   23.2   23.4   23.4      31.8  -26.4   25.1   24.9      31.8  -21.7
Three Prairie
  Provinces      25.3   25.3   28.0   27.1   27.5   27.1   27.2      30.1   -9.6   27.6   27.6      31.1  -11.2

(a) CCEA began producing Canadian yield estimates by crop district for
each of the three provinces in May 1976. A province-level yield
estimate could not be obtained without like area data to aggregate.
LACIE area estimates were not forthcoming until August; hence, a scheme
to obtain the province yields earlier was initiated using 5-year
(1971-75) average crop district wheat area and distribution ratios for
each province. For comparison purposes, this technique was continued
each month throughout the season.
(b) Initial LACIE province yield estimate obtained by aggregating CCEA
crop district yield and LACIE crop district area estimates.
(c) Statistics Canada, Ministry of Industry, Trade and Commerce,
Sept. 10, 1976.
(d) Relative difference (in percent) = (LACIE - Statistics Canada) ÷
Statistics Canada × 100.
(e) Statistics Canada, Ministry of Industry, Trade and Commerce,
Oct. 8, 1976.


Table VI. Canada: 1976 CCEA/LACIE Revised Yield Estimates With Comparisons

[In bushels per acre]

                August        September                October                   Final
Province         LACIE  LACIE  Official     RD  LACIE  Official     RD  LACIE  Official     RD
                   (a)                     (b)                     (b)              (c)    (b)

Manitoba          23.1   23.3      27.4  -15.0   23.4      27.1  -14.0   23.4      27.1  -14.0
Saskatchewan      29.5   29.6      30.2   -2.0   29.5      31.8   -7.2   29.5      31.5   -6.3
Alberta           23.5   24.6      31.8  -22.6   24.6      31.8  -22.6   25.0      32.5  -23.1
Three Prairie
  Provinces       27.7   27.8      30.1   -7.6   27.7      31.1  -10.9   27.7      31.1  -10.9

(a) LeDuc, Sharon: Yield-Weather Regression Models for the Canadian
Provinces. LACIE-004JJ, NASA Johnson Space Center, Houston, Tex.,
Feb. 1976.
(b) Percent.
(c) Statistics Canada, Ministry of Industry, Trade and Commerce,
Dec. 3, 1976.


Based on fairly conclusive indications from the official
Canadian source, it appears the LACIE models
underestimated across the board in their initial year
of operation. For the three Prairie Provinces combined,
latest LACIE calculations place spring wheat
yield above the normal, but below Canadian sources,
maintaining a pattern established early in the crop
season. The margin between the normal yield and
the CCEA-predicted yield widened especially during
June, largely because of abundant rainfall over nearly
75 percent of Canada's wheat producing region. The
average of the Prairie Province yields predicted by
the agrometeorological models then held at 27 to 28
bushels per acre through the end of the season,
finishing 3 to 4 bushels per acre lower than the official
Canadian estimate. September's estimate fell
within 10 percent of that from Statistics Canada;
however, a boost in forecast production by the Canadians
in October pushed the yield difference to
almost 11 percent.

Although Manitoba's official yield (27.1 bushels
per acre) was beneath the agrometeorological statistical
trend (28.0 bushels per acre), the input of weather
data to the model pushed LACIE predicted yields to
more than 10 percent below official figures. This
would indicate a collective overreaction by the
Manitoba models to the known moisture shortage in
that region in the summer of 1976. Although rainfall
was off as much as 40 percent in parts of the province,
the effect was apparently not as extensive as the
yield models seemed to indicate. LACIE's final yield
predictions for Manitoba were 14 percent below the
final Canadian figure.

Across the border in Saskatchewan, overall accuracy
improved. September comparisons showed
only a 2-percent LACIE underestimation. By October,
the margin had widened to just over 7 percent,
due mostly to an upward 1.6-bushel-per-acre revision
by the Canadians. The Saskatchewan models accurately
reflected the season's favorable weather,
although collectively they did not react as sharply as
conditions appeared to warrant.

Of the yield predictions covering the three Prairie
Provinces, the Alberta estimate showed the widest
margin of error, underestimating the official spring
wheat yield figure of 32.5 bushels per acre by more
than 7 bushels per acre. Throughout the season,
CCEA-predicted yields hovered around the trend
line, checked by below-normal precipitation from
May through July throughout 90 percent of the
Alberta wheat producing sector. Combined, the three
Alberta models appeared to be overly influenced by
the July precipitation variable. When this is added to
the large negative trend variable adjustment associated
with the July truncation, yields in that month
alone dropped over 2 bushels per acre, at a time when
crops were apparently progressing well.

The  CCEA  Canadian  agrometeorological  yield 
models  for  1976  collectively  show  a significant  bias 
toward  underestimation,  both  where  crop  conditions 
were  considered  good  to  excellent  (Saskatchewan 
and  Alberta)  and  where  they  were  somewhat  less 
favorable because of insufficient rainfall (Manitoba).
In  general,  the  model  reaction  to  weather  variables 
tended  toward  an  overreaction  to  unfavorable 
weather  and  an  underreaction  to  favorable  weather 
(table  VII).  For  the  three  Prairie  Provinces,  yield 
model  performance  for  the  1976  season  appeared  to 
be  just  outside  a 90-percent  level  of  accuracy  (89.1 
percent). 
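
The weather and response categories used in table VII can be written as a small Python classifier. The rules follow the table's definitions; the "Undefined" label for cases the definitions do not cover is an addition made here, and the function name is hypothetical. The Manitoba inputs are the trend, official, and final LACIE yields quoted in the text.

```python
def classify_reaction(trend, official, model):
    """Classify season weather and model response per the table VII definitions."""
    official_dev = official - trend
    model_dev = model - trend
    if official_dev > 0 and model_dev > 0:
        weather = "Favorable"        # both yields above the normal trend
    elif official_dev < 0 and model_dev < 0:
        weather = "Unfavorable"      # both yields below the normal trend
    else:
        return "Undefined", "Undefined"  # not covered by the definitions
    if abs(model_dev) > abs(official_dev):
        response = "Overreaction"    # same direction, greater magnitude
    elif abs(model_dev) < abs(official_dev):
        response = "Underreaction"   # same direction, lesser magnitude
    else:
        response = "Undefined"
    return weather, response


# Manitoba: trend 28.0, official 27.1, final LACIE 23.4 bushels per acre.
print(classify_reaction(28.0, 27.1, 23.4))  # ('Unfavorable', 'Overreaction')
```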


Table VII. LACIE Yield Model Reaction to
1976 Meteorological Conditions
Compared With Official Source (a)

                      Crop season weather    Yield model response
Province                      (b)                    (b)

Manitoba                  Unfavorable            Overreaction
Saskatchewan              Favorable              Underreaction
Alberta                   Favorable              Underreaction
Three Prairie
  Provinces               Favorable              Underreaction

(a) Statistics Canada, Sept. 10, 1976.
(b) Definitions:
Favorable: model-predicted and official yields above normal trend.
Unfavorable: model-predicted and official yields below normal trend.
Overreaction: model-predicted yield deviates in same direction from
normal as official yield but varies by a greater magnitude.
Underreaction: model-predicted yield deviates in same direction from
normal as official yield but varies by a lesser magnitude.


TECHNICAL ISSUES

After an extensive evaluation of the major factors
that affected the final LACIE Canadian estimates,
two stand out as the principal sources of these
lower-than-expected estimates:

1. A change in the area planted to wheat since
1971 which affected the wheat-to-small-grains ratios
used in the current aggregation system (1971 was the
base year used for generating the LACIE reports)

2. An overall underestimate due to sampling and
classification error

Updating the ratios with 1975 data increased the area
estimate to 20.8 million acres, still 22 percent
below the official estimate. Ratios based on the 1976
official estimates improved the LACIE estimate by 4
percent, but this was still substantially below the official
figure.

The following sections review the five major problems
that were encountered during the Phase II
Canadian analysis.


Allocation  Data 

The first sample segment allocation for Canada
was done on the basis of the 1971 census and census
district boundaries. The allocation used the sampled
portion of the political area covered by the census,
not the total area within the subdivision boundaries.
To match the yield model geographic boundaries, a
second allocation was done using the yield model
areas as strata and census subdivisions as substrata.
This new allocation showed that about 18 segments
of the 283 should have been moved. However, no
segments were moved, and the Group I and Group II
substrata retained the segments as previously placed.
The LACIE aggregations for Canada in 1976 were
done using as strata the crop districts, the boundaries
of which essentially coincided with census district
boundaries, except in Saskatchewan where the boundaries
were slightly different. The yield model areas
are aggregates of crop districts, and the calculated
yields are assigned to all crop districts (area strata)
contained within the yield model area.

The reallocation for Canada using yield strata
resulted in the following distribution of segments.

Province          Total    Group I    Group II

Saskatchewan        170        153          17
Alberta              75         74           1
Manitoba             38         14          24

The Group I area estimate was unbiased. The Group
II segments presented a problem because the segments
were chosen with probabilities proportional to
size based on the old allocation and aggregated with
weights based on the new aggregation, thus incurring
some bias. However, since Saskatchewan and Alberta
contain 10 percent or fewer Group II segments, the
bias is not considered significant. In Manitoba, 63
percent of the segments were Group II. The census
district and crop district boundaries were the same,
with yield models covering aggregates of the crop districts.
These models provided a single yield estimate
at a given time which was used for all crop districts
covered by the yield model. Therefore, the aggregations
performed during Phase II may be considered
to have provided unbiased area estimates and valid
area variance estimates. Further, the yield and production
variances can be calculated at the yield strata
level and higher because the boundaries coincide
with multiples of crop districts except for a minor
deviation in Alberta.


Historical  Data 

One of the major problems encountered in working
Canada was the lack of historical data at the level
at which the original allocation was generated. The
original allocation was based on data contained in the
1971 Canadian Census of Agriculture. This census is
conducted every 5 years, with publication approximately
2 years after data collection. This publication
is the only one the LACIE project has been able to
obtain that contains area statistics at the county level
(no production statistics are contained in this census).
All other statistics produced by Statistics
Canada are at the province or crop district level.

Most  of  the  data  on  Canada  available  to  LACIE 
besides  the  1971  census  consisted  of  statistics  at  the 
province or crop district level on wheat area and production,
with spotty coverage of other crops. No
comprehensive  set  of  data  for  small  grains  or  other 
crops  was  available.  Data  received  from  Statistics 
Canada  in  the  late  fall  of  1976  greatly  improved  the 
data  available  at  the  province  level,  but  data  below 
that  level  are  still  lacking. 

The  importance  of  having  a comprehensive  data 
set  over  a period  of  years  cannot  be  overstated.  These 
data  provide  a valuable  tool  for  understanding  the 
changing  patterns  of  agriculture.  Such  understanding 
serves  as  a base  for  comparison  of  LACIE  estimates, 
helps  improve  ratioing  procedures,  and  makes  it 
possible  to  track  long-term  trends  in  agricultural 
policy. The lack of these data has caused major concern,
especially in the development of ratios and the
analysis  of  factors  affecting  the  LACIE-generated 
estimates. 


CAS  Ratioing  Methodology 

Ratioing procedure. During Phase II, Landsat
data processed by CAMS were passed to CAS with
classifications for spring small grains or small grains.
Since the CAS aggregation system was designed to
aggregate wheat, a procedure for ratioing was
employed to derive wheat from small grains.
Preprocessing software was written to accommodate
this ratioing procedure. This software was designed
to ratio the small grains estimates at the segment
level. Two types of ratios were included in this
preprocessing step for Canada: (1) spring wheat area
to total small grains area and (2) spring wheat area to
spring small grains area.

Ratios for Canada were derived using the 1971
census data, since it was the only source available
that contained data on other crops at the county level
(barley, oats, rye, mixed small grains). For each of
the 283 sample segments, ratios were constructed on
the basis of the county statistics for each segment.
These ratios were used for the three LACIE reports
generated in August, September, and October.

In October, additional data were received that updated
LACIE statistics for Canada. This information
included data through the 1975 crop season on other
crops at the Canadian crop district level. These
figures were then used to construct ratios at the crop
district level. Instead of each segment having a
unique ratio based on statistics for the county in
which that segment was located, ratios based on the
1975 data were common for all segments within each
crop district. The derived wheat estimates based on
these 1975 data were used to generate the final estimate
contained in this report.
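
The two ratioing schemes can be sketched in Python as a lookup applied segment by segment. This is an illustration only: the CAS preprocessing software is not described in detail, and the segment records, field names, and ratio values below are hypothetical.

```python
def derive_wheat(segments, ratios, key):
    """Apply a wheat-to-small-grains ratio to each segment's small grains estimate.

    `key` names the field used to look up the ratio: "county" for the
    1971-based ratios, "crop_district" for the 1975-based ratios.
    """
    return {seg["id"]: seg["small_grains"] * ratios[seg[key]] for seg in segments}


# Hypothetical segments: small grains as a percentage of segment area.
segments = [
    {"id": 1, "county": "C1", "crop_district": "D1", "small_grains": 40.0},
    {"id": 2, "county": "C2", "crop_district": "D1", "small_grains": 25.0},
]
ratios_1971_by_county = {"C1": 0.5, "C2": 0.75}  # unique ratio per county
ratios_1975_by_district = {"D1": 0.75}           # one ratio per crop district

wheat_1971 = derive_wheat(segments, ratios_1971_by_county, "county")
wheat_1975 = derive_wheat(segments, ratios_1975_by_district, "crop_district")
print(wheat_1971, wheat_1975)
```

The design trade-off is visible in the lookup key: county ratios are spatially finer but tied to the 1971 census, while crop district ratios are coarser but can be refreshed from more recent statistics.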

Changes in wheat area, 1971 to 1976. One of the
most critical elements in determining the reliability
of a ratio procedure is the stability or variability of
wheat area. Since the late 1960's, some major shifts
have occurred in Canadian agriculture that have had
decided effects on the ratioing methodology
employed by CAS. Since 1970, there has been a gradual
shift toward increased planting of wheat. This is
true not only in Canada but in all major wheat-growing
countries. In the Prairie Provinces as a whole,
this shift has been predominantly at the expense of
reduced plantings of flax and rapeseed and reduced
summer fallow acreage (tables VIII and IX). Only
minor shifts in the proportion of wheat versus other
small grains have occurred over the long run, but
year-to-year variability is high.

in  a review  of  tables  VIII  and  IX  at  the  province 
level,  one  of  the  most  striking  items  is  that,  while 
areas  devoted  to  all  small  grains  and  miscellaneous 
crops  were  at  record  levels  in  1971,  the  area  planted 


Table VIII. Changes in Crop Area in the Three Canadian Prairie
Provinces

[In acres]

Crop year       Wheat        Other small   Flax      Rapeseed    Summer
                             grains                              fallow

Saskatchewan
1965-74 avg.    15 719 800   5 741 900     667 400   127)400     17 865 600
1971            12 925 117   • 700 290     924 715   27)6 555    16 559 825
1975            15 200 000   5 150 000     450 000   1 800 000   18 200 000
1976 (est.)     17 400 000   5 410 000     225 000   850 000     18 000 000

Alberta
1965-74 avg.    499)900      7 246 600     507)00    1 068 100   72)8900
1971            544) )|l     8 800 494     270 75)   1 987 625   7 008 714
1975            4 500 000    7 800 000     200 000   1 700 000   6 900 000
1976 (est.)     5 600 000    7 950 000     100 000   850 000     6 700 000

Manitoba
1965-74 avg.    282)400      5179 500      850 )00   509 800     2 178 400
1971
to wheat was significantly reduced compared to
recent years. The result is an unusually low
wheat-to-small-grains ratio for 1971. In all the comparative
data for Canada, 1971 seems to have been an unusual
year as far as wheat was concerned.

In Saskatchewan, the wheat-to-small-grains
percentage increased 16 percent from 1971 to the 1976
estimate but only 3 percent over the 10-year average.
Fallow acreage has remained fairly constant except
in 1971, when fallow acreage was substantially lower
than the average.

Alberta has the lowest wheat acreage of the three
provinces. In 1971, the wheat-to-small-grains ratio
was 29 percent; it increased to 41 percent in the 1976
estimate, the same as the 10-year average. Summer
fallow, rapeseed, and flax acreage has been
continually declining since 1971, giving way to increased
acreage planted to small grains.

Manitoba has followed very closely the pattern of
Saskatchewan. Wheat acreage increased 15 percent
between 1971 and the 1976 estimate. Summer fallow
acreage has remained fairly stable since 1971, while
wheat acreage increased at the expense of other small
grains, flax, and rapeseed acreage.

While there have been dramatic changes in wheat
acreage over the past several years, this fact does not
fully account for the problems associated with the
LACIE-generated estimates. The end-of-season
report (October) using 1971 ratios produced an area
estimate of 17.3 million acres, 35 percent below the
official Canadian estimate. Updating the ratio with
1975 data improved the estimate, but it was still 22
percent below the official figure. If the province-level
1976 estimates are used, only a slight improvement
(4 percent) is realized. Econometric models were
developed to predict the confusion crop ratios for
Canada at the crop district level at the end of Phase
II; however, because of the changes in project scope,
these models were not used in Phase III. Thus, an
18-percent difference between the LACIE estimate and
the official estimate remains. In all probability, this
difference can be accounted for by sampling and
classification error, since all other factors that might
contribute to this underestimate have been analyzed.

Table IX. Changes in Wheat Area in the Three Canadian Prairie
Provinces


Sampling  and  Classification 

To  date,  the  Accuracy  Assessment  Group  has  not 
completed  the  analysis  of  the  Canadian  intensive  test 
site  data  to  determine  the  type  and  magnitude  of 
sampling/classification  problems  in  Canada.  Plans 
for  completion  of  this  analysis  are  still  to  be  deter- 
mined. 

Some  of  the  preliminary  analyses  performed  on 
the  U.S.  spring  wheat  intensive  test  sites  and  blind 
sites  have  indicated  that  the  classification  analysts 
have  tended  to  underestimate  the  amount  of  small 
grains  within  a segment.  This  tendency  also  seems  to 
affect  the  Canadian  estimates  received  during  the 
1976  crop  season. 

Interpretation of Canadian Landsat data was more
difficult than interpretation of other areas because of
the dissected topography, high confusion-crop
problems, and the need to perform multitemporal
analysis for improved identification. As a result of
these major problems, interpretation of Canadian
segments requires substantially more data and
analysis time to produce a usable estimate. Since
multitemporal analysis is almost a necessity in
generating an estimate in Canada, a lack of adequate
acquisitions can cause serious problems in the
interpretation process.


Segment Acquisition and Image Analysis

An analysis of spring small grains area estimates
for Canada, which were transmitted to CAS, can be
summarized as follows.


1. Of the 170 sample segments allocated to
Saskatchewan, 152 had spring small grains area
estimates of greater than 0 percent; 7 had 0-percent
estimates; and 11 had no acquisitions suitable for
interpretation. About 80 percent of the 152 segments
(122 segments) had no significant change in
estimates between the time of early jointing and heading
(about June 15 to July 15) and harvest (about
mid-September).

About 10 percent of the 152 sample segments (15
segments) had a significant change in area estimate
between the early estimate and the harvest estimate.
In most of these segments, the area estimates
increased, but in two the revised estimate was smaller.
Approximately 10 percent of the 152 sample
segments (16 segments) had no acquisitions at harvest.

It  is  believed  that  the  distribution  of  overall  ac- 
quisitions and  the  lack  of  at-harvest  acquisitions  on 
10  percent  of  the  sample  segments  for  which  esti- 
mates were  transmitted  to  CAS  had  no  significant 
impact  on  the  area  aggregation  for  Saskatchewan. 

2. A total of 75 sample segments was allocated to
Alberta; of these, 80 percent (60 segments) had area
estimates transmitted to CAS. Only one sample
segment was estimated to have 0-percent spring small
grains. One segment had an increase in area estimate
between early processing (about jointing stage) and
harvest. Three segments had no at-harvest
acquisitions.

There  is  no  apparent  reason  for  these  conditions 
to  have  had  a significant  effect  on  the  area  aggrega- 
tion for  Alberta. 

3. Of the 38 sample segments allocated to
Manitoba, 2 segments had no acquisitions suitable
for interpretation. The remaining 36 sample
segments can be evaluated as follows:

a.  About  60  percent  (20  segments)  had  no  sig- 
nificant change  in  estimates  between  early  proc- 
essing (jointing  to  heading)  and  harvest.  About  25 
percent  (9  segments)  had  significant  changes  in  area 
estimate  between  the  early  estimate  and  the  at-har- 
vest  estimate. 

b. Approximately 15 percent of the 36
segments (6 segments) had no at-harvest acquisitions
and 1 segment had an at-harvest estimate only.

There are no indications that these conditions
should have significantly affected the CAS
aggregations for Manitoba.
Accuracy and Performance of
LACIE Area Estimates

J. F. Potter,a E. M. Hsu,a A. G. Houston,b and D. E. Pittsb


INTRODUCTION 

The  accuracy  assessment  effort  is  designed  to 
check  the  accuracy  of  the  LACIE  estimates  of  wheat 
production,  area,  and  yield  throughout  the  growing 
season  to  determine  whether  the  operational  pro- 
cedures are  sufficient  to  satisfy  the  LACIE  project 
goals  and  to  identify  problem  areas  in  the  estimation 
process. 

In this paper, the results obtained in assessing the
accuracy  of  the  LACIE  acreage  estimates  in  the 
United  States  are  discussed.  The  accuracy  assess- 
ment of  yield  and  production  estimates  is  discussed 
elsewhere  in  this  volume  (see  the  paper  by  Phinney 
et  al.  entitled  “Accuracy  and  Performance  of  LACIE 
Yield  Estimates  in  Major  Wheat  Producing  Regions 
of  the  World,”  the  paper  by  Marquis  entitled 
“LACIE  Area,  Yield,  and  Production  Estimate 
Characteristics:  U.S.  Great  Plains,”  the  paper  by 
Hickman  entitled  “LACIE  Area,  Yield,  and  Produc- 
tion Estimate  Characteristics:  U.S.S.R.,”  and  the 
paper  by  Conte  et  al.  entitled  “LACIE  Area,  Yield, 
and  Production  Estimate  Characteristics:  Canada”). 

Although  the  accuracy  assessment  studies  dis- 
cussed in  this  paper  are  conducted  in  the  U.S.  Great 
Plains  region,  these  studies  are  also  designed  to  pro- 
mote the  development  of  procedures  that  can  be 
used  to  obtain  accurate  estimates  for  other  parts  of 
the  world. 


REGIONS  OF  THE  U.S.  GREAT  PLAINS 

In  this  paper,  results  are  given  for  a number  of 
regions  within  the  U.S.  Great  Plains.  These  regions 
are  defined  as  follows. 

1. The U.S. Southern Great Plains (USSGP)
region consists of Colorado, Kansas, Nebraska,
Oklahoma, and Texas. Only winter wheat estimates
have been made for these states.

aLockheed Electronics Company, Houston, Texas.
bNASA Johnson Space Center, Houston, Texas.

2. The spring wheat (SW) region consists of
Minnesota and North Dakota. These states have very
little winter wheat, so LACIE has made estimates for
spring wheat only.

3.  The  mixed  wheat  (MW)  region  consists  of 
Montana  and  South  Dakota.  These  states  have  both 
spring  and  winter  wheat. 

4.  The  U.S.  Northern  Great  Plains  (USNGP) 
region  consists  of  the  two  spring  wheat  states  and  the 
two  mixed  wheat  states. 

5.  The  U.S.  Great  Plains  (USGP)  region  consists 
of  the  nine  states  of  the  USSGP  and  the  USNGP. 


PHASE I (CROP YEAR 1974-75)

Phase  I of  the  LACIE  project  concentrated  on  the 
estimation  of  wheat  acreage.  Yield  and  production 
feasibility  studies  were  also  carried  out,  but  the  ac- 
curacy assessment  team  investigated  only  the  ac- 
curacy of  acreage  estimation. 


The  90/90  Criterion 

Detailed  discussions  of  the  90/90  criterion  and  the 
Phase  I estimates  are  given  in  the  paper  by  Marquis. 
It  was  found  that  the  estimates  for  winter  wheat  in 
the  U.S.  Southern  Great  Plains  did  support  the  90/90 
criterion  but  that  the  total  wheat  estimates  for  the 
U.S.  Northern  Great  Plains  and  the  U.S.  Great  Plains 
regions  did  not. 


Area Error Source Analyses

Comparison of LACIE and USDA SRS acreage
estimates. Table I shows the comparison of the LACIE




Table I. Comparison of SRS and LACIE At-Harvest Estimates of Wheat Area(a)

Region          n/M(b)     SRS,        LACIE,      RD,(c)     CV,(d)     Test
                           thousands   thousands   percent    percent    statistic
                           of acres    of acres

Winter wheat
  Colorado      24/32       2 260       3 058        26.1      20.8
  Kansas        55/84      12 100      12 940         6.5       7.1
  Nebraska      23/35       3 070       2 657       -15.5      28.0
  Oklahoma      29/40       6 700       6 906         3.0      11.2
  Texas         28/49       5 700       4 218       -35.1      32.6
  USSGP         159/240    29 830      29 779        -0.2       7.0      -0.03

Spring wheat
  Minnesota     9/13        2 844       2 150       -32.3      15.7
  North Dakota  42/65      10 213       5 853       -74.5      14.8
  SW states     51/78      13 057       8 003       -63.2      11.6

Total wheat
  Montana       39/60       4 975       3 999       -24.4      25.9
  South Dakota  23/33       3 003       4 154        27.7      17.7
  MW states     62/93       7 978       8 153         2.2      15.6
  USNGP         113/171    21 035      16 156       -30.2       9.8      -3.11(e)
  USGP          272/411    50 865      45 935       -10.7       5.7      -1.88(e)
  Projected to  272/637                                         3.7
  national

(a) LACIE estimates based on CAMS rework data.
(b) The n is the number of segments used; the M is the number of segments allocated.
(c) Relative difference = (LACIE - SRS) / LACIE x 100.
(d) Coefficient of variation = standard deviation / LACIE x 100.
(e) The LACIE estimate is significantly different from the SRS estimate at the 10-percent level.

and  the  U.S.  Department  of  Agriculture  Statistical 
Reporting  Service  (USDA  SRS)1  estimates.  A test 
statistic  is  given  showing  whether  the  LACIE  esti- 
mate is  significantly  different  from  the  USDA  esti- 
mate. The  derivation  and  interpretation  of  the  test 
statistic  are  described  in  the  paper  by  Houston  et  al. 
entitled  “Accuracy  Assessment:  The  Statistical  Ap- 
proach to  Performance  Evaluation  in  LACIE." 
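The relative difference defined in the footnotes to table I can be checked with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the form of the test statistic (RD divided by CV) is an assumption that happens to come close to the tabulated values; the actual derivation is the one given by Houston et al.

```python
def relative_difference(lacie: float, srs: float) -> float:
    """Relative difference in percent, as defined in table I:
    (LACIE - SRS) / LACIE x 100."""
    return (lacie - srs) / lacie * 100.0


def approx_test_statistic(rd_percent: float, cv_percent: float) -> float:
    """ASSUMPTION: approximate the test statistic as RD / CV.
    The exact statistic is derived in the paper by Houston et al."""
    return rd_percent / cv_percent


# Table I, USGP total wheat (thousands of acres), CV = 5.7 percent
usgp_rd = relative_difference(lacie=45_935, srs=50_865)
usgp_t = approx_test_statistic(usgp_rd, cv_percent=5.7)
print(round(usgp_rd, 1), round(usgp_t, 2))
```

With the USGP row of table I as input, this gives a relative difference near -10.7 percent and a ratio near -1.88, in line with the tabulated entries.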

For winter wheat in the USSGP region, the
LACIE estimate is very close to the SRS estimate
and, according to the statistical test, is not
significantly different from it. For spring wheat in the SW
states, the LACIE estimate is much lower than the
SRS estimate. The LACIE estimate for total wheat in
the MW states is slightly higher than the SRS
estimate; therefore, the large (and statistically
significant) underestimate for total wheat at the USNGP
level is due to the large underestimate for the SW
states, especially North Dakota. The same is true for
the statistically significant underestimate for total
wheat at the USGP level. An analysis of the problem
in North Dakota showed that the major source of
error was sampling.

1The Statistical Reporting Service has since become a part of
the Economics, Statistics, and Cooperatives Service.
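The way state-level uncertainties roll up into the regional CV values of table I can be sketched as follows. This is a minimal illustration, not the LACIE aggregation procedure: it assumes that the state estimation errors are independent, so that variances add, and the function name is hypothetical. Under that assumption the tabulated regional values for the USSGP and the SW states are reproduced to within rounding.

```python
import math


def regional_cv(estimates, cvs_percent):
    """CV (percent) of a sum of state estimates.

    ASSUMPTION: state errors are independent, so the variance of the
    regional total is the sum of the state variances.
    """
    variance = sum((est * cv / 100.0) ** 2
                   for est, cv in zip(estimates, cvs_percent))
    return math.sqrt(variance) / sum(estimates) * 100.0


# Table I LACIE estimates (thousands of acres) and CVs (percent)
sw_cv = regional_cv([2150, 5853], [15.7, 14.8])          # Minnesota, North Dakota
ussgp_cv = regional_cv([3058, 12940, 2657, 6906, 4218],   # Colo., Kans., Nebr., Okla., Tex.
                       [20.8, 7.1, 28.0, 11.2, 32.6])
print(round(sw_cv, 1), round(ussgp_cv, 1))
```

The large regional underestimate is therefore not a matter of variance: the SW states CV is about 11.6 percent, while the relative difference for those states is several times larger.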

Study of classification and sampling error using
ground-observed proportions. The expression "blind
site" is a designation applied to selected operational
segments for which, unknown to the analyst,
ground-truth data were acquired for evaluation
purposes. The implementation of this approach
occurred late in the growing season of LACIE Phase I.
Thus, all of the selected sites were in the northern
spring wheat regions.

High-resolution  color-infrared  aerial  photography 
of  29  LACIE  segments  in  North  Dakota  and  Mon- 


tana  was  acquired  in  mid-August  1975.  Within  a few 
days  following  the  photography,  field  teams  col- 
lected ground  information  for  a substantial  portion 
of  these  segments. 

Figure 1 shows plots of the ground-observed
segment proportions and the SRS county proportions
versus the LACIE proportions for 16 blind sites in
North Dakota. All proportions are for small grains.
The LACIE estimate is fairly representative of the
ground-truth proportion. Indeed, at the 10-percent
level of significance, the average LACIE estimate is
not significantly different from the average
ground-truth proportion. However, it is clear that the LACIE
estimate is not at all representative of the SRS county
proportions. The ground-observed spring
small-grains proportions are 38 percent below the
corresponding SRS county spring small-grains
proportions. These results indicate that sampling error
resulting from nonrepresentative sample segments
was the major source of the observed bias in the
acreage estimate for North Dakota. Other
investigations with full-frame imagery confirmed that
agriculture is very heterogeneous in this region and
many of the LACIE segments did not adequately
represent their counties.

FIGURE 1. Regression of ground-observed segment
proportions and SRS county proportions on LACIE
proportions of small grains.


Special  Studies 

In  Phase  I,  a number  of  special  studies  were  con- 
ducted to  investigate  various  aspects  of  LACIE  pro- 
cedures. They  are  described  in  detail  in  reference  1. 
The  major  studies  are  summarized  as  follows. 

Study of the effects of site, biophase, and AI. One
study was conducted to investigate the effects of
three major factors, namely site, biophase,2 and analyst
interpreter (AI), on errors in the estimation of
segment small-grains proportions. All 14 AI's operating
within the Classification and Mensuration
Subsystem (CAMS) for the LACIE Phase I operations
participated in this experiment. The test was run on
two intensive test sites (ITS's): segment 1969, Toole
County, Montana; and segment 1976, Franklin
County, Idaho. These segments were selected
because multispectral scanner (MSS) data were
available for all four biophases. (Classifications for at
least one biophase were missing for all the other
ITS's.) Each AI was required to interpret each
biophase acquisition for each segment using the
Phase I operational procedure. This resulted in a total
of 56 small-grains proportion estimates (14 AI's times
4 biophases) for each segment.

The  analysis  of  the  data  produced  the  following 
results. 

1.  The  error  in  proportion  estimation  varied  sig- 
nificantly from  one  ITS  to  another. 

2. There was a significant difference in the
relative performance between AI's from one segment to
another.

3.  Use  of  biophase  1 increased  the  accuracy  for 
one  ITS  and  decreased  it  for  the  other. 

It  is  important  to  note  that  the  experiment  used  only 
two  sites  so  the  results  should  not  be  widely  applied. 


2The four biophases in wheat development are defined as (1)
tillering to jointing, (2) jointing to heading, (3) heading to soft
dough, and (4) soft dough to harvest.
Four-AI study of the effect of small-grains
proportion, amount of training data, and biophase. In
another study, four AI's working independently and
using the CAMS rework procedures analyzed all of
the acquisitions over the 23 Phase I ITS's that had
acquisitions satisfying the CAMS rework criteria. The
results were used to study (1) the effect of the
proportion of small grains in the segment on proportion
error, (2) the effect of the amount of training data on
proportion error, and (3) the effect of biophase on
labeling accuracy.

The  results  showed  that  the  proportion  of  small 
grains  in  the  segment  had  a pronounced  effect  on 
CAMS  proportion  error— the  sites  that  were  low  in 
small  grains  tended  to  be  overestimated  and  the  sites 
that  were  high  in  small  grains  tended  to  be  underesti- 
mated. A theoretical  explanation  of  why  this  effect 
occurs  is  given  in  reference  1. 

It  was  found  that  only  limited  information  could 
be  gained  on  the  effect  on  proportion  error  of  the 
amount  of  data  used  to  train  the  classifier  because 
the  amount  of  training  data  selected  by  the  AI's  was 
very  site  dependent  and  proportion  error  was  also 
very  site  dependent.  It  appeared  that  there  was  a 
slight  reduction  in  proportion  error  as  the  number  of 
training  pixels  increased. 

In  the  investigation  of  the  effect  of  biophase  on 
labeling  accuracy,  results  were  obtained  for  eight 
biophase  combinations.  The  best  combination  was 
biophases  1,  3,  and  4.  However,  these  results  were 
not  very  accurate  because  only  a few  sites  were 
averaged  for  each  combination  and  labeling  ac- 
curacies varied  greatly  from  site  to  site. 

Crop  calendar  verification. — To  assess  the  perfor- 
mance of  the  adjustable  crop  calendar  (ACC),  the 
ACC's  for  12  crop  reporting  districts  (CRD's)  having 
intensive  test  sites  were  compared  to  the  correspond- 
ing historical  crop  calendars  and  to  the  development 
stages  determined  by  ground  observations  on  the 
ITS’s. 

The ACC performance during the
jointing-to-soft-dough stage for winter wheat and the
planting-to-soft-dough stage for spring wheat in the U.S. Great
Plains appeared to be quite good. The biggest
discrepancies were at the beginning of the period covered by
the ACC: at jointing for winter wheat and at
planting for spring wheat. An 8- to 10-day disagreement
occurred between the dates the USDA reported for
the CRD (which were used as starter dates for the
ACC) and the ITS ground-truth data. The ITS
ground-truth and ACC output were closest to
agreement at the heading and soft-dough stages.
Indications were that more accurate starter dates would
have allowed the ACC to perform more accurately
throughout the spring and summer.

The  results  of  the  study  showed  that 

1.  Accurate  starter  models  for  spring  wheat  are 
vital  to  good  overall  performance  of  the  ACC. 

2.  Proper  operation  of  the  ACC  for  winter  wheat 
before  and  during  dormancy  to  provide  an  accurate 
estimate  of  jointing  in  spring  is  vital  to  the  overall 
operation  of  the  ACC  for  winter  wheat. 


PHASE II (CROP YEAR 1975-76)

As  a result  of  the  blind  site  investigations  in 
North  Dakota  at  the  end  of  Phase  I,  20  segments 
were  added  to  North  Dakota  for  Phase  II  to  alleviate 
the  sampling  problem.  Also,  the  analysis  of  pre- 
viously unavailable  Landsat  imagery  over  large  areas 
showed  some  sample  segments  to  be  in  non- 
agricultural  areas.  These  segments  were  moved  into 
agricultural  areas  for  Phase  II.  The  blind  site  in- 
vestigations also  indicated  a tendency  to  underesti- 
mate small-grains  proportions.  Therefore,  for  Phase 
II,  the  number  of  blind  sites  was  increased  substan- 
tially in  order  to  investigate  classification  problems 
further. 

Early in Phase I, it was found that the analysts
could not reliably separate spring wheat from other
small grains. The procedure therefore required the use
of historical ratios of wheat to small grains to obtain a
wheat area estimate. In Phase II, the use of historical
county-level ratios for wheat area estimation was
continued.

In  Phase  II,  LACIE  estimates  were  available  for 
acreage,  yield,  and  production.  Most  of  this  section  is 
devoted  to  a discussion  of  the  accuracy  assessment 
results  on  acreage  estimation  in  the  USGP  yardstick 
region.  However,  it  begins  with  a brief  review  of  the 
90/90  evaluation  and  the  relative  contribution  of  area 
and  yield  errors  to  the  production  error.  A more 
complete  discussion  of  the  yield  and  production 
results  is  given  in  the  papers  by  Phinney  and  Mar- 
quis, respectively. 


The 90/90 Evaluation and Production
Sensitivity Analysis

As in Phase I, it was found that the winter wheat
production estimates supported the 90/90 accuracy
goal but the spring wheat and total wheat production
estimates did not. Tables II and III show the
sensitivity analysis results on the effect of errors in area
and yield on the variability and bias of the
production estimates. (The methods used to determine
these results are described in the paper by Houston et
al.) Table II indicates that yield error contributed
slightly more to production estimate variability than
area error; table III indicates that the underestimates
in production were due primarily to underestimates
of area.
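The idea behind the column headings of tables II and III (LACIE acreages with SRS yields, and LACIE yields with SRS acreages) can be illustrated with a short sketch. It assumes that production is the product of area and yield and that each error source is isolated by substituting the SRS value for the other component; the function names and the input figures are hypothetical and are not LACIE data. The method actually used is the one described by Houston et al.

```python
def rd(lacie: float, srs: float) -> float:
    """Relative difference in percent: (LACIE - SRS) / LACIE x 100."""
    return (lacie - srs) / lacie * 100.0


def production_rd_components(lacie_area, lacie_yield, srs_area, srs_yield):
    """Bias of a production estimate and of its two one-at-a-time variants.

    ASSUMPTION: production = area x yield, and each error source is
    isolated by pairing the LACIE value of one factor with the SRS
    value of the other.
    """
    srs_production = srs_area * srs_yield
    return {
        "total": rd(lacie_area * lacie_yield, srs_production),
        "area_only": rd(lacie_area * srs_yield, srs_production),
        "yield_only": rd(srs_area * lacie_yield, srs_production),
    }


# Hypothetical inputs: area underestimated by 20 units, yield slightly high
components = production_rd_components(
    lacie_area=80.0, lacie_yield=31.5, srs_area=100.0, srs_yield=30.0)
for name, value in components.items():
    print(f"{name}: {value:.1f} percent")
```

In this made-up case the area-only bias is strongly negative while the yield-only bias is small and positive, the same qualitative pattern that table III reports for spring wheat.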


Area  Error  Source  Analyses 

Comparison of LACIE and SRS acreage
estimates. These comparisons are designed to monitor
how  well  LACIE  is  performing  throughout  the  crop 
year  and  to  detect  any  problems  that  may  exist.  The 
LACIE  and  SRS  acreage  estimates  are  shown  in 
figure  2 and  table  IV.  In  the  following  discussion, 
winter  wheat  is  considered  first,  followed  by  spring 
wheat,  then  total  wheat.  Figure  2 and  table  IV  are  ar- 
ranged in  this  order. 

For  the  major  regions,  a significance  test  was  per- 
formed to  determine  whether  the  LACIE  estimate 


Table II. Relative Contribution of Area and Yield
Errors to Variability of Production Estimate

Region          Total CV,    CV, percent
                percent      LACIE acreages    LACIE yields
                             x SRS yields      x SRS acreages

Winter wheat      7.0          4.5               5.3
Spring wheat     10.0          6.)               7.5
Total wheat       5.2          3.7               4.4

Table III. Relative Contribution of Area and Yield
Errors to Bias of Production Estimate

Region          Total RD,    RD, percent
                percent      LACIE acreages    LACIE yields
                             x SRS yields      x SRS acreages

Winter wheat     -7.2         -7.6              -1.1
Spring wheat    -22.3        -29.1              +6.3
Total wheat     -12.3        -14.9              +1.5

was  significantly  different  from  the  SRS  estimate. 
The  test  statistic  is  given  in  the  last  column  of  table 
IV  and  the  method  is  described  in  the  paper  by 
Houston  et  al. 

Winter wheat: Figures 2(a) to 2(d) show the
acreage estimates for winter wheat. Figure 2(a)
shows that the LACIE estimates for the USSGP
region were lower than the SRS estimates for every
month except June. Statistical tests showed that the
LACIE estimates for February, March, and April
were significantly different from the corresponding
SRS estimates. These lower estimates are expected
early in the season, because a number of wheat fields
have not yet "greened up" enough to have a
characteristic wheat signature. In 1976, this effect was
especially apparent in Kansas, Oklahoma, and Texas
because these states were affected by drought. In
May and June, the LACIE estimate for the USSGP
improved and was not significantly different from
the SRS estimate from May through the final
estimate. In June, it was closer to the final SRS estimate
(which held from July on) than to the June SRS
estimate. The final LACIE estimate had a relative
difference (RD) of -6.3 percent and a coefficient of
variation (CV) of 5 percent.

The most serious problem in the USSGP region
was the underestimate for Oklahoma, shown in
figure 2(b). Blind site investigations indicated that
the major source of the underestimate in Oklahoma
was analyst-mislabeled fields resulting from early
dry conditions and an unusual wheat growth cycle
following spring rains. The wheat was late in
greening up and had signatures that were quite different
from normal wheat. In fact, comparisons of LACIE
blind site ground observations, aircraft photography,
and analyst labels on a field-by-field basis indicated
that the analysts rarely misidentified nonwheat
fields as wheat, but the underestimate resulted from
labeling wheat fields as nonwheat.

The  winter  wheat  acreage  estimates  for  the  two 
mixed  wheat  states  are  shown  in  figure  2(c).  These 
estimates  were  very  low  in  June  but  increased 
throughout  the  season.  The  RD  for  the  final  estimate 
was  -14.7  percent  and  the  CV  was  19  percent. 

Figure 2(d) shows the total USGP winter wheat
estimates. The final LACIE estimate had an RD of
-7.3 percent and a CV of 5 percent. July was the
only month for which the LACIE estimate was
significantly different from the SRS estimate. Thus,
there was a tendency to underestimate winter wheat
but it was significant only for July. This tendency
was mainly due to underestimation in Oklahoma.
FIGURE 2. LACIE and SRS acreage estimates. (a) USSGP, winter wheat. (b) Oklahoma, winter wheat. (c) Mixed wheat states,
winter wheat. (d) USGP, winter wheat. (e) Spring wheat states, spring wheat. (f) Mixed wheat states, spring wheat. (g) USGP, spring
wheat. (h) USNGP, total wheat. (i) USGP, total wheat.
Spring wheat: Figure 2(e) shows the spring wheat
estimates in the two spring wheat states, Minnesota
and North Dakota. There was consistent
underestimation by LACIE, but there was a considerable
improvement in September. Part of this
improvement was due to a change in the ratios of wheat to
small grains that were used to calculate the wheat
acreage. For spring wheat, CAMS normally
determines only small-grains proportions, and the wheat
proportions are then calculated by multiplying these
by the historical wheat-to-small-grains ratios for the
county in which the segment is located. A change to
ratios based on 1975 data accounted for 48 percent of
the improvement in North Dakota and 53 percent of
the improvement in Minnesota. In North Dakota, a
further 36 percent of the improvement was due to
the addition of 21 new segments. These new
segments were added to North Dakota to correct the
sampling problem identified during Phase I. There
was also an undersampling problem in Minnesota,
since the spring wheat area had increased from
829 000 acres in 1969 (the year that was used for the
sampling allocation) to 2 844 000 acres in 1976. Blind
site investigations indicated a number of causes of
the underestimate in North Dakota, including poor
Landsat resolution of strip-fallow areas, weak or
missing signatures, and poor acquisition histories.
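The ratio step described above, in which a segment's small-grains proportion is multiplied by a historical wheat-to-small-grains ratio, is simple enough to sketch directly. The code below is an illustration only: the function names and the numerical inputs are hypothetical, and it shows nothing more than how a stale ratio carries straight through to the wheat estimate.

```python
import math


def wheat_to_small_grains_ratio(wheat_area: float, other_small_grains_area: float) -> float:
    """Historical ratio of wheat to all small grains for a county."""
    return wheat_area / (wheat_area + other_small_grains_area)


def wheat_proportion(small_grains_proportion: float, ratio: float) -> float:
    """Segment wheat proportion = small-grains proportion x historical ratio."""
    if not (0.0 <= small_grains_proportion <= 1.0 and 0.0 <= ratio <= 1.0):
        raise ValueError("proportion and ratio must lie between 0 and 1")
    return small_grains_proportion * ratio


# Hypothetical county: the wheat share of small grains rose between two base years
old_ratio = wheat_to_small_grains_ratio(wheat_area=60.0, other_small_grains_area=40.0)
new_ratio = wheat_to_small_grains_ratio(wheat_area=75.0, other_small_grains_area=25.0)

segment_small_grains = 0.40  # hypothetical classified small-grains proportion
print(wheat_proportion(segment_small_grains, old_ratio))  # estimate with the older ratio
print(wheat_proportion(segment_small_grains, new_ratio))  # estimate with the updated ratio
```

Because the relationship is a plain product, any percentage shortfall in the ratio appears as the same percentage shortfall in the wheat estimate, which is consistent with the improvement observed when the ratios were updated to 1975 data.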

Figure 2(f) shows the spring wheat estimates for
the two mixed wheat states, Montana and South
Dakota. These estimates were consistently low, but
they did improve as the season progressed. The
improvement was partly due to improved
spring-wheat-to-small-grains ratios. The final spring wheat
estimate for the mixed wheat states had an RD of -21.1
Table IV. Comparison of SRS and LACIE Acreage Estimates

(a) February(a)

Region        n/M        SRS,        LACIE,      RD,        CV,        Test
                         thousands   thousands   percent    percent    statistic
                         of acres    of acres

Winter wheat
  Colorado    13/32       2 830       3 539        20.0      26
  Kansas      43/84      13 100       8 013       -63.5      12
  Nebraska    13/35       3 400       4 500        24.4      18
  Oklahoma    30/40       7 550       3 499       -90.0      24
  Texas       31/49       6 300       3 170       -98.7      25
  USSGP       130/240    33 180      22 721       -46.0       9        -5.1(b)

(b) March(a)

Region        n/M        SRS,        LACIE,      RD,        CV,        Test
                         thousands   thousands   percent    percent    statistic
                         of acres    of acres

Winter wheat
  Colorado    25/32       2 830       2 768        -2.2      25
  Kansas      61/84      13 100       8 536       -53.5       8
  Nebraska    21/35       3 400       3 632         6.4      13
  Oklahoma    36/40       7 550       3 450      -118.8      18
  Texas       42/49       6 300       3 725       -69.1      30
  USSGP       185/240    33 180      22 111       -50.1       8        -6.3(b)

(c) April

Region        n/M        SRS,        LACIE,      RD,        CV,        Test
                         thousands   thousands   percent    percent    statistic
                         of acres    of acres

Winter wheat
  Colorado    25/32       2 040       2 768        26.3      25
  Kansas      62/84      11 000       8 536       -28.9       8
  Nebraska    22/35       3 400       3 583         5.1      13
  Oklahoma    36/40       5 800       3 450       -68.1      18
  Texas       44/49       3 900       3 479       -12.1      20
  USSGP       189/240    26 140      21 816       -19.8       7        -2.8(b)

(a) The SRS estimates for February and March are the December 1975 estimates of seeded acreage.
(b) The LACIE estimate is significantly different from the SRS estimate at the 10-percent level.


percent and a CV of 12 percent. The results presented in table IV show that
there was an underestimation problem in Montana, where the RD for the final
estimate was -54.0 percent and the CV was 22 percent. Investigations indicated
that the underestimates in Montana were underestimates of wheat proportions in
strip-fallow areas, which did not classify well because Landsat resolution is
not fine enough to resolve the fields.

The monthly estimates for the total spring wheat in the USGP region are shown
in figure 2(g). The LACIE estimates were consistently low and were

Table IV.— Continued

(d) May

Region                n/M       SRS,     LACIE,      RD,      CV,       Test
                           thousands  thousands  percent  percent  statistic
                            of acres   of acres
Winter wheat
  Colorado          26/32      1 900      2 807     32.3       24
  Kansas            70/84     10 800      9 392    -15.0        6
  Nebraska          27/35      2 950      3 653     19.2       13
  Oklahoma          38/40      5 800      3 897    -48.8       16
  Texas             47/49      3 900      4 810     18.9       14
  USSGP           208/240     25 350     24 559     -3.2        6       -0.5

(e) June

Region                n/M       SRS,     LACIE,      RD,      CV,       Test
                           thousands  thousands  percent  percent  statistic
                            of acres   of acres
Winter wheat
  Colorado          26/32      1 900      2 995     36.6       23
  Kansas            75/84     10 750     10 535     -2.0        6
  Nebraska          30/35      2 950      4 104     28.1       12
  Oklahoma          38/40      5 800      4 148    -39.8       14
  Texas             47/49      3 900      4 556     14.4       15
  USSGP           216/240     25 300     26 338      3.9        5       -0.8
  Montana           10/38      3 020        488   -518.9      193
  South Dakota       8/10      1 040      1 159     10.3       43
  MW states         18/48      4 060      1 647   -146.5       65
  USGP            234/288     29 360     27 985     -4.9        6        -.8

(f) July

Region                n/M       SRS,     LACIE,      RD,      CV,       Test
                           thousands  thousands  percent  percent  statistic
                            of acres   of acres
Winter wheat
  Colorado          30/32      2 200      2 867     23.3       25
  Kansas            78/84     11 100     10 795     -2.8        6
  Nebraska          32/35      3 000      4 133     27.4       11
  Oklahoma          40/40      6 300      4 025    -56.5       15
  Texas             47/49      4 700      4 314     -8.9       15
  USSGP           227/240     27 300     26 134     -4.5        5       -0.1
  Montana           21/38      3 020      1 044   -189.3       52
  South Dakota       9/10      1 040      1 482     29.8       23
  MW states         30/48      4 060      2 526    -60.7       25
  USGP            257/288     31 360     28 660     -9.4        5     ^b-1.9

^b The LACIE estimate is significantly different from the SRS estimate at the 10-percent level.




Table IV.— Continued

(g) August

Region                n/M       SRS,     LACIE,      RD,      CV,       Test
                           thousands  thousands  percent  percent  statistic
                            of acres   of acres
Winter wheat
  Colorado          31/32      2 200      2 830     22.3       24
  Kansas            78/84     11 100     10 932     -1.5        5
  Nebraska          32/35      3 000      4 086     26.6       11
  Oklahoma          40/40      6 300      4 305    -46.3       15
  Texas             47/49      4 700      4 310     -9.0       16
  USSGP           228/240     27 300     26 463     -3.2        5       -0.6
  Montana           22/38      3 020      1 911    -58.0       35
  South Dakota       9/10      1 040      1 482     29.8       23
  MW states         31/48      4 060      3 393    -19.7       22
  USGP            259/288     31 360     29 856     -5.0        5       -1.0
Spring wheat
  Minnesota         10/13      3 826      1 741   -119.8       40
  North Dakota      31/85     11 540      8 161    -41.4       14
  SW states         41/98     15 366      9 902    -55.2       13
  Montana           14/22      2 315      1 127   -105.4       28
  South Dakota      14/23      2 050      2 169      5.5       12
  MW states         28/45      4 365      3 296    -32.4       12
  USNGP            69/143     19 731     13 198    -49.5       10     ^b-5.0
Total wheat
  Montana           36/60      5 335      3 038    -75.6       19
  South Dakota      23/33      3 090      3 651     15.4       13
  MW states         59/93      8 425      6 689    -26.0       11
  USNGP           100/191     23 791     16 591    -43.4        9     ^b-4.8
  USGP            328/431     51 091     43 054    -18.7        5     ^b-3.7

^b The LACIE estimate is significantly different from the SRS estimate at the 10-percent level.


significantly different from the SRS estimates for every month and for the
final estimate. Of the four states contributing to the total spring wheat
estimate, only South Dakota's spring wheat acreage was not consistently
underestimated. This record indicated a serious underestimation problem for
spring wheat. In addition to the reasons given previously, blind site studies
indicated that this underestimation was also due to errors in the ratios of
wheat to small grains that were used to calculate the wheat acreage.

Total wheat: Figure 2(h) shows the total wheat estimate in the four-state
USNGP region. It was consistently underestimated and was significantly
different from the SRS estimate for every month and for the final estimate.
The final estimate had an RD of -24.2 percent due to underestimates of spring

Table IV.— Continued

(h) September

Region                n/M       SRS,     LACIE,      RD,      CV,       Test
                           thousands  thousands  percent  percent  statistic
                            of acres   of acres
Winter wheat
  Colorado          32/32      2 200      2 704     18.6       24
  Kansas            81/84     11 100     10 989     -1.0        5
  Nebraska          33/35      3 000      3 399     11.7       11
  Oklahoma          40/40      6 300      4 261    -47.9       14
  Texas             47/49      4 700      4 344     -8.2       16
  USSGP           233/240     27 300     25 697     -6.2        5       -0.4
  Montana           35/38      3 020      2 103    -43.6       29
  South Dakota       9/10      1 040      1 452     28.4       23
  MW states         44/48      4 060      3 555    -14.2       20
  USGP            277/288     31 360     29 252     -7.2        5       -1.4
Spring wheat
  Minnesota         10/13      3 826      2 551    -50.0       27
  North Dakota      67/85     11 540      9 650    -19.6        5
  SW states         77/98     15 366     12 201    -25.9        7
  Montana           19/22      2 315      1 291    -79.3       23
  South Dakota      18/23      2 050      2 095      2.1       13
  MW states         37/45      4 365      3 386    -28.9       12
  USNGP           114/143     19 731     15 587    -26.6        6     ^b-4.4
Total wheat
  Montana           54/60      5 335      3 394    -57.2       14
  South Dakota      27/33      3 090      3 547     12.9       12
  MW states         81/93      8 425      6 941    -21.4        9
  USNGP           158/191     23 791     19 142    -24.3        6     ^b-4.1
  USGP            391/431     51 091     44 839    -13.9        4     ^b-3.5

^b The LACIE estimate is significantly different from the SRS estimate at the 10-percent level.


wheat in Montana, Minnesota, and North Dakota and of winter wheat in Montana.
The CV was 5 percent.

Figure 2(i) shows the total wheat estimate in the nine-state USGP region. The
LACIE estimate was consistently low and was significantly different from the
SRS estimate for every month and for the final estimate. The final LACIE
figure had an underestimate of 2.2 million acres in the winter wheat acreage
and an underestimate of 4.1 million acres in the spring wheat acreage.
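
The RD and test-statistic columns of table IV can be checked with a short calculation. The sketch below assumes, from the tabulated values rather than from a definition given in this section, that RD is the LACIE-minus-SRS difference expressed as a percentage of the LACIE estimate and that the test statistic is RD divided by CV. The function names are illustrative only.

```python
def relative_difference(lacie, srs):
    """RD in percent, taken relative to the LACIE estimate (assumed definition)."""
    return 100.0 * (lacie - srs) / lacie


def rd_test_statistic(rd_percent, cv_percent):
    """Ratio of RD to CV (assumed form of the tabulated test statistic)."""
    return rd_percent / cv_percent


# Kansas winter wheat, February (table IV(a)): SRS 13 100, LACIE 8 013.
kansas_feb_rd = relative_difference(lacie=8013, srs=13100)

# USSGP winter wheat, February: RD -46.0 percent, CV 9 percent.
ussgp_feb_stat = rd_test_statistic(-46.0, 9)

print(round(kansas_feb_rd, 1), round(ussgp_feb_stat, 1))
```

Under these assumptions the two values round to the tabulated -63.5 percent and -5.1.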

Studies based on ground-observed proportions.— In Phase II, ground-observed
proportions were obtained for 103 winter wheat segments and 33 spring wheat
segments called "blind sites" because the AI's did not know the identity of
these sites. The ground-observed proportions were used to study various
aspects of acreage estimation.

Table IV.— Continued

(i) October

Region                n/M       SRS,     LACIE,      RD,      CV,       Test
                           thousands  thousands  percent  percent  statistic
                            of acres   of acres
Winter wheat
  Colorado          32/32      2 200      2 704     18.6       24
  Kansas            81/84     11 100     10 989     -1.0        5
  Nebraska          33/35      3 000      3 399     11.7       11
  Oklahoma          40/40      6 300      4 261    -47.9       14
  Texas             47/49      4 700      4 344     -8.2       16
  USSGP           233/240     27 300     25 697     -6.2        5       -1.2
  Montana           36/38      3 020      2 131    -41.7       28
  South Dakota       9/10      1 040      1 452     28.4       23
  MW states         45/48      4 060      3 583    -13.3       19
  USGP            278/288     31 360     29 280     -7.1        5       -1.4
Spring wheat
  Minnesota         11/13      3 826      2 198    -74.1       30
  North Dakota      79/85     11 540      9 735    -18.5        5
  SW states         90/98     15 366     11 933    -28.8        7
  Montana           20/22      2 315      1 487    -55.7       24
  South Dakota      19/23      2 050      2 079      1.4       13
  MW states         39/45      4 365      3 566    -22.4       12
  USNGP           129/143     19 731     15 499    -27.3        6     ^b-4.6
Total wheat
  Montana           56/60      5 335      3 618    -47.5       12
  South Dakota      28/33      3 090      3 531     12.5       12
  MW states         84/93      8 425      7 149    -17.8        8
  USNGP           174/191     23 791     19 082    -24.7        5     ^b-4.9
  USGP            407/431     51 091     44 779    -14.1        4     ^b-3.5

^b The LACIE estimate is significantly different from the SRS estimate at the 10-percent level.

Bias due to classification (weighted analysis): Ground-truth information from
blind site data obtained at harvest was used to estimate the bias in the
aggregated acreage estimates that was due to classification. The procedure is
described in the paper by Houston et al. and the results are given in table V.
It is called "weighted" analysis because the proportions for the various
segments are multiplied by the weights used in the aggregation. These results
show how errors in proportion estimates affect the aggregated acreage
estimates.

For winter wheat, there is significant underestimation due to classification
in both the USSGP and USGP regions, mainly caused by problems in Oklahoma.
Also, there is significant underestimation in both spring wheat in the USNGP
region and total

Table IV.— Concluded

(j) Final

Region                n/M       SRS,     LACIE,      RD,      CV,       Test
                           thousands  thousands  percent  percent  statistic
                            of acres   of acres
Winter wheat
  Colorado          32/32      2 200      2 704     18.6       24
  Kansas            81/84     11 300     11 125     -1.6        5
  Nebraska          33/35      2 950      3 399     13.2       11
  Oklahoma          40/40      6 300      4 261    -47.9       14
  Texas             47/49      4 700      4 344     -8.2       16
  USSGP           233/240     27 450     25 833     -6.3        5       -1.3
  Montana           36/38      3 080      2 079    -48.1       21
  South Dakota       9/10        970      1 452     33.2       23
  MW states         45/48      4 050      3 531    -14.7       19
  USGP            278/288     31 500     29 364     -7.3        5       -1.5
Spring wheat
  Minnesota         11/13      3 893      2 198    -77.1       30
  North Dakota      79/85     11 520      9 856    -16.9        5
  SW states         90/98     15 413     12 054    -27.9        7
  Montana           20/22      2 335      1 516    -54.0       22
  South Dakota      19/23      2 020      2 079      2.8       13
  MW states         39/45      4 355      3 595    -21.1       12
  USNGP           129/143     19 768     15 649    -26.3        6     ^b-4.4
Total wheat
  Montana           56/60      5 415      3 595    -50.6       12
  South Dakota      28/33      2 990      3 531     15.3       12
  MW states         84/93      8 405      7 126    -17.9        8
  USNGP           174/191     23 818     19 180    -24.2        5     ^b-4.8
  USGP            407/431     51 268     45 013    -13.9        4     ^b-3.5

^b The LACIE estimate is significantly different from the SRS estimate at the 10-percent level.


wheat  in  the  USGP  region.  These  results  agree  with 
the  SRS  comparisons  discussed  previously. 
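
The relative bias and confidence limits reported in table V can be reproduced from the tabulated bias and standard deviation. The sketch below assumes that the relative bias is B/A in percent and that the 90-percent limits are B plus or minus 1.645 standard deviations (a normal approximation inferred from the tabulated limits, not stated in this section). The function names are hypothetical.

```python
Z_90 = 1.645  # two-sided 90-percent normal quantile (assumed, see lead-in)


def relative_bias(bias, acreage):
    """Relative bias B/A in percent."""
    return 100.0 * bias / acreage


def confidence_limits(bias, std_dev, z=Z_90):
    """Symmetric limits B +/- z * (standard deviation of B)."""
    half_width = z * std_dev
    return (bias - half_width, bias + half_width)


# Oklahoma winter wheat (table V): A = 4 261, B = -2 583 thousand acres.
oklahoma_rel_bias = relative_bias(-2583, 4261)

# USSGP winter wheat (table V): B = -3 881, standard deviation 1305.6.
ussgp_lower, ussgp_upper = confidence_limits(-3881, 1305.6)

print(round(oklahoma_rel_bias, 1), round(ussgp_lower), round(ussgp_upper))
```

With these assumptions the results round to -60.6 percent and (-6029, -1733), matching the Oklahoma and USSGP rows of table V.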

Bias due to classification (unweighted analysis): This section contains five
studies of proportion estimation error. The term "unweighted" is used to
indicate that the aggregation weights are not involved.

1. End-of-season winter wheat proportion estimation error. By October, data
had been obtained for 105 blind sites in the five-state winter wheat region.
An investigation was performed using these data and the CAMS classification
results corresponding to the October LACIE estimates. The results are shown in
figure 3 and tables VI and VII.

Figure 3 shows plots of the proportion error $\hat{X} - X$ as a function of
$X$, where $\hat{X}$ is the CAMS wheat proportion estimate and $X$ is the
ground-truth wheat proportion. These plots are for the five individual

Table V.— Estimates of the Bias and Relative Bias of the LACIE Acreage
Aggregation Estimates Using Blind Sites

Region                        LACIE       Aggregated  Relative   Standard   90-percent
                              acreage     acreage     bias B/A,  deviation  confidence
                              estimate A, bias B,     percent    of B       limits for B
                              thousands   thousands
                              of acres    of acres
Winter wheat
  Colorado                    2 704       -26         -1.0       275.6
  Kansas                      11 125      -988        -8.9       473.2
  Nebraska                    3 399       199         5.9        381.4
  Oklahoma                    4 261       -2 583      -60.6      590.9
  Texas                       4 344       -483        -11.1      953.9
  USSGP                       25 833      -3 881      -15.0      1305.6     (-6 029, -1 733)
  USSGP (excluding Oklahoma)  21 572      -1 298      -6.0       1164.2     (-3 213, 617)
  Montana                     2 079       -913        -43.9      768.9
  South Dakota                1 452       -470        -32.4      255.9
  USGP                        29 364      -5 264      -17.9      1536.6     (-7 792, -2 736)
Spring wheat
  Minnesota                   2 198       -2 275      -103.5     908.2
  Montana                     1 516       -827        -54.6      393.3
  North Dakota                9 856       -2 385      -24.2      801.9
  South Dakota                2 079       -37         -1.8       592.0
  USNGP                       15 649      -5 524      -35.3      1404.6     (-7 835, -3 213)
Total wheat
  USGP                        45 013      -10 788     -24.0      2078.2     (-14 207, -7 369)


states and the total USSGP five-state region. Points lying above the
horizontal line $\hat{X} - X = 0$ correspond to overestimation of wheat
proportions by CAMS, and points lying below the line correspond to
underestimation.

The plots in figure 3 indicate that there is an overall trend toward negative
values of $\hat{X} - X$ as $X$ increases for the five-state region and for
each of the individual states except Colorado. In other words, for these
regions, CAMS tends to underestimate the true wheat proportion when the true
wheat proportion is large. In fact, for $X > 28$ percent, there was only 1
blind site out of 26 in the 5-state region for which the CAMS result was not
an underestimate relative to ground truth. Also, figure 3 indicates that
underestimates occurred in Oklahoma and Texas for all values of $X$. In
Oklahoma, 17 of 20 blind sites were underestimated, as were 15 of 19 in Texas.

A statistical analysis of the data shown in figure 3 was performed using the
technique described in the paper by Houston et al. The results are shown in
table VI. It lists the following factors: (1) the number of blind sites for
which data were available for each state or region; (2) the number of segments
allocated

Table VI.— Winter Wheat Blind Site Results for the USSGP

Region      n^a   N^b   $\bar{X}$   $\bar{\hat{X}}$^c   $\bar{D}$   $S_{\bar{D}}$   90-percent confidence limits for $\mu_D$^d
Colorado     13    32   14.7        14.5                -0.1        1.0             (-1.8, 1.6)
Kansas       34    84   23.9        22.3                -1.6         .9             (-3.1, -0.1)*
Nebraska     18    35   14.1        14.8                  .7        1.1             (-1.2, 2.6)
Oklahoma     20    40   24.2        17.6                -6.6        1.5             (-9.2, -4.0)*
Texas        20    49   13.3        11.9                -1.4        1.4             (-3.6, 0.9)
USSGP       105   240   19.1        17.2                -1.9         .6             (-2.9, -1.0)*

^a Number of blind sites.
^b Number of segments allocated.
^c Winter wheat estimates from the October Crop Assessment Subsystem (CAS) Monthly Report (CMR).
^d Population average difference.
* Significantly different from zero at the 10-percent level of significance.


Table VII.— Comparison of LACIE Estimates and Ground-Observed Proportions
in Winter Wheat Blind Sites in the USGP

Month       Number of   MSE^a    $\bar{D}$^b,   RMD^c,    Percentage
            segments             percent        percent   underestimated^d
February    71          157.5    -6.5           -30.6     83
March       95          112.8    -5.4           -26.2     79
April       95          112.8    -5.4           -26.2     79
May         95          102.5    -4.4           -21.4     75
June        95           89.5    -3.3           -15.7     72
July        95           90.4    -3.4           -16.2     70
August      95           75.0    -3.2           -15.2     71
September   95           65.3    -2.8           -13.3     68
October     95           69.6    -2.8           -13.7     68
Final       95           70.8    -2.7           -13.2     68

^a MSE $= \frac{1}{n}\sum_{i=1}^{n}(\hat{X}_i - X_i)^2$, where $\hat{X}_i$ is the wheat proportion estimate for the $i$th segment, $X_i$ is the ground-observed harvested wheat proportion for the $i$th segment, and $n$ is the number of segments.
^b $\bar{D} = \frac{1}{n}\sum_{i=1}^{n}(\hat{X}_i - X_i) = \bar{\hat{X}} - \bar{X}$.
^c RMD $= \bar{D}/\bar{X}$.
^d This column contains the percentage of blind site segments in which LACIE underestimated the wheat proportions.


to each state or region; (3) the average ground-truth wheat proportion
$\bar{X}$; (4) the average CAMS wheat proportion estimate $\bar{\hat{X}}$;
(5) the average difference $\bar{D} = \bar{\hat{X}} - \bar{X}$; (6) the
standard error $S_{\bar{D}}$ of $\bar{D}$; and (7) 90-percent confidence
limits for the average error $\mu_D$.

In order to determine if the population average difference for a particular
region is significantly different from zero, one needs only to observe whether
the corresponding confidence interval contains zero. If it does, the average
difference is not significantly different from zero; i.e., there is
insufficient evidence to conclude that there is a bias due to classification
error. If it does not contain zero, then the hypothesis of no bias is rejected
at the 10-percent level of significance.
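
The zero-containment rule above can be sketched in a few lines. The example assumes that the limits in table VI are $\bar{D} \pm t \, S_{\bar{D}}$ with Student's t on $n - 1$ degrees of freedom; this form is inferred from the tabulated limits, and the actual technique is the one described by Houston et al. The critical values are the usual upper 5-percent points from a standard t table, and the function names are illustrative.

```python
# Upper 5-percent points of Student's t (two-sided 90-percent interval),
# from a standard t table, keyed by degrees of freedom.
T_95 = {17: 1.740, 19: 1.729}


def confidence_limits(mean_diff, std_error, n_sites):
    """Limits mean_diff +/- t * std_error with n_sites - 1 degrees of freedom."""
    half_width = T_95[n_sites - 1] * std_error
    return (mean_diff - half_width, mean_diff + half_width)


def bias_is_significant(limits):
    """True when the interval excludes zero (no-bias hypothesis rejected)."""
    lower, upper = limits
    return not (lower <= 0.0 <= upper)


# Table VI values: Oklahoma (D = -6.6, S = 1.5, n = 20),
# Nebraska (D = 0.7, S = 1.1, n = 18).
oklahoma = confidence_limits(-6.6, 1.5, 20)
nebraska = confidence_limits(0.7, 1.1, 18)

print(bias_is_significant(oklahoma), bias_is_significant(nebraska))
```

Under these assumptions the Oklahoma interval excludes zero and the Nebraska interval contains it, in line with the asterisks in table VI.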

In the following paragraphs, the results presented in table VI are discussed
separately for each state and for the USSGP region. The discussion also
includes preliminary results from an investigation by CAMS to determine the
causes of classification error. At the end of the 1976 crop year, the data for
one-half of the blind sites in the USGP were released to CAMS for evaluation
of the accuracy and sources of error in the operational analysis during
Phase II. These evaluations were carried out in most cases by the analyst who
conducted the original interpretation and classification. In the following
paragraphs these studies will be referred to as the "CAMS investigation." The
biggest overall problem for both spring and winter wheat was the occurrence of
unusual wheat signatures, which were taken to be nonwheat.

The results for Oklahoma (table VI) indicate that there is a large negative
bias in the CAMS estimates for the segments allocated to Oklahoma. The CAMS
investigation showed that underestimates were due to atypical, weak, and
missing signatures, small fields, and spotty stands. Some of these effects
were attributed to drought conditions. Only one of the segments checked in the
CAMS investigation was overestimated; hail damage of wheat at harvest was the
cause of the overestimate. Figure 4 shows an example of wheat signature
variability due to drought.

In table VI, it appears that a "significant" bias occurs for the state of
Kansas. However, inspection of the data plotted in figure 3 reveals one
outlier, a difference of -25.56 percent, corresponding to a ground truth of
61.56-percent wheat. Omitting this one outlier yields an estimate of the bias
that is not significantly different from zero. From the CAMS investigation, it
was concluded that, in Kansas, overestimates were due to pasture, fallow, and
sorghum being included as wheat. Underestimates were usually caused by missed
wheat signatures; i.e., wheat signatures that were not included in the
training data.

For Texas, 74 percent of the blind sites were underestimated. However, the
$S_{\bar{D}}$ was so large that there was insufficient evidence to conclude
that a bias existed. Inspection of the data plotted in figure 3 for Texas
reveals an outlier, a difference of +25.31 percent, corresponding to a ground
truth of 0.69 percent. This extreme overestimate was due to fallow fields and
pasture fields which appeared red and tan, respectively, on the imagery and
were classified as wheat. The underestimates that occurred for most of the
segments were generally due to atypical signatures. Some stands of wheat were
spotty.

Neither of the average differences for the other two states, Colorado and
Nebraska, was significantly different from zero, nor were any apparent
outliers observed. The analysts in CAMS were apparently having some success in
identifying wheat for these two states. The CAMS investigation showed that, in
Colorado, overestimates were caused by confusion crops such as spring wheat
and winter rye being classified as winter wheat; underestimates were caused by
missed signatures in drought areas and by strip-cropping areas not resolvable
by the Landsat system. In the latter case, the wheat pixels were all
essentially border pixels and therefore many were misclassified as nonwheat.

In Nebraska, overestimates were caused by atypical wheat signatures and small
fields. Underestimates in Nebraska were due to missed signatures, the absence
of key acquisitions such as biowindow 2, some narrow fields that were missed,
and some wheat fields that were not identified on the imagery.

At the USSGP five-state level, there was sufficient evidence to conclude that
the CAMS wheat proportion estimates were significantly different from the
ground-observed wheat proportions at the 10-percent level, mainly as a result
of problems in Oklahoma. These results agree with those obtained in the
weighted analysis.

FIGURE 4.— Wheat signature variability due to drought: segment 1212, Oklahoma, 1976 crop year.

2. Variation of winter wheat proportion error throughout the season. Table VII
presents the results of a blind site investigation to study the variation of
classification error throughout the season.

At the time this investigation was performed (December 1976), all blind site
data were available, but not all of the segments could be used since CAMS
estimates for the whole season were not available for all of them. It is, of
course, desirable that the same number of segments be used for each month. It
was found that 95 segments had data for March through the end of the season
but only 71 segments had data for February.

In table VII, four quantities relating to the classification error are given:
(1) the mean-squared error MSE, (2) the mean difference $\bar{D}$, (3) the
relative mean difference RMD, and (4) the percentage of the segments in which
LACIE underestimated the at-harvest wheat proportions. There was a declining
trend in the MSE throughout the season. The final figure represents a
55-percent reduction in error from that of the February estimate.

The $\bar{D}$ and the RMD showed the same behavior; i.e., a general reduction
in the size of the error as the season progressed. These errors were all
negative, indicating underestimates by LACIE. From February through the final
estimate, there was a 58-percent reduction in the magnitude of the $\bar{D}$
and a 57-percent reduction in the magnitude of the RMD.

The  percentage  of  segments  underestimated  by 
LACIE  also  decreased  throughout  the  season,  from 
83  percent  in  February  to  68  percent  for  the  final  esti- 
mate. All  these  estimates  thus  indicate  a general  im- 
provement in  the  CAMS  estimates  as  the  season 
progressed. 
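
The four quantities in table VII, and the season-long reductions quoted above, can be illustrated as follows. This is a minimal sketch: the four-segment data set is hypothetical, the RMD is taken as $100\,\bar{D}/\bar{X}$ (an assumption consistent with the "percent" column heading), and the function names are not from the source.

```python
def blind_site_stats(estimates, ground_truth):
    """Return (MSE, mean difference, RMD in percent, percent underestimated)."""
    n = len(estimates)
    diffs = [est - obs for est, obs in zip(estimates, ground_truth)]
    mse = sum(d * d for d in diffs) / n
    mean_diff = sum(diffs) / n
    mean_truth = sum(ground_truth) / n
    rmd = 100.0 * mean_diff / mean_truth
    pct_under = 100.0 * sum(1 for d in diffs if d < 0) / n
    return mse, mean_diff, rmd, pct_under


def percent_reduction(start, end):
    """Reduction in magnitude from start to end, in percent of start."""
    return 100.0 * (abs(start) - abs(end)) / abs(start)


# Hypothetical wheat proportions (percent) for four segments.
cams = [10.0, 18.0, 30.0, 5.0]
truth = [12.0, 20.0, 36.0, 4.0]
mse, mean_diff, rmd, pct_under = blind_site_stats(cams, truth)

# February-to-final reductions using the values in table VII.
mse_drop = percent_reduction(157.5, 70.8)
d_drop = percent_reduction(-6.5, -2.7)
rmd_drop = percent_reduction(-30.6, -13.2)

print(mse, mean_diff, rmd, pct_under)
print(round(mse_drop), round(d_drop), round(rmd_drop))
```

The last line reproduces the 55-, 58-, and 57-percent reductions cited in the text from the tabulated February and final values.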

3. End-of-season spring wheat proportion estimation error. The spring wheat
blind site investigation was conducted for 38 segments in the four USNGP
states of Minnesota, Montana, North Dakota, and South Dakota. Figure 5 shows
plots of the proportion error $\hat{X} - X$ as a function of $X$, where
$\hat{X}$ is the CAMS wheat proportion estimate and $X$ is the ground-truth
wheat proportion.


The plots in figure 5 show a tendency toward underestimation in every state
except South Dakota. Twenty-nine of the thirty-eight sites in the USNGP were
underestimated by CAMS. In the plot for the USNGP, there appears to be a
slight dependence on the value of $X$ (i.e., the underestimates seem to be
greater for larger values of $X$), but this trend is less pronounced than that
shown in figure 3 for the USSGP.

The  statistical  analysis  of  these  data  is  presented 
in  table  VIII.  The  quantities  listed  are  the  same  as 
those  in  table  VI. 

For  the  blind  sites  in  the  USNGP,  the  analysis  in- 
dicated a significant  bias  in  the  CAMS  wheat  propor- 
tion estimates.  These  results  agree  with  those  of  the 
weighted  analysis.  Table  VIII  shows  that  the  LACIE 
acreage  estimates  were  low  for  all  of  the  states  except 
South  Dakota,  the  only  state  for  which  the  average 
difference  is  not  significantly  different  from  zero  at 
the  10-percent  level  of  significance.  For  Minnesota, 
Montana,  and  South  Dakota,  the  number  of  data 
points  was  small.  Therefore,  no  inference  about  the 
population  average  difference  between  CAMS  esti- 
mates and  ground-truth  proportions  should  be  made. 

In Minnesota, underestimation generally occurred in segments with very high
wheat density and was caused by unusual wheat signatures on the imagery. There
is some evidence that these unusual signatures were the result of color
distortions in the Landsat image processing.

In  Montana,  underestimation  was  usually  due  to 
strip-fallow  areas  that  were  not  appropriately 
classified.  Some  overestimates  were  due  to  hay  being 
classified  as  wheat,  even  though  the  two  were  not 
confused  in  the  training  fields. 

In South Dakota, both overestimates and underestimates were caused by drought
conditions. There was a noticeable difference between the Landsat data for
this area and for the USSGP. In the spring, wheat and small grains appeared
very similar to pasture, alfalfa, and corn on the imagery because of the
stress caused by drought. At harvest time, some corn was grazed or cut for
silage and some alfalfa was cut and, because of the drought, never reappeared.
In both cases, it was difficult to distinguish these crops from harvested
small grains. Many small grains were not harvested but were fall-plowed and
could not be distinguished from harvested small grains by CAMS; therefore,
wheat was overestimated. Underestimates were due to missing signatures from
poor stands of small grains and poor acquisition histories.

FIGURE 5.— Plots of spring wheat proportion estimation errors versus ground-truth spring wheat proportions for blind sites in the USNGP. (a) Minnesota. (b) North Dakota. (c) Montana. (d) South Dakota. (e) USNGP.

Table VIII.— Spring Wheat Blind Site Results for the USNGP

Region          n     M    $\bar{X}$   $\bar{\hat{X}}$^a   $\bar{D}$   $S_{\bar{D}}$   90-percent confidence limits for $\mu_D$
Minnesota       5    13    35.4        22.8                -12.6       5.0             (-20.8, -4.4)^b
North Dakota   20    85    27.1        23.0                 -4.1       1.95            (-7.6, -0.7)^b
Montana         7    22    12.7         9.1                 -3.6       2.0             (-6.9, -0.3)^b
South Dakota    6    23    11.3        11.5                   .1       3.0             (-4.9, 5.1)
USNGP          38   143    23.1        18.6                 -4.5       1.4             (-6.7, -2.2)^b

^a Final estimates from the CAS annual report for the 1976 crop year.
^b Significantly different from zero at the 10-percent level of significance.

The CAMS investigation found many factors that contributed to the
underestimate in North Dakota. Among these were (1) strip-fallow areas
unresolvable by the Landsat system, (2) weak or missing signatures, (3) poor
color balance in Landsat images due to the transformation that is applied to
the Landsat data before the images are made, (4) the absence of early
biowindow acquisitions, (5) the omission of some late-planted spring wheat
because its signature was behind the jointing signature being indicated by the
adjustable crop calendar, and (6) problems in choosing training fields caused
by small fields or the absence of identifiable field patterns.

4. Contribution of the classification and ratio errors to the ratioed wheat
proportion estimation error at the segment level. The CAMS makes estimates of
the small-grains proportion $x_i$ for each segment $i$ and, subsequently, the
Crop Assessment Subsystem (CAS) obtains wheat proportion estimates by
multiplying the $x_i$ by ratios $P_i$ of the wheat-to-small-grains proportions
for the counties in which the segments are located. These county-level ratios
were determined from the 1975 SRS estimates. In this section, the blind site
data are used to compare the error incurred by using these ratios to the error
incurred by misclassification of small grains. The method used is described in
the paper by Houston et al.

Table IX presents the numerical results obtained for 37 spring wheat blind
sites for Phase II in Minnesota, Montana, North Dakota, and South Dakota. It
can be seen that the reduction in bias is slightly larger when there is no
ratioing error than when there is no small-grains classification error. On the
other hand, a much larger reduction in mean-squared error is obtained when
there is no small-grains classification error than when there is no ratioing
error. This indicates that the variability in spring wheat proportion
estimation errors is primarily due to classification of small grains. The
historical wheat-to-small-grains ratios, however, introduced more bias than
did small-grains classification errors.
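
The comparison in table IX can be pictured with a one-segment example. The sketch below is an interpretation, not the procedure of Houston et al.: it assumes that "no ratioing error" means applying the true wheat-to-small-grains ratio to the CAMS small-grains estimate, and that "no classification error" means applying the historical county ratio to the true small-grains proportion. The segment values are hypothetical; the percent reductions use the figures in table IX.

```python
def wheat_from_small_grains(small_grains_pct, wheat_ratio):
    """Wheat proportion obtained by ratioing a small-grains proportion."""
    return small_grains_pct * wheat_ratio


def percent_reduction(before, after):
    """Reduction in magnitude from before to after, in percent of before."""
    return 100.0 * (abs(before) - abs(after)) / abs(before)


# Hypothetical segment: proportions in percent, ratios as fractions.
cams_small_grains, true_small_grains = 30.0, 36.0
historical_ratio, true_ratio = 0.70, 0.75

true_wheat = wheat_from_small_grains(true_small_grains, true_ratio)
operational_error = wheat_from_small_grains(cams_small_grains, historical_ratio) - true_wheat
no_ratio_error = wheat_from_small_grains(cams_small_grains, true_ratio) - true_wheat
no_class_error = wheat_from_small_grains(true_small_grains, historical_ratio) - true_wheat

# Reductions reported in table IX, recomputed from its bias and MSE values.
bias_reduction_no_ratio = percent_reduction(-5.2, -2.2)
mse_reduction_no_class = percent_reduction(104.5, 25.7)

print(operational_error, no_ratio_error, no_class_error)
print(round(bias_reduction_no_ratio, 1), round(mse_reduction_no_class, 1))
```

The recomputed reductions round to the 57.7 and 75.4 percent shown in table IX.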

5.  Variation  of  spring  wheat  proportion  error 
throughout  the  season.  Table  X shows  the  results  of 
a blind  site  investigation  of  the  variation  of  classifica- 
tion error  throughout  the  season.  Only  33  of  the  38 
segments  were  used.  The  definitions  of  the  quantities 
listed  are  the  same  as  those  given  in  table  VII. 

The  mean-squared  classification  error  dropped 
from  158.5  in  August  to  110.1  at  the  end  of  the 
season— a decrease  of  30  percent.  The  average 
difference  was  negative  for  all  months,  indicating 
that  the  wheat  proportions  were  consistently  under- 
estimated throughout  the  year.  The  magnitude  of  the 
errors  declined  45  percent  in  the  period  from  August 
to  the  final  estimate.  In  spite  of  these  reductions, 
there was still substantial underestimation at the end
of  the  season.  At  that  time,  the  wheat  proportion  in 
79  percent  of  the  sites  was  still  being  underestimated 
by  LACIE. 

Table IX.—Phase II Final Results for Spring Wheat Blind Sites in the USNGP

Category                  B,        Standard     Reduction   90-percent         MSE     Reduction
                          percent   error of B,  in bias,    confidence limits          in MSE,
                                    percent      percent     for bias                   percent

Phase II final result     -5.2      1.3                      (-7.4, -3.1)       104.5
No ratioing error         -2.2      1.2          57.7        (-4.3, -0.2)        78.6   24.8
No classification error   -3.1       .6          40.4        (-4.0, -2.1)        25.7   75.4


Table X.—Measurements of Classification Error for Spring Wheat (LACIE Estimates Versus
Ground-Observed Proportions) Over All Available Blind Sites in the USGP

Month        Number of   MSE      B,        RMD,      Percentage
             segments             percent   percent   underestimated [a]

August       33          158.5    -9.29     -41.6     88
September    33          120.1    -5.72     -25.6     82
October      33          115.3    -5.38     -24.1     79
Final        33          110.1    -5.05     -22.6     79

[a] This column contains the percentage of blind site segments in which LACIE underestimated the wheat proportion.


Acreage estimation bias due to nonsampled and nonresponsive
areas.—In order to investigate bias due to
nonsampled and nonresponsive areas, an aggregation
was performed in which the CAMS proportion for
each allocated segment was replaced by the 1975 SRS
county wheat proportion for the county containing
that segment. In table XI, the results of this "mock
aggregation" are compared with the SRS estimates
for winter wheat in the USSGP and total wheat in the
USNGP and the USGP. The RD at the USGP level is
0.8 percent. This is an estimate of the relative bias
due to Group II estimation and Group III ratioing of
those counties not allocated segments (see the paper
by Hallum et al. entitled "Sampling, Aggregation,
and Variance Estimation for Area, Yield, and
Production in LACIE" for definitions of Group II and
Group III). For practical purposes, this RD is
negligible and shows that the underestimates in the LACIE
estimates were not caused by the methods used to
estimate nonsampled areas. In fact, not all the
allocated segments were processed during Phase II for
various reasons. Those areas which had an allocated
segment that was not processed are called
nonresponsive areas.


Table XI.—Acreage Estimation Bias Due to
Nonsampled Areas

Region    M      1975 SRS,     Mock aggregation,   RD,
                 thousands     thousands           percent
                 of acres      of acres

USSGP     240    29 748        30 422               2.2
USNGP     191    21 035        20 768              -1.3
USGP      431    50 783        51 190                .8

Table XII shows the aggregation of county SRS
estimates for crop year 1974-75 for all segments
processed (394) during LACIE Phase II (1975-76). Since
91.4 percent of the allocated segments were
processed in Phase II, table XII differs only slightly
from table XI. The RD at the USGP level is 0.1
percent. Therefore, practically speaking, the relative bias
due to Group II estimation and Group III ratioing (of
both those counties whose allocated segments were
lost to nonresponse and those counties not allocated
segments) is negligible.

When  the  results  of  table  XI  are  combined  with 
those  of  table  XII,  an  estimate  of  the  relative  bias  due 
to  the  Group  III  ratioing  of  the  37  counties  whose 
allocated  segments  were  lost  to  nonresponse  is  —0.7 
percent,  which  is  also  negligible. 
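
The RD values in tables XI and XII can be reproduced with a short calculation. This sketch assumes that RD is the difference between the mock aggregation and the SRS estimate, expressed as a percentage of the mock aggregation; that convention is inferred from the tabulated numbers and is not stated in this section. It also assumes the nonresponse-only figure is the difference of the two rounded RD values.

```python
def relative_difference(estimate, reference):
    """RD in percent, taken relative to the estimate (assumed convention)."""
    return 100.0 * (estimate - reference) / estimate


# USGP totals in thousands of acres (tables XI and XII); SRS total is 50 783.
rd_nonsampled = round(relative_difference(51190, 50783), 1)          # table XI
rd_with_nonresponse = round(relative_difference(50845, 50783), 1)    # table XII

# Bias attributed only to segments lost to nonresponse (assumed to be
# the difference between the two rounded RD values).
rd_nonresponse_only = rd_with_nonresponse - rd_nonsampled
```

With these inputs the sketch gives 0.8 percent, 0.1 percent, and a difference of about -0.7 percent, in agreement with the values quoted in the text.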

When the results of table XII are compared with
the results of table XI, the RD between the mock
aggregation estimate of wheat acreage at the USGP
level and the SRS estimate is negligible whether or
not segments are lost to nonresponse. Also, the
estimate of bias due solely to segments lost to
nonresponse is negligible, indicating that nonresponse is
probably not introducing a bias.


Table XII.—Acreage Estimation Bias Due to
Nonsampled and Nonresponsive Areas

Region    N      1975 SRS,     Mock aggregation,   RD,
                 thousands     thousands           percent
                 of acres      of acres

USSGP     233    29 748        30 208               1.5
USNGP     161    21 035        20 637              -1.9
USGP      394    50 783        50 845                .1


Special  Studies 

Several special studies were performed in Phase II
to investigate various aspects of LACIE proportion
estimation procedures. These are described in detail
in reference 1 and are summarized below.

Dependence  of  CAMS  error  on  acquisition  date. — 
Two  investigations  were  carried  out  to  determine  the 
relationship  between  the  latest  available  acquisition 
and  proportion  estimation  error.  In  the  first,  the  er- 
rors for  blind  site  wheat  proportions  in  the  USGP 
were  studied  as  a function  of  the  month  of  the  latest 
acquisition  used  by  CAMS  to  obtain  their  estimate  of 
wheat  proportions.  All  of  the  winter  wheat  blind 
sites  in  the  USGP  for  which  data  were  available  were 
used.  Spring  wheat  was  not  studied  because  sufficient 
ground-truth data were not available. Table XIII
gives the mean-squared error, the bias, and the
standard deviation of the errors for each month from
November 1975 to July 1976. The errors were
relatively  large  for  estimates  made  with  the  latest  ac- 
quisitions being  from  November  through  February. 
They  decreased  sharply  with  March  acquisitions  and 
remained  relatively  small  through  the  end  of  the 
season. 
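
The three quantities in table XIII are related by the usual decomposition of mean-squared error into squared bias plus variance. The sketch below checks that relation for three rows of the table; it assumes the standard deviation was computed with an n - 1 divisor, which is not stated in this section, so small gaps are expected.

```python
def mse_from_bias_and_sd(bias, sd, n):
    """Mean-squared error implied by a bias and a sample standard deviation.

    Assumes `sd` uses an n - 1 divisor, so the variance term is rescaled
    by (n - 1) / n before the squared bias is added.
    """
    return bias ** 2 + (n - 1) / n * sd ** 2


# (MSE, bias, standard deviation, number of sites) as listed in table XIII.
rows = {
    "November": (120.1, -4.5, 10.1, 36),
    "April": (45.2, -3.3, 5.9, 63),
    "July": (48.3, -0.6, 7.0, 56),
}

gaps = {
    month: mse_from_bias_and_sd(bias, sd, n) - mse
    for month, (mse, bias, sd, n) in rows.items()
}
```

For these rows the implied and tabulated MSE values differ by less than one unit, which is consistent with rounding in the published figures.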

In  the  second  study,  the  CAMS  proportion  errors 
for  intensive  test  sites  were  plotted  as  a function  of 
the  date  of  the  last  acquisition  used  to  classify  the 
data.  This  was  done  separately  for  spring  and  winter 
wheat. The plots are displayed in figure 6. For winter
wheat,  the  estimates  based  on  very  early  acquisitions 
(before  December)  had  very  large  errors  (mostly  un- 
derestimates). For  later  acquisitions,  the  errors  were 
smaller.  However,  there  was  no  well-defined  depen- 


dence on  acquisition  date.  For  spring  wheat,  there 
was  a tendency  toward  underestimation  for  early  ac- 
quisitions and  overestimation  for  late  acquisitions. 

Effect of biophase on proportion estimation.—A
study  was  conducted  in  Phase  II  to  investigate  the 
effect  of  various  biophase  combinations  on  propor- 
tion estimation.  Table  XIV  gives  estimates  of  the 
bias  and  standard  deviation  of  the  proportion  errors 
that  were  obtained  from  blind  sites  analyzed  using 
the  various  biophase  combinations.  Only  winter 
wheat  blind  sites  in  the  USGP  were  used.  Spring 
wheat  blind  sites  were  not  studied  because  sufficient 
data  were  not  available.  The  best  results  were  ob- 
tained using  data  from  the  biophase  combinations 
1-2  and  1-2-3.  In  every  case  studied,  the  magnitude  of 
the  bias  and  the  standard  deviation  were  increased  by 
adding  biophase  4 data,  except  for  the  combination 
1-3  where  the  magnitude  of  the  bias  increased  but  the 
standard  deviation  remained  the  same.  These  results 
indicated  that  better  estimates  might  be  obtained  if 
data  from  biophase  4 were  not  used. 

Study  of  labeling  and  classification  errors. — An  in- 
vestigation of  labeling  and  classification  accuracy 
was  conducted  for  14  winter  wheat  intensive  test 
sites  and  10  spring  wheat  intensive  test  sites.  The 
data consisted of 15 wheat fields and 15 nonwheat
fields  in  the  ground-observed  area  of  each  ITS.  These 
fields  were  used  to  determine  the  probability  of  cor- 
rect classification  (PCC)  by  comparing  the  classifica- 
tion results  for  these  fields  with  ground  truth  on  a 
pixel-by-pixel  basis.  Labeling  error  was  studied  by 
determining  the  percentage  of  training  fields  in  the 
ground-observed  area  that  was  labeled  correctly. 

Table XIII.—Full-Month Classification Error
for Winter Wheat

Acquisition   MSE      Bias,     Standard     No. of
period                 percent   deviation,   sites
                                 percent

November      120.1    -4.5      10.1         36
December      161.8    -5.0      11.8         47
January       114.9    -5.5       9.3         61
February      123.5    -5.7       9.6         60
March          80.5    -1.3       8.9         64
April          45.2    -3.3       5.9         63
May            70.2     -.9       8.4         82
June           84.3    -2.9       8.8         88
July           48.3     -.6       7.0         56


FIGURE 6.—Plots of CAMS error as a function of acquisition date. (a) Winter wheat. (b) Spring wheat.


For winter wheat, it was found that both labeling
accuracy and PCC were considerably higher for
non-small-grains than for small grains. For non-small-
grains, the PCC and labeling accuracy were 87
percent and 95 percent, respectively; for small grains,
the PCC was 63 percent and the labeling accuracy
was 86 percent. The results suggested that the lower
value for the PCC for small grains arose because the
analyst missed some small-grains signatures. This
problem was probably the major cause of the
underestimation in Phase II.

For  spring  wheat,  only  PCC  was  studied.  The 
average value for small grains (81.9 percent) was
smaller  than  the  average  for  non-small-grains  (93.4 
percent),  but  the  difference  was  less  than  that  ob- 
served for  winter  wheat. 

Table XIV.—Classification Error by Biowindow
Combination (Winter Wheat)

Combination   Bias,     Standard     No. of
              percent   deviation,   sites
                        percent

1             -2.5       9.2         117
1-2            -.8       6.8          72
1-3           -5.1       6.6          19
1-2-3           .8       4.9          32
1-4           -6.1      14.1          19
1-2-4         -2.0       7.9          33
1-3-4         -5.5       6.6          17
1-2-3-4        1.1       5.1          31


Adjustable crop calendar error.—The ACC is
designed to indicate to the CAMS analyst the growth
stage of wheat and other crops in the segments being
analyzed. It can therefore be expected to have a
considerable impact on the accuracy of the CAMS
estimates. A study was performed to determine the
accuracy of the ACC by comparing it with
ground-observed growth-stage data over eight ITS's in Texas
and Kansas. In most cases, the LACIE growth stage
estimate was behind the ground-observed growth
stage and the difference increased as the season
progressed. By June, all ACC predictions were
behind the ground-observed stages.

Subsequently,  an  investigation  was  performed  to 
determine  whether  crop  calendar  error  had  an  in- 
fluence on  the  accuracy  of  CAMS  estimates.  The 
classification errors were correlated with crop
calendar errors for 9 winter wheat sites and 12 spring
wheat  sites.  Significance  tests  applied  to  the  correla- 
tion coefficients  indicated  that  no  significant  correla- 
tion existed  between  crop  calendar  error  and 
classification  error. 


PHASE III (CROP YEAR 1976-77)

The  Phase  II  blind  site  results  indicated  a tenden- 
cy to  underestimate  winter  wheat  proportions  and  a 
significant  underestimation  of  spring  wheat  propor- 
tions. These,  of  course,  led  to  a negative  bias  in  the 
area and production estimates for total wheat. Thus,
at  the  beginning  of  Phase  III,  the  sample  strategy  for 
the  U.S.  Great  Plains  was  revised  to  achieve  a pro- 
duction estimator  CV  of  5 percent  in  order  to  allow 
for  some  bias  and  still  meet  the  90/90  accuracy  goal. 

The spring wheat blind site analyses indicated that
a major portion of the negative bias in the spring
wheat proportion estimates was due to the historical
ratios of spring wheat to small grains used in
reducing small-grains proportion estimates to spring wheat
proportion estimates. Therefore, a task was initiated
early in Phase III to develop econometric models for
forecasting these ratios with the intent of eliminating
or reducing this bias (see the paper by Umberger et
al. entitled "Econometric Models for Predicting
Confusion Crop Ratios" for a detailed description of
these models).

During Phase III, a new classification procedure,
Procedure 1, was introduced to address other
problems identified as a result of LACIE experience
through Phase II. Procedure 1 provided the first
capability  to  process  multidate  Landsat  acquisitions 
in  a high-throughput  mode.  The  difficulty  in  obtain- 
ing accurate  area  estimates  in  regions  with  small 
fields  had  been  identified  as  a major  problem.  With 
the  single-pixel  training  approach  used  in  Procedure 
1,  it  was  believed  that  more  accurate  wheat  propor- 
tion estimates  could  be  made. 

With  the  advent  of  the  new  approach,  the  blind 
site  program  was  expanded  in  Phase  III  for  more 
detailed  classification  error  analyses.  Correct  labeling 
at  the  pixel  level  is  the  key  to  the  success  of  Pro- 
cedure 1.  Therefore,  the  blind  site  program  was 
modified  to  allow  a comparison  of  analyst  pixel 
labels  with  ground-observed  crop  types. 

In  Phase  III,  LACIE  estimates  of  wheat  area, 
yield,  and  production  continued  to  be  made  in  the 
U.S.  Great  Plains  region.  This  section  is  devoted  to 
the  area  error  source  analyses  conducted  in  the  U.S. 
Great  Plains  region  during  Phase  III.  However,  a 
brief review of the 90/90 evaluation and relative
contributions of area and yield errors to the production
error  is  presented  first.  The  descriptions  of  the  yield 
and  production  results  are  detailed  in  the  papers  by 
Phinney  and  Marquis,  respectively. 


The 90/90 Evaluation and Production
Sensitivity Analysis

As in Phases I and II, the LACIE Phase III winter
wheat  production  estimate  for  the  USGP  supported 
the  90/90  accuracy  goal.  On  the  other  hand,  the 
USGP  spring  wheat  production  estimate  did  not  sup- 
port the  90/90  accuracy  goal  because  of  a large  nega- 
tive bias  in  the  estimate.  As  a result  of  the  under- 
estimation for  spring  wheat,  the  final  LACIE  total 
wheat  production  estimate  only  marginally  sup- 
ported the  90/90  accuracy  goal. 

The  results  of  a sensitivity  analysis  to  determine 
the  relative  contributions  of  area  and  yield  errors  to 
the  production  error  are  given  in  table  XV.  Unlike 


the  Phase  II  results,  these  results  indicate  that  the 
total wheat production underestimation was
primarily the result of yield underestimation. The large
negative  bias  indicated  for  the  spring  wheat  produc- 
tion estimate  is  primarily  attributed  to  yield  under- 
estimation, although  area  was  also  significantly 
underestimated. 


Area  Error  Source  Analyses 

The  purpose  of  these  analyses  was  to  quantify  the 
error  components  and  then  to  determine  the  causa- 
tive factors  in  the  LACIE  estimation  process.  The 
general  approach  in  the  U.S.  Great  Plains  in  LACIE 
Phase  III  was  to  compare  the  LACIE  area  estimates 
to various reference standards. The reference
standards included the ground-observed data for a random
sample of the LACIE operational segments, the
historical  SRS  county-level  area  estimates,  and  the 
current  SRS  state-level  area  estimates. 

TABLE XV.—Relative Contributions of Area and Yield
Errors to Bias in the Production Estimate

Region          Total RD,   RD, percent
                percent     LACIE acreage,   LACIE yield,
                            SRS yield        SRS acreage

Winter wheat     -3.4        +4.9             -8.9
Spring wheat    -2S.7       -12.3            -1S.S
Total wheat     -10.0         -.2            -10.9


Comparison of LACIE and SRS acreage
estimates.—For the USGP-7 region (composed of the
seven major winter wheat producing states in the
USGP), the first LACIE winter wheat area estimate
was significantly lower than the corresponding SRS
area estimate (see fig. 7 and table XVI). The next
LACIE estimate (presented in the May report) was
not significantly different from the SRS estimate, as
the LACIE estimate increased by more than 9
million acres and the SRS estimate decreased by 5.6
million acres. The increase in the LACIE estimate
was a result of increased emergence and ground
cover of the wheat, and the SRS decrease is
attributed to the difference between planted area and
area for harvest. During the remainder of Phase III,
the SRS winter wheat area estimate for the USGP-7
region remained essentially unchanged. The LACIE
estimate was significantly higher than the SRS figure
in July and August (RD's of 8.7 and 7.2 percent,
respectively). The September, October, and
December LACIE USGP-7 area estimates tended to be high
but were not significantly different from the
corresponding SRS estimates.
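
The test statistics listed in table XVI appear to equal the RD divided by the LACIE CV. This relation is inferred from the tabulated values and is not a formula given in this section. The sketch below applies it to the USGP-7 winter wheat entries discussed above; the two-sided normal critical value for the 10-percent level is also an assumption.

```python
from statistics import NormalDist


def rd_test_statistic(rd_percent, cv_percent):
    """Ratio of relative difference to coefficient of variation (inferred form)."""
    return rd_percent / cv_percent


def is_significant(statistic, level=0.10):
    """Two-sided test against a standard normal critical value (assumed)."""
    critical = NormalDist().inv_cdf(1.0 - level / 2.0)
    return abs(statistic) > critical


# USGP-7 winter wheat (RD, CV) pairs from table XVI, 1977 columns.
july = rd_test_statistic(8.7, 3.9)
august = rd_test_statistic(7.2, 3.6)
september = rd_test_statistic(4.6, 3.2)
```

Under these assumptions the July and August statistics exceed the critical value and the September statistic does not, which matches the pattern of significance described in the text.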

At the state level, the primary winter wheat area
estimation problem occurred in Colorado (final RD
of 26.3 percent); Colorado was the only state in
which the RD was consistently larger than in Phase
II. Blind site investigations indicated sampling may
be a problem in Colorado. Investigations were
continuing at the time of this writing. Initial large
underestimates in Oklahoma similar to those observed in
Phase II lessened as the season progressed.

The CV's for all states in the USGP-7 except
South Dakota were smaller than those of Phase II,
indicating an overall higher degree of reliability for the
LACIE Phase III area estimates.

The LACIE Phase III spring wheat area estimates
were first available in July. As shown in figure 7, the
LACIE spring wheat area estimate for the USNGP
was lower than the corresponding SRS estimate in
every month. However, the RD's were generally
much improved over those of Phase II. The final RD
was -8.5 percent for Phase III as compared to -26.3
percent in Phase II.

FIGURE 7.—LACIE and SRS acreage estimates (SRS estimates through April 22 are for seeded acres, released on December 22, 1976). (a) USGP-7, winter wheat. (b) USNGP, spring wheat. (c) USGP, total wheat.

An underestimation problem in Minnesota,
although much less severe than in Phase II, was the
major problem for spring wheat area estimation in
the USNGP during Phase III. Blind site
investigations indicated that the major labeling error source in
Minnesota was boundary pixels. Boundary pixels
cause spectral and spatial confusion between
small-grains fields and non-small-grains fields.

The CV's of the LACIE spring wheat area
estimates for Phase III were generally smaller than those
of Phase II. The CV's of the Minnesota estimates
showed the greatest reduction from Phase II levels,
although they were among the largest for the
USNGP states in all Phase III reports. For the


Table XVI.—Comparison of LACIE and SRS Area Estimates

(a) February [a]

Region          n/M        SRS,        LACIE,      LACIE CV, percent   RD, percent        Test
                           thousands   thousands   1977     1976       1977     1976      statistic
                           of acres    of acres

Winter wheat
Colorado         25/31      2 740       1 997      21.0     26         -37.2     30.0
Kansas           82/121    13 200       6 888      13.9     12         -91.6    -63.5
Nebraska         41/56      3 300       3 067      M.9      IS          -7.6     24.4
Oklahoma         35/46      7 800       3 206       9.6     24        -143.3    -90.0
Texas            25/35      6 150       3 365      16.7     25         -82.8    -98.’
USSGP           208/289    33 190      18 523       7.1      9         -79.2    -46.0     -11.15 [c]
Montana          30/58      3 050       2 127      21.1     NA [b]     -43.4     NA
South Dakota      6/21      1 160         800      60.6     NA         -45.0     NA
MW states        36/79      4 210       2 927      22.4     NA         -43.8     NA
USGP-7          244/368    37 400      21 450       6.8     NA         -74.4     NA       -10.94 [c]

(b) May

Region          n/M        SRS,        LACIE,      LACIE CV, percent   RD, percent        Test
                           thousands   thousands   1977     1976       1977     1976      statistic
                           of acres    of acres

Winter wheat
Colorado         22/31      2 290       3 600      14.2     24          36.4     37.3
Kansas           98/121    12 000      10 439       6.2      6         -15.0    -15.0
Nebraska         38/56      3 050       3 278      11.4     13           7.0     19.2
Oklahoma         39/46      6 500       4 832      10.0     16         -34.5    -48.8
Texas            30/35      4 400       4 196      14.2     14          -4.9     18.9
USSGP           227/289    28 240      26 345       4.5      6          -7.2     -3.2      -1.06
Montana          28/58      2 800       3 369      18.8     NA          16.9     NA
South Dakota      3/21        750       1 107      43.1     NA          32.2     NA
MW states        31/79      3 550       4 476      17.7     NA          20.7     NA
USGP-7          258/368    31 790      30 821       4.6     NA          -3.1     NA         -.68

[a] SRS estimates through April are those released on December 22, 1976.
[b] Data not available.
[c] The LACIE estimate is significantly different from the SRS estimate at the 10-percent level.


Table XVI.—Continued

(c) June

Region          n/M        SRS,        LACIE,      LACIE CV, percent   RD, percent        Test
                           thousands   thousands   1977     1976       1977     1976      statistic
                           of acres    of acres

Winter wheat
Colorado         22/31      2 360       3 608      13.6     23          34.6     36.6
Kansas          104/121    12 000      11 055       5.8      6          -8.5     -2.0
Nebraska         40/56      3 050       3 839       9.5     12          20.6     28.1
Oklahoma         40/46      6 500       5 228       9.0     14         -24.3    -39.8
Texas            30/35      4 400       4 462      12.5     15           1.4     14.4
USSGP           236/289    28 310      28 192       4.1      5           -.4      3.9      -0.10
Montana          29/58      2 800       3 704      17.8    193          24.4   -518.9
South Dakota      7/21        680       1 401      25.0     43          51.5     10.3
MW states        36/79      3 480       5 105      14.6     6$          31.8   -146.5
USGP-7          272/368    31 790      33 297       4.1      6           4.5     -4.9       1.10

(d) July

Region          n/M        SRS,        LACIE,      LACIE CV, percent   RD, percent        Test
                           thousands   thousands   1977     1976       1977     1976      statistic
                           of acres    of acres

Winter wheat
Colorado         21/31      2 360       3 268      13.4     25          27.8     23.3
Kansas           96/121    12 300      12 919       4.5      6           4.8     -2.8
Nebraska         29/56      3 050       3 844      11.6     11          20.7     27.4
Oklahoma         35/46      6 500       5 755       7.1     15         -12.9    -56.5
Texas            24/35      4 600       5 011      11.6     15           8.2     -8.9
USSGP           205/289    28 810      30 797       3.6      5           6.5     -4.5       1.81 [c]
Montana          27/58      2 800       2 626       9.8     52          -6.6   -189.3
South Dakota      9/21        680       1 943      40.3     23          65.0     29.8
MW states        36/79      3 480       4 569      17.1     25          23.8    -60.7
USGP-7          241/368    32 290      35 366       3.9      5           8.7     -9.4       2.23 [c]

[c] The LACIE estimate is significantly different from the SRS estimate at the 10-percent level.


USNGP, the Phase III CV was, on the average, about
40 percent smaller than that of Phase II.

The LACIE total wheat area estimates for the
USGP region (available from July onward) were not
significantly different from the corresponding SRS
estimates in any reporting period of Phase III. In
fact, the RD between the two estimates stayed
between -1.1 percent and 2.5 percent over the entire
season. Also, the CV of the USGP total wheat area
estimate was considerably smaller than in Phase II.
The final Phase III CV was 2.4 percent, compared to
a final CV of 4 percent in Phase II.

Studies based on ground-observed proportions.—In
Phase III, near-harvest ground observations were
obtained and analyzed for 92 winter wheat segments
and 53 spring wheat segments. This section contains
the results of studies based on this data set.


Table XVI.—Continued

(d) July

Region          n/M        SRS,        LACIE,      LACIE CV, percent   RD, percent        Test
                           thousands   thousands   1977     1976       1977     1976      statistic
                           of acres    of acres

Spring wheat
Minnesota        22/47      3 202       2 420      12.2     NA         -32.3     NA
North Dakota     13/103     9 500       9 071      10.7     NA          -4.7     NA
SW states        35/150    12 702      11 491       8.9     NA         -10.5     NA
Montana           5/48      2 185       1 895      37.6     NA         -15.3     NA
South Dakota      5/37      2 332       1 269      40.4     NA         -83.8     NA
MW states        10/85      4 517       3 164      27.7     NA         -42.8     NA
USNGP            45/235    17 219      14 655       9.2     NA         -17.5     NA        -1.90 [c]

Total wheat
Montana          30/73      4 985       4 521       9.9     NA         -10.3     NA
South Dakota     13/45      3 012       3 212      17.9     NA           6.2     NA
MW states        43/118     7 997       7 733      23.3     NA          -3.4     NA
USNGP            78/268    20 699      19 224      16.1     NA          -7.7     NA
USGP            283/557    49 509      50 021       3.4     NA           1.0     NA         0.29

[c] The LACIE estimate is significantly different from the SRS estimate at the 10-percent level.


Bias due to classification (weighted analysis): This


section  presents  the  results  of  the  weighted  analysis 
of  the  aggregated  acreage  estimates  to  determine  the 
bias  that  was  due  to  classification.  A weighted 
average  of  the  differences  between  the  at-harvest 
wheat  proportion  estimates  and  the  ground-observed 
wheat  proportions  is  obtained,  where  the  weights  are 
those  used  in  the  LACIE  aggregation  process.  Table 
XVII  presents  the  results  of  the  weighted  analysis. 
The  results  indicate  the  presence  of  a negative  bias  in 
the  LACIE  at-harvest  area  estimation  process  due  to 
winter  and  spring  wheat  proportion  estimation  errors 
at  the  segment  level. 
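
The weighted analysis can be sketched as a weighted average of segment-level differences. The function and the numbers below are hypothetical; in particular, normalizing by the sum of the weights is an assumption, and the actual LACIE aggregation weights are not reproduced here.

```python
def weighted_bias(estimates, ground_observed, weights):
    """Weighted average of (estimate - ground-observed) differences.

    All three sequences are indexed by segment. The weights stand in for
    the aggregation expansion factors; dividing by their sum is an
    assumed normalization.
    """
    if not (len(estimates) == len(ground_observed) == len(weights)):
        raise ValueError("inputs must have one entry per segment")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(
        w * (est - obs)
        for est, obs, w in zip(estimates, ground_observed, weights)
    )
    return weighted_sum / total_weight


# Hypothetical wheat proportions (percent) for three segments.
example_bias = weighted_bias([20.0, 35.0, 10.0], [25.0, 40.0, 10.0], [1.0, 2.0, 1.0])
```

In this made-up case the result is negative, which would correspond to an overall underestimate of wheat proportion after weighting.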

Bias  due  to  classification  (unweighted  analysis): 
This  section  presents  the  results  of  three  segment- 
level  wheat  proportion  estimation  error  investiga- 
tions based  on  comparisons  of  LACIE  wheat  propor- 
tion estimates  with  corresponding  ground-observed 
wheat  proportions.  The  term  “unweighted"  is  used 
to  indicate  that  the  analyses  do  not  involve  the  ex- 
pansion factors,  or  weights,  from  the  aggregation 
logic. 


1. Winter wheat proportion estimation error.
Blind site results for the investigation of winter
wheat proportion estimation errors for the USGP-7
region are shown in figure 8 and table XVIII. The
LACIE proportion estimates used are from the
Phase III CAS Annual Report, December 22, 1977.
Figure 8 shows plots of the proportion estimation
error $(\hat{X} - X)$ versus $X$ for the February, July, and
final CAS reports, where $\hat{X}$ is the LACIE harvested
wheat proportion estimate and $X$ is the ground-
observed harvested wheat proportion. Points lying
above the horizontal line $\hat{X} - X = 0$ correspond to
overestimates and points lying below the line
correspond to underestimates of wheat proportions.

Table XVIII contains the results of the statistical
analysis of the winter wheat blind site data. The
following factors are listed: (1) the average wheat
proportion estimate $\bar{\hat{X}}$, (2) the average ground-
observed wheat proportion $\bar{X}$, (3) the average
difference $\bar{D} = \bar{\hat{X}} - \bar{X}$, (4) the standard error of the
average difference $S_{\bar{D}}$, and (5) the 90-percent
confidence limits for the population average difference.
The formulas for calculating these factors are
given in the paper by Houston et al.


Table XVI.—Continued

(e) August

Region          n/M        SRS,        LACIE,      LACIE CV, percent   RD, percent        Test
                           thousands   thousands   1977     1976       1977     1976      statistic
                           of acres    of acres

Winter wheat
Colorado         26/31      2 360       3 253      11.3     24          27.5     22.3
Kansas          103/121    12 300      12 579       4.8      5           2.2     -1.5
Nebraska         31/56      3 050       3 558      10.2     11          14.3     26.6
Oklahoma         37/46      6 500       5 963       6.7     15          -9.0    -46.3
Texas            28/35      4 700       4 600      12.8     16          -2.2     -9.0
USSGP           225/289    28 910      29 953       3.6      5           3.5     -3.2       0.97
Montana          39/58      2 800       3 355       7.9     35          16.5    -58.0
South Dakota     12/21        680       1 594      38.1     23          57.3     29.8
MW states        51/79      3 480       4 949      13.4     22          29.7    -19.7
USGP-7          276/368    32 390      34 902       3.6      5           7.2     -5.0       2.00 [c]

Spring wheat
Minnesota        30/47      3 202       2 553      13.0     40         -25.4   -119.8
North Dakota     39/103     9 530       9 220       5.7     14          -3.4    -41.4
SW states        69/150    12 732      11 773       5.1     13          -8.1    -55.2
Montana          23/48      2 185       1 942      18.0     28         -12.5   -105.4
South Dakota     24/37      2 332       2 309      13.4     12          -1.0      5.S
MW states        47/85      4 517       4 251      11.0     12          -6.3    -32.4
USNGP           116/235    17 249      16 024       4.8     10          -7.6    -49.5      -1.58

Total wheat
Montana          52/73      4 985       5 297       6.4     19           5.9    -75.6
South Dakota     30/45      3 012       3 904       8.6     13          22.8     15.4
MW states        82/118     7 997       9 200      12.8     11          13.1    -26.0
USNGP           151/268    20 729      20 973       9.2      9           1.2    -43.4
USGP            376/557    49 639      50 926       2.6      5           2.5    -18.7       0.96

[c] The LACIE estimate is significantly different from the SRS estimate at the 10-percent level.

To  infer  whether  the  population  average 
difference  for  a particular  state  or  region  is  signifi- 
cantly different  from  zero,  one  may  simply  check 
whether  the  corresponding  90-percent  confidence  in- 
terval contains  zero.  If  it  does,  the  population 


average difference is not significantly different from
zero; that is, there is insufficient evidence to
conclude that there is a bias due to proportion estimation
error. If the confidence interval does not contain
zero, the hypothesis of no bias is rejected. The test is
performed at the 10-percent level of significance.
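
The confidence-interval check described above can be sketched as follows. This is a minimal illustration, not the formulas of Houston et al.: it assumes a normal critical value unless the caller supplies one (for example, a Student's t value for small samples), and the sample differences are made up.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def bias_confidence_interval(differences, level=0.90, critical_value=None):
    """Confidence limits for the population average difference.

    `differences` holds segment-level (estimate - ground-observed) values.
    If `critical_value` is None, a two-sided normal quantile is used,
    which is an assumption made for this sketch.
    """
    n = len(differences)
    if n < 2:
        raise ValueError("at least two differences are required")
    average = mean(differences)
    standard_error = stdev(differences) / sqrt(n)
    if critical_value is None:
        critical_value = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = critical_value * standard_error
    return average - half_width, average + half_width


def bias_detected(interval):
    """True when the interval excludes zero (no-bias hypothesis rejected)."""
    lower, upper = interval
    return not (lower <= 0.0 <= upper)


# Hypothetical segment-level differences, in percent.
biased_case = bias_confidence_interval([-6.0, -4.0, -5.0, -7.0, -3.0])
unbiased_case = bias_confidence_interval([-1.0, 1.0, -2.0, 2.0, 0.0])
```

In the first made-up case the interval lies entirely below zero, so the hypothesis of no bias would be rejected at the 10-percent level; in the second the interval contains zero and it would not.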

The plot for February winter wheat shows that,
early in the 1977 season, there was a tendency for the

Table XVI.- Continued

(f) September

Region             n/M       SRS,     LACIE,    LACIE CV,             RD,       Test
                        thousands  thousands      percent         percent  statistic
                         of acres   of acres   1977  1976    1977    1976

Winter wheat

Colorado         25/31      2 360      3 059   10.3    24    22.9    18.6
Kansas         107/121     12 300     12 468    4.5     5     1.3    -1.0
Nebraska         40/56      3 050      3 130    9.2    11     2.6    11.7
Oklahoma         38/46      6 500      6 083    7.2    14    -6.9   -47.9
Texas            28/35      4 700      4 613   12.7    16    -1.9    -8.2
USSGP          238/289     28 910     29 353    3.5     5     1.5    -6.2       0.43
Montana          39/58      2 800      3 628    6.9    29    22.8   -43.6
South Dakota     13/21        680        989   26.5    23    31.2    28.4
MW states        52/79      3 480      4 617    7.8    20    24.6   -14.2
USGP-7         290/368     32 390     33 969    3.2     5     4.6    -7.2       1.44

Spring wheat

Minnesota        33/47      3 202      2 474   11.6    27   -29.4   -50.0
North Dakota    62/103      9 530      8 523    5.0     5   -11.8   -19.6
SW states       95/150     12 732     10 997    4.6     7   -15.8   -25.9
Montana          30/48      2 185      2 187   12.2    23     0.1   -79.3
South Dakota     26/37      2 332      1 958   13.1    13   -19.1     2.1
MW states        56/85      4 517      4 145    9.0    12    -9.0   -28.9
USNGP          151/235     17 249     15 142    4.2     6   -13.9   -26.6     c-3.31

Total wheat

Montana          53/73      4 985      5 815    6.0    14    14.3   -57.2
South Dakota     33/45      3 012      2 947   11.0    12    -2.2    12.9
MW states       86/118      7 997      8 762   13.4     9     8.7   -21.4
USNGP          181/268     20 729     19 759    8.7     6    -4.9   -24.3
USGP           419/557     49 639     49 111    2.5     4    -1.1   -13.9      -0.44

c The LACIE estimate is significantly different from the SRS estimate at the 10-percent level.


proportion of wheat in the segments to be underestimated by a greater
margin for segments with larger proportions of wheat. This trend became
less pronounced as the season progressed, and it appears to be
insignificant in the July and final plots for winter wheat.

The results in table XVIII indicate the presence of a negative bias in
LACIE winter wheat proportion estimates for the USGP-7 region for each
month shown. This indicates that, for these blind sites, the proportion
of winter wheat for the USGP-7 region was underestimated in each
reporting period. However, the wheat proportion estimation error
decreased in magnitude each month, starting with May and ending in
August. From August through the final reporting month, there was a
slight increase

Table XVI.- Continued

(g) October

Region             n/M       SRS,     LACIE,    LACIE CV,             RD,       Test
                        thousands  thousands      percent         percent  statistic
                         of acres   of acres   1977  1976    1977    1976

Winter wheat

Colorado         24/31      2 360      3 395    9.9    24    30.5    18.6
Kansas         108/121     12 300     12 669    4.2     5     2.9    -1.0
Nebraska         39/56      3 050      3 375    9.6    11     9.6    11.7
Oklahoma         41/46      6 500      5 658    7.7    14   -14.9   -47.9
Texas            29/35      4 700      4 476   13.7    16    -5.0    -8.2
USSGP          241/289     28 910     29 573    3.5     5     2.2    -6.2       0.63
Montana          43/58      2 800      3 314    7.8    28    15.5   -41.7
South Dakota     14/21        680        883   25.7    23    23.0    28.4
MW states        57/79      3 480      4 197    8.2    19    17.1   -13.3
USGP-7         298/368     32 390     33 771    3.2     5     4.1    -7.1       1.28

Spring wheat

Minnesota        37/47      3 202      2 289    9.9    30   -39.9   -74.1
North Dakota    70/103      9 530      9 173    4.4     5    -3.9   -18.5
SW states      107/150     12 732     11 462    4.0     7   -11.1   -28.8
Montana          33/48      2 185      2 150   10.3    24    -1.6   -55.7
South Dakota     32/37      2 332      1 909   11.6    13   -22.2     1.4
MW states        65/85      4 517      4 059    7.7    12   -11.3   -22.4
USNGP          172/235     17 249     15 522    3.6     6   -11.1   -27.3     c-3.08

Total wheat

Montana          58/73      4 985      5 464    5.5    12     8.8   -47.5
South Dakota     38/45      3 012      2 793    9.9    12    -7.8    12.5
MW states       96/118      7 997      8 257   12.2     8     3.1   -17.8
USNGP          203/268     20 729     19 719    7.7     5    -5.1   -24.7
USGP           444/557     49 639     49 293    2.4     4    -0.7   -14.1      -0.29

c The LACIE estimate is significantly different from the SRS estimate at the 10-percent level.


each month in the magnitude of the wheat proportion estimation error for
the USGP-7 region. Inspection of figure 8 for the final estimates
indicates that two outliers were the main cause of the increase.

Although the average winter wheat proportion estimation errors for the
individual states in the USGP-7 tended to be negative, they decreased in
magnitude as the season progressed. The number of states with a
population average difference that was not significantly different from
zero at the 10-percent level increased from two in February to six in
October. In the February and the final report, the average proportion
estimation error for Oklahoma was nearly twice as large as the average
for the other states in the USGP-7 region. The proportion estimation
error for Oklahoma from May through October

Table XVI.- Concluded

(h) Final

Region             n/M       SRS,     LACIE,    LACIE CV,             RD,       Test
                        thousands  thousands      percent         percent  statistic
                         of acres   of acres   1977  1976    1977    1976

Winter wheat

Colorado         24/31      2 550      3 459    9.8    24    26.3    18.6
Kansas         106/121     12 100     12 494    4.0     5     3.2    -1.6
Nebraska         39/56      2 950      3 433    9.2    11    14.1    13.2
Oklahoma         42/46      6 500      5 675    7.6    14   -14.5   -47.9
Texas            29/35      4 700      4 476   13.7    16    -5.0    -8.2
USSGP          240/289     28 800     29 537    3.4     5     2.5    -6.3       0.74
Montana          43/58      2 800      3 371    7.9    28    16.9   -48.1
South Dakota     15/21        680        912   25.0    23    25.4    33.2
MW states        58/79      3 480      4 283    8.2    19    18.7   -14.7
USGP-7         298/368     32 280     33 820    3.2     5     4.6    -7.3        1.4

Spring wheat

Minnesota        38/47      3 222      2 344    9.5    30   -37.5   -77.1
North Dakota    73/103      9 150      9 183    4.4     5     0.4   -16.9
SW states      111/150     12 372     11 527    4.0     7    -7.3   -27.9
Montana          32/48      2 260      2 174   10.2    22    -4.0   -54.0
South Dakota     35/37      2 336      1 936    9.6    13   -20.7     2.8
MW states        67/85      4 596      4 110    7.0    12   -11.8   -21.1
USNGP          178/235     16 968     15 638    3.5     6    -8.5   -26.3     c-2.43

Total wheat

Montana          57/73      5 060      5 545    5.4    12     8.7   -50.6
South Dakota     41/45      3 016      2 848    9.1    12    -5.9    15.3
MW states       98/118      8 076      8 393   11.7     8     3.8   -17.9
USNGP          209/268     20 448     19 921    7.6     5    -2.6   -24.2
USGP           449/557     49 248     49 458    2.4     4     0.4   -13.9       0.17

c The LACIE estimate is significantly different from the SRS estimate at the 10-percent level.


does not appear to be significantly different from the other states'
estimates. The reason for this is that the two outliers referred to
previously were in Oklahoma. One was acquired for the October analysis
(note the increase in D̄ and S_D̄ from September to October for Oklahoma
in table XVIII) and the second was acquired for the final analysis (note
the further increase in D̄ and S_D̄ from October to final for Oklahoma).

Figure 9 displays plots of proportion estimation error versus
ground-observed proportion for each state in the USGP-7 winter wheat
region, using the final LACIE proportion estimates. The two outliers are
again apparent in the plot for Oklahoma. Investigation of these two
blind sites indicated that there was no Landsat acquisition during the
tillering-

Table XVII.- Estimates of LACIE Acreage Estimation Bias Due to Classification

Region           n/N a   LACIE area     Bias B,   Standard   Relative      CV,       Test
                        estimate A,   thousands  deviation      bias,  percent  statistic
                          thousands    of acres       of B    percent
                           of acres

Winter wheat

Colorado         11/24       3 459        -567        340     -16.4      9.8
Kansas          24/106      12 494      -1 161        476      -9.3      3.8
Nebraska         16/39       3 433        -218        227      -6.4      6.6
Oklahoma         15/42       5 675        -831        442     -14.6      7.8
Texas             9/29       4 476        -141        708      -3.2     15.8
USSGP           75/240      29 537      -3 049      1 104     -10.3      3.7      b-2.8
Montana          14/43       3 371         157        222      +4.7      6.6
South Dakota      3/15         912        -451        491     -49.5     53.8
USGP-7          92/298      33 820      -3 213      1 181      -9.5      3.5      b-2.7

Spring wheat

Minnesota        11/38       2 344        -770        356     -32.8     15.2
Montana           9/32       2 174        -780        425     -35.9     19.5
North Dakota     21/73       9 183      -1 442        535     -15.7      5.8
South Dakota     12/35       1 936        -672        499     -34.7     25.8
USNGP           53/178      15 638      -3 653        916     -23.4      5.9      b-4.0

Total wheat

USGP           145/449      49 458      -6 440      1 441     -13.0      2.9      b-4.5

a The n is the number of blind sites in the region; the N is the number of acquired segments in the region.
b Indicates classification bias is significantly different from zero.


to-heading stage of wheat and, as a result, the analyst mislabeled most
of the wheat pixels as non-small-grains. Excluding these two outliers
yields an average proportion estimation error of -0.8 with a standard
error of 1.4 for the remaining 13 blind sites, and the negative bias is
no longer indicated.

Two other states with seemingly large standard errors of the average
differences for the final estimates are Texas and South Dakota. The
large standard error is expected for South Dakota, as only three blind
sites are available. However, there are nine blind sites in Texas, and
inspection of the plot for Texas reveals one outlier that is an extreme
overestimate. Omitting this outlier yields an average difference of
-4.5 with a standard error of 1.6, indicating a negative bias in the
Texas winter wheat proportion estimates. Investigation of this site
indicated an acquisition pattern similar to that of the two Oklahoma
outliers. In this case, though, missing a key acquisition led to
overestimation rather than underestimation. This indicates that when a
key acquisition is missing, a proportion estimate should not be made
since positive identification of pixel labels is very difficult.

2. Spring wheat proportion estimation error. Figure 10 and table XIX
contain spring wheat proportion estimation error results that are
analogous to the winter wheat results contained in the preceding
section.

The downward trend that was evident in the February plot of winter wheat
proportion estimation error versus the ground-observed proportion of
winter wheat is also seen in the July spring wheat plot. This means that
the problem of underestimating the proportion of wheat early in the
season in segments with larger proportions of wheat exists for spring
wheat as well as for winter wheat. There was a gradual improvement in
the LACIE estimates of the proportion of spring wheat (in the segments
with large proportions of spring wheat) as the season progressed, but
the trend is still present in the final spring wheat plot.

FIGURE 8.- Plots of proportion estimation errors versus ground-observed
proportions for winter wheat blind sites. (a) February. (b) July.
(c) Final.

The average wheat proportion error for spring wheat had a tendency to be
negative. The average spring wheat proportion estimation error for the
USNGP region was negative for each month and, except for July, the
population average differences were significantly different from zero at
the 10-percent level (see table XIX). This sequence of negative average
wheat proportion estimation errors for the USNGP region increased in
magnitude from July through September and decreased slightly in the
October and final reports. From August through the final report, the
average proportion estimation errors for Montana and South Dakota were
not significantly different from zero at the 10-percent level. In July,
South Dakota had an average wheat proportion estimation error that was
significantly different from zero at the 10-percent level. There were no
data for Montana in July.

Figure 11 displays the plots of proportion estimation error versus
ground-observed proportion for each state in the USNGP spring wheat
region. There are no obvious outliers for any of the states. In each
state, though, the tendency to underestimate the larger proportions is
apparent.

3. Relative contribution of the classification and

Table XVIII.- Winter Wheat Blind Site Results

Region             n/M      X̂       X       D̄     S_D̄    90-percent confidence
                                                              limits for μ_D

February

Colorado         10/31    12.9    22.3    -9.5     1.8    (-12.7, -6.3)*
Kansas          19/121    14.9    30.2   -15.3     3.9    (-22.0, -8.6)*
Nebraska         16/56    20.8    17.7     3.1     3.0    (-2.2, 8.3)
Oklahoma         14/46    17.0    36.8   -19.9     4.2    (-27.3, -12.5)*
Texas             9/35    15.3    25.6   -10.3     3.4    (-16.6, -4.0)*
Montana           7/58     8.8    14.7    -6.0     1.9    (-9.7, -2.2)*
South Dakota      2/21     7.9    11.3    -3.4     2.6    (-19.9, 13.2)
USGP-7          77/368    15.6    25.3    -9.8     1.7    (-12.6, -7.1)*

May

Colorado         10/31    15.4    22.3    -6.8     2.2    (-10.9, -2.7)*
Kansas          23/121    22.1    30.6    -8.5     2.6    (-12.9, -4.1)*
Nebraska         16/56    13.4    17.1    -3.2     1.8    (-6.4, -0.1)*
Oklahoma         15/46    25.3    34.3    -9.0     3.4    (-15.0, -3.1)*
Texas            10/35    19.4    23.4    -4.0     2.5    (-8.6, 0.6)
Montana           5/58    12.6    17.2    -4.6     2.9    (-10.7, 1.6)
South Dakota      2/21     6.2    11.1    -5.1     4.3    (-32.1, 21.9)
USGP-7          81/368    18.9    25.4    -6.5     1.1    (-8.4, -4.6)*

June

Colorado         10/31    16.2    22.3    -6.1     2.4    (-10.4, -1.8)*
Kansas          25/121    22.3    29.0    -6.7     2.4    (-10.8, -2.6)*
Nebraska         17/56    18.0    16.7     1.3     1.6    (-1.5, 4.1)
Oklahoma         15/46    26.1    34.3    -8.2     3.2    (-13.9, -2.6)*
Texas            10/35    20.2    23.4    -3.2     2.5    (-7.8, 1.4)
Montana           5/58    14.5    17.2    -2.6     2.8    (-8.6, 3.4)
South Dakota      3/21     5.7     7.8    -2.1     3.7    (-12.8, 8.6)
USGP-7          85/368    20.1    24.6    -4.5     1.1    (-6.1, -2.6)*

July

Colorado          7/31    18.4    19.3    -0.9     1.4    (-3.6, 1.8)
Kansas          21/121    26.5    29.1    -2.7     1.4    (-5.2, -0.2)*
Nebraska         14/56    16.4    17.0    -0.6     1.9    (-4.0, 2.8)
Oklahoma         13/46    31.4    35.2    -3.8     1.7    (-6.8, -0.8)*
Texas             8/35    21.4    25.5    -4.1     2.5    (-8.8, 0.5)
Montana           8/58    11.3    15.3    -4.0     1.6    (-7.1, -0.9)*
South Dakota      3/21     7.7     7.6    -0.4     1.0    (-3.3, 2.6)
USGP-7          74/368    21.7    24.2    -2.5     0.7    (-3.7, -1.3)*

*Significantly different from zero at the 10-percent level.


ratio errors to the ratioed spring wheat proportion estimation errors.
As in LACIE Phase II, the LACIE Phase III wheat proportion estimates for
a segment were obtained by multiplying the small-grains proportion
estimate obtained in CAMS by a wheat-to-small-grains ratio. The
wheat-to-small-grains ratios used in Phase III were obtained from
econometric models at the CRD level in the USNGP. The purpose of this
section is to provide the results of a sensitivity analysis used to
determine the contributions of classification and ratio errors to
proportion estimation errors at harvest.
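The ratioing step described above amounts to a single multiplication, sketched here for concreteness. The function name `wheat_proportion` and the numerical inputs are hypothetical; they are chosen only to illustrate the arithmetic and are not taken from any LACIE segment.

```python
def wheat_proportion(small_grains_pct: float, wheat_ratio: float) -> float:
    """Ratioed wheat proportion for one segment, in percent.

    small_grains_pct: classified small-grains proportion of the segment.
    wheat_ratio: forecast wheat-to-small-grains ratio (0 to 1).
    """
    if not 0.0 <= small_grains_pct <= 100.0:
        raise ValueError("small-grains proportion must be between 0 and 100")
    if not 0.0 <= wheat_ratio <= 1.0:
        raise ValueError("wheat-to-small-grains ratio must be between 0 and 1")
    return small_grains_pct * wheat_ratio


# Hypothetical segment: 30 percent small grains, of which 80 percent is wheat.
segment_estimate = wheat_proportion(30.0, 0.8)
print(f"{segment_estimate:.1f} percent wheat")
```

Because the estimate is a product, an error in either factor carries through to the wheat proportion, which is why the sensitivity analysis treats classification error and ratio error separately.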


Table XVIII.- Concluded

Region             n/M      X̂       X       D̄     S_D̄    90-percent confidence
                                                              limits for μ_D

August

Colorado         10/31    19.9    21.3    -1.4     1.8    (-4.7, 1.8)
Kansas          22/121    28.0    30.6    -2.6     1.3    (-4.8, -0.4)*
Nebraska         14/56    15.5    16.2    -0.8     1.3    (-3.0, 1.5)
Oklahoma         13/46    35.3    36.9    -1.6     1.6    (-4.4, 1.2)
Texas             9/35    22.4    25.2    -2.8     2.8    (-8.1, 2.5)
Montana          12/58    11.8    14.0    -2.2     1.2    (-4.5, 0.0)
South Dakota      3/21     7.0     7.6    -0.6     0.8    (-3.1, 1.9)
USGP-7          83/368    22.4    24.2    -1.8     0.6    (-2.8, -0.8)*

September

Colorado         11/31    17.3    20.2    -2.9     1.6    (-5.8, -0.1)*
Kansas          23/121    28.0    30.5    -2.5     1.1    (-4.4, -0.5)*
Nebraska         17/56    13.7    16.0    -2.3     1.1    (-4.2, -0.4)*
Oklahoma         13/46    36.3    36.9    -0.5     1.6    (-3.4, 2.4)
Texas             9/35    22.6    25.2    -2.6     2.9    (-8.0, 2.8)
Montana          12/58    12.8    13.6    -0.7     1.0    (-2.6, 1.1)
South Dakota      3/21     5.0     7.6    -2.6     2.6    (-10.1, 4.9)
USGP-7          88/368    21.7    23.7    -1.9     0.6    (-2.9, -0.9)*

October

Colorado         11/31    17.8    20.2    -2.4     1.7    (-5.4, 0.7)
Kansas          24/121    27.0    29.4    -2.4     1.1    (-4.3, -0.6)*
Nebraska         16/56    15.7    18.0    -2.2     1.3    (-4.5, 0.1)
Oklahoma         14/46    34.8    38.2    -3.4     2.8    (-8.3, 1.6)
Texas             9/35    22.7    25.2    -2.5     2.9    (-7.9, 2.9)
Montana          14/58    13.6    13.4     0.1     1.0    (-1.7, 1.9)
South Dakota      3/21     5.0     7.6    -2.6     2.6    (-10.1, 4.9)
USGP-7          91/368    21.9    24.0    -2.1     0.7    (-3.3, -1.0)*

Final

Colorado         11/31    17.8    19.8    -2.0     1.5    (-4.7, 0.7)
Kansas          24/121    26.5    29.3    -2.8     1.1    (-4.7, -0.9)*
Nebraska         16/56    16.5    18.0    -1.5     1.1    (-3.5, 0.5)
Oklahoma         15/46    34.4    40.2    -5.8     3.1    (-11.3, -0.4)*
Texas             9/35    22.7    24.3    -1.6     2.9    (-6.9, 3.7)
Montana          14/58    13.7    13.4     0.3     1.1    (-1.7, 2.2)
South Dakota      3/21     5.0     7.6    -2.6     2.6    (-10.1, 5.0)
USGP-7          92/368    22.0    24.4    -2.4     0.7    (-3.6, -1.2)*

*Significantly different from zero at the 10-percent level.


This analysis was made for 33 blind sites in the spring wheat region of
Minnesota and North Dakota, using the at-harvest LACIE proportion
estimates. These are the two states for which a negative bias was
indicated for the final spring wheat proportion estimates (see table
XIX). The results of this analysis are presented in table XX. The line
labeled "No ratioing error" was obtained using the LACIE small-grains
proportion estimate with the corresponding ground-observed ratio of
spring wheat to small grains. Likewise, the line labeled "No
classification error" was obtained using the LACIE estimate of the
spring-wheat-to-small-grains ratio (forecast from a CRD-level
econometric model) and the corresponding ground-observed proportion of
small grains.

FIGURE 9.- Plots of at-harvest proportion estimation errors versus
ground-observed proportions for winter wheat blind sites by state.
(a) Colorado. (b) Kansas. (c) Nebraska. (d) Oklahoma. (e) Texas.
(f) Montana. (g) South Dakota.

The results indicate that classification error was the major contributor
to both the bias and the mean-squared error of the total spring wheat
proportion estimation error in North Dakota and Minnesota in Phase III.
Comparison of these results with those of Phase II (table IX) indicates
a significant reduction in total mean-squared error from Phase II to
Phase III. The primary reason for this reduction was an increase in
small-grains classification precision in Phase III as indicated by the
reduction in mean-squared error from 78.6 in Phase II to 33.4 in Phase
III for proportion estimates with no ratioing error (i.e.,
classification errors only). This is at least partially due to the use
of Procedure I in Phase III. The mean-squared error for proportion
estimates with no classification error was about the same for Phases II
and III. However, the use of econometric models in Phase III for
forecasting the spring-wheat-to-small-grains ratios apparently reduced
the bias considerably from that obtained in Phase II using historical
ratios. In Phase III, there was an estimated bias of -1.3 percent as
compared to an estimated bias of -3.1 percent in Phase II for proportion
estimates with no classification error (i.e., ratioing errors only).
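The "reduction" columns of table XX can be reproduced with a short calculation, sketched below. It assumes each reduction is the drop in magnitude relative to the final Phase III result, expressed in percent; that definition is inferred from the tabulated values rather than stated in the text, and the function name `percent_reduction` is hypothetical.

```python
def percent_reduction(baseline: float, reduced: float) -> float:
    """Percent drop in magnitude from a baseline value to a reduced value."""
    if baseline == 0:
        raise ValueError("baseline must be nonzero")
    return 100.0 * (abs(baseline) - abs(reduced)) / abs(baseline)


# Final Phase III result from table XX: bias -3.8 percent, MSE 68.3.
BASE_BIAS, BASE_MSE = -3.8, 68.3

# "No ratioing error" line (classification errors only).
bias_cut_no_ratio = percent_reduction(BASE_BIAS, -2.5)
mse_cut_no_ratio = percent_reduction(BASE_MSE, 33.4)

# "No classification error" line (ratioing errors only).
bias_cut_no_class = percent_reduction(BASE_BIAS, -1.3)
mse_cut_no_class = percent_reduction(BASE_MSE, 26.9)

print(round(bias_cut_no_ratio, 1), round(mse_cut_no_ratio, 1))
print(round(bias_cut_no_class, 1), round(mse_cut_no_class, 1))
```

Removing classification error eliminates about two-thirds of the bias, whereas removing ratioing error eliminates about one-third, which is the sense in which classification error is the major contributor.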

Contribution of sampling and classification errors to the variability of
area estimates: This study was performed for the purpose of measuring
the contributions of classification and sampling errors to the
within-stratum area variability and estimating the classification and
sampling error contributions to the CV's of the regional area estimates.
Since the proportion estimates used in this section are for ratioed
wheat (winter or spring), the classification error referred to herein is
actually compounded with the ratio error.

To estimate the within-stratum area variances resulting from
classification and sampling errors, the following three basic regression
models are constructed.

1. True segment proportion versus historical stratum proportion

2. LACIE segment proportion versus ground-truth segment proportion

3. LACIE segment proportion versus historical stratum proportion

These regression models are used to obtain, respectively, an estimate of
sampling's contribution to the variance, an estimate of classification's
contribution to the variance, and an estimate of a linear combination of
the classification and sampling variances. The maximum likelihood
estimation technique, assuming normality and that the regression models
in 1, 2, and 3 are applicable, is then used to obtain maximum likelihood
estimates of the contributions of sampling and classification to the
area variance. A detailed description of this approach is presented in
the paper by Houston et al. Table XXI gives the results of this analysis
made of 449 Phase III operational segments, 146 of which were blind
sites.

These results show that the sampling CV is larger than the
classification CV for winter, spring, and total wheat area estimates.
The implication is that sampling contributes slightly more to the area
variance than does classification. Moreover, winter wheat has smaller
CV's for both classification and sampling than does spring wheat; that
is, there is less variability in the winter wheat area estimates than in
the spring wheat area estimates for the USGP region. The sampling CV for
the total wheat area estimate is 1.9 percent, which is well within the
sampling accuracy goal of 2.3 percent.
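The split between the two error sources can be checked numerically with the winter wheat entries of table XXI. The sketch below assumes that the classification and sampling components are independent, so that their variances add and their CV's combine in quadrature; that assumption and the function names are illustrative and are not taken from the paper by Houston et al.

```python
import math


def variance_shares(classification_var: float, sampling_var: float):
    """Percentage of the total variance due to each component."""
    total = classification_var + sampling_var
    return (100.0 * classification_var / total,
            100.0 * sampling_var / total)


def combined_cv(classification_cv: float, sampling_cv: float) -> float:
    """Combine two CV's in quadrature (assumes independent components)."""
    return math.hypot(classification_cv, sampling_cv)


# Winter wheat, USGP-7 (table XXI): variance components 41.6 and 62.5,
# classification CV 2.0 percent, sampling CV 2.5 percent.
class_share, samp_share = variance_shares(41.6, 62.5)
area_cv = combined_cv(2.0, 2.5)

print(f"classification {class_share:.0f}%, sampling {samp_share:.0f}%")
print(f"area CV about {area_cv:.1f} percent")
```

For winter wheat this gives roughly a 40/60 split and an area CV of about 3.2 percent, in line with the table. The spring wheat row agrees less closely under the same assumption, so the quadrature rule should be read as an approximation only.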

Acreage estimation bias due to nonsampled and nonresponsive areas.- In
order to investigate bias due to the ratio estimation procedure used to
estimate the wheat area in nonsampled and nonresponsive areas in the
United States, aggregations were performed in which the LACIE proportion
estimate for each segment was replaced by the corresponding 1976 SRS
county wheat proportion. Table XXII contains the results of this "mock
aggregation" for all allocated segments and the comparisons with 1976
SRS estimates. The RD at the USGP level is -2.5 percent, indicating a
possible small negative bias due to the Group II and Group III ratio
estimation procedure used for those counties not allocated segments.
This is larger than the observed RD of 0.8 percent obtained in a similar
study of the Phase II sample segment allocation to the U.S. Great Plains
(see table XI). The Phase II allocation was based on wheat production
for an epoch year, whereas the Phase III allocation was based on
small-grains production for an epoch year.
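The RD values quoted for the mock aggregation can be recomputed from the acreages in table XXII. The definition of RD is not restated in this section; the form used below, which normalizes the difference by the aggregated estimate rather than by the SRS figure, is an inference adopted because it reproduces the tabulated values. The function name `relative_difference` is hypothetical.

```python
def relative_difference(estimate: float, reference: float) -> float:
    """Relative difference in percent, normalized by the estimate.

    Assumption: RD = 100 * (estimate - reference) / estimate.
    """
    if estimate == 0:
        raise ValueError("estimate must be nonzero")
    return 100.0 * (estimate - reference) / estimate


# (mock aggregation, 1976 SRS) in thousands of acres, from table XXII.
table_xxii = {
    "Winter wheat, USGP-7": (30478, 31500),
    "Spring wheat, USNGP": (19527, 19768),
    "Total wheat, USGP": (50005, 51268),
}

rd = {region: relative_difference(mock, srs)
      for region, (mock, srs) in table_xxii.items()}

for region, value in rd.items():
    print(f"{region}: RD = {value:.1f} percent")
```

Normalizing by the SRS figure instead would give about -3.2 percent for winter wheat rather than the tabulated -3.4 percent, which is the reason for preferring the form shown.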

An investigation was undertaken to determine the allocation that would
have resulted from using the epoch-year wheat production rather than the
epoch-year small-grains production. It was found that 32 currently
designated Group III counties should have been Group I or Group II
counties and that 16 currently designated Group I counties and 43
currently designated Group II counties should have been Group III
counties. The decision was made to redesignate the 16 Group I and 43
Group II counties as Group III counties. This caused the original
allocation to the United States of 601 segments to be reduced to 557
segments. The results in table XXII are for the 557 segments after
redesignation. It was infeasible at the time to allocate more sample
segments to the 32 Group III counties that should have been Group I or
Group II counties. The use of the Group III estimator to estimate their
wheat area accounts for at least part of the observed difference.

Table XXIII contains the results of aggregating the 1976 SRS county
wheat proportions for each segment acquired and processed for each Phase
III monthly estimate made except the final. The result for the final
estimate is expected to be similar to that for the October estimate. The
difference between the mock aggregation and the SRS estimate in this
study is due to error in the Group II and Group III ratio estimation
procedure used for both those counties not allocated segments and those
counties whose allocated segments were lost to nonresponse.

The results indicate that the error due to the ratio estimation of the
nonsampled and nonresponsive areas for each month during Phase III is
about the same as that due to nonsampled areas only. This indicates that
the error due to Group II and Group III ratio estimation of areas lost
to nonresponse is negligible. However, the results do suggest the
presence of a small negative bias in the ratio estimation technique
applied to the nonsampled areas, particularly in the winter wheat
region.

Table XIX.- Spring Wheat Blind Site Results

Region             n/M      X̂       X       D̄     S_D̄    90-percent confidence
                                                              limits for μ_D

July

Minnesota         6/47     9.1    11.1    -2.0     2.5    (-7.1, 3.0)
Montana           0/48       -       -       -       -    -
North Dakota     2/103    32.6    36.1    -4.2    10.3    (-69.2, 60.6)
South Dakota      3/37    11.2    15.1    -4.6     4.9    (-18.9, 9.7)
USNGP           11/235    13.9    17.1    -3.1     2.3    (-7.3, 1.0)

August

Minnesota        10/47    17.3    22.6    -5.2     2.4    (-9.6, -0.9)*
Montana           4/48     4.2    11.7    -7.5     5.9    (-21.3, 6.3)
North Dakota     8/103    24.4    27.3    -2.8     3.4    (-9.4, 3.7)
South Dakota      9/37     9.1    11.3    -1.6     2.0    (-5.3, 2.1)
USNGP           31/235    15.3    19.1    -3.8     1.5    (-6.3, -1.3)*

September

Minnesota        11/47    19.0    23.7    -4.7     2.3    (-8.9, -0.6)*
Montana           7/48     9.9    12.1    -2.2     2.4    (-6.9, 2.4)
North Dakota    17/103    20.9    25.7    -4.8     1.7    (-7.8, -1.8)*
South Dakota      9/37     8.4    11.3    -2.9     2.5    (-7.6, 1.8)
USNGP           44/235    16.1    20.1    -4.0     1.1    (-5.8, -2.2)*

October

Minnesota        12/47    18.6    22.9    -4.3     2.2    (-8.2, -0.4)*
Montana           9/48    11.9    15.7    -3.8     2.3    (-8.1, 0.5)
North Dakota    20/103    21.0    25.1    -4.0     1.5    (-6.6, -1.5)*
South Dakota      9/37     7.9     9.4    -1.5     2.3    (-5.8, 2.8)
USNGP           50/235    16.4    20.1    -3.6     1.0    (-5.2, -2.0)*

Final

Minnesota        12/47    18.5    22.9    -4.4     2.2    (-8.3, -0.5)*
Montana           9/48    12.0    15.2    -3.2     2.3    (-7.5, 1.0)
North Dakota    21/103    21.3    24.7    -3.4     1.4    (-5.8, -1.0)*
South Dakota     12/37     8.0    11.1    -3.1     1.8    (-6.4, 0.2)
Total           54/235    16.2    19.7    -3.5     0.9    (-5.0, -2.1)*

*Significantly different from zero at the 10-percent level.

FIGURE 11.- Plots of at-harvest proportion estimation errors versus
ground-observed proportions for spring wheat blind sites by state.
(a) Minnesota. (b) Montana. (c) North Dakota. (d) South Dakota.

Table XX.- Relative Contribution of Classification and Ratio Errors to
Final Phase III Spring Wheat Proportion Estimation Errors

Category                    n/M       D̄,     S_D̄,  Reduction  90-percent       MSE  Reduction
                                  percent  percent   in bias,  confidence             in MSE,
                                                      percent  limits for μ_D         percent

Final Phase III result    33/150     -3.8      1.1          -  (-5.7, -1.8)    68.3         -
No ratioing error         33/150     -2.5      0.8       34.2  (-3.9, -1.2)    33.4      51.1
No classification error   33/150     -1.3      0.8       65.8  (-2.7, -0.02)   26.9      60.6

Table XXI.- Contribution of Sampling and Classification Errors to
Variability of Area Estimates

Crop                    Within-   Variance component   Percentage error     Area   Classifi-  Sampling
                        stratum   Classifi-  Sampling  Classifi-  Sampling    CV,   cation CV,      CV,
                           area     cation               cation           percent     percent  percent
                       variance

Winter wheat, USGP-7      104.1       41.6      62.5        40        60      3.2         2.0      2.5
Spring wheat, USNGP        65.6       26.2      39.4        40        60      3.5         2.3      2.8
Total wheat, USGP         100.4       39.6      60.8        40        60      2.4         1.5      1.9

Table XXII.- Acreage Estimation Bias Due to Nonsampled Areas

Region                      M    1976 SRS,          Mock        RD,
                                 thousands  aggregation,    percent
                                  of acres     thousands
                                                of acres

Winter wheat, USGP-7      368       31 500        30 478       -3.4
Spring wheat, USNGP       235       19 768        19 527       -1.2
Total wheat, USGP         557       51 268        50 005       -2.5

Special  Studies 

The method of estimating wheat proportions using LACIE Procedure I
requires a shift from the labeling of fields to the labeling of
individual grid intersection dots or picture elements (pixels).
Previously, the analyst-interpreter (AI) could select the desired fields
for labeling. Procedure I requires that the AI label pixels from a fixed
list of randomly selected pixels taken from the 209 intersections of the
grid overlay of the imagery. The accuracy of Procedure I depends to a
large extent upon the AI's ability to discern accurately which of these
pixels are small grains. Using the software system described in the
paper by Pitts et al. entitled "Accuracy Assessment System and
Operation," the AI labels and the corresponding ground-observed labels
can be compared to evaluate the dot labeling accuracy. The results of
such studies are presented in this section (see reference 2 for more
detail).

Analyst dot labeling accuracy.—The results presented in this
section are from a comparison of ground-observed and
analyst-designated labels of dots from 51 blind sites located in
North Dakota, Minnesota, Montana, Colorado, and Oklahoma. These
dots are type 2 dots, which are those dots used to perform the
stratified area estimation part of Procedure 1. The accuracy of
the segment-level proportion estimate is critically dependent on
the labeling accuracy of these type 2 pixels.

Table XXIII.—Acreage Estimation Bias Due to
Nonsampled and Nonresponsive Areas

Month       Region                  N/M               1976 SRS,    Mock aggregation,   RD,
                                                      thousands    thousands           percent
                                                      of acres     of acres

February    Winter wheat, USGP-7    244/368           31 500       30 408              -3.6
May         Winter wheat, USGP-7    256/368           31 500       30 737              -2.5
June        Winter wheat, USGP-7    272/368           31 500       30 556              -3.1
July        Winter wheat, USGP-7    241/368           31 500       30 478              -1.7
August      Winter wheat, USGP-7    276/368           31 500       30 678              -2.7
            Spring wheat, USNGP     116/234           19 768       19 934                .8
            Total wheat, USGP       376/557           51 268       50 612              -1.3
September   Winter wheat, USGP-7    240/368           31 500       30 641              -2.8
            Spring wheat, USNGP     151/234           19 768       19 523              -1.3
            Total wheat, USGP       (illegible)/557   51 268       50 164              -2.2
October     Winter wheat, USGP-7    248/368           31 500       30 475              -3.4
            Spring wheat, USNGP     172/234           19 768       19 548              -1.1
            Total wheat, USGP       444/557           51 268       50 023              -2.5


Table XXIV presents the at-harvest total omission and commission
error rates (as a percentage of total pixels labeled in a state)
for each state as well as the omission and commission error rates
for the three major error sources identified in Phase III.
Omission error is the result of mislabeling small-grains pixels as
non-small-grains; commission error is the result of mislabeling
non-small-grains pixels as small grains.
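The two error rates defined above can be illustrated with a short sketch. It assumes each dot carries a boolean label (True for small grains) from both the analyst and the ground observation, and it expresses both rates as a percentage of all pixels labeled, as in table XXIV. The function name and the sample labels are hypothetical.

```python
def labeling_error_rates(analyst_labels, ground_labels):
    """Return (omission, commission) rates as percentages of all labeled pixels.

    Labels are booleans: True means small grains, False means non-small-grains.
    """
    total = len(ground_labels)
    pairs = list(zip(analyst_labels, ground_labels))
    # Omission: ground truth is small grains, analyst said non-small-grains.
    omission = sum(1 for analyst, ground in pairs if ground and not analyst)
    # Commission: ground truth is non-small-grains, analyst said small grains.
    commission = sum(1 for analyst, ground in pairs if analyst and not ground)
    return 100.0 * omission / total, 100.0 * commission / total


# Hypothetical labels for ten dots.
ground = [True] * 6 + [False] * 4
analyst = [True, True, True, True, False, False, False, False, False, True]
omission_rate, commission_rate = labeling_error_rates(analyst, ground)
print(omission_rate, commission_rate)
```

When omission exceeds commission, as in this toy case, the analyst's labels imply a smaller small-grains proportion than the ground truth.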

The  results  in  table  XXIV  show  that  the  omission 
error  is  consistently  larger  than  the  commission  error 
by  state  and  by  error  source.  This  occurrence 
typically  leads  to  underestimation  of  the  small-grains 
proportion  in  a segment  and,  in  fact,  is  what  was 
found  in  the  blind  site  analyses  of  proportion  estima- 
tion error  described  previously. 

Abnormal signatures, boundaries, and inadequate acquisitions were
found to be the three major sources of labeling error. "Abnormal
signatures" refers to a signature (small-grains or
non-small-grains) that, under the conditions believed by the AI to
be occurring in the segment, is not the expected signature or does
not follow the expected temporal sequence. "Boundaries" are made
up of two types of pixels, border pixels and edge pixels. A border
pixel is one which presents an interpretation problem because its
signature is spectrally mixed; that is, it represents both a
small-grains area and a non-small-grains area. An edge pixel is
one for which the signature is spatially mixed; that is, on the
acquisitions used by the AI for proportion estimation, the edge
pixel moves at least once from a small-grains field to a
non-small-grains field because of misregistration. "Inadequate
acquisitions" refers to labeling errors that occur because the AI
attempts to label a segment when key acquisitions are missing. The
AI is usually guessing for many of the pixels in this case and
probably should not pass an estimate. This particular error
occurred in only one or two blind sites per state; however, when
it occurred, both the labeling error and the proportion estimation
error were large. For example, the 3.0-percent omission error due
to inadequate acquisitions for the 11 blind sites in Oklahoma came
from one segment. This particular segment accounts for one of the
two extreme underestimates in Oklahoma referred to previously in
the proportion estimation error analysis. The other outlier in
Oklahoma was not included in this study, but it had the same
acquisition history.

Labeling  errors  in  the  “other"  category  include 
clerical  errors  and  inconsistent  labeling  errors.  Incon- 
sistent labeling  occurs  when  an  AI  has  labeled 
several  pixels  correctly  and  then  incorrectly  labels 
one  or  two  pixels  following  the  same  temporal  se- 
quence in  the  same  segment. 

Note that nonresolvable small-grains strip-fallow pixels were
excluded from the study of Montana. These are pixels for which the
MSS resolution is not fine enough to show the strips in the
Landsat imagery. The signature is integrated for the whole field
and hence cannot be called a boundary-type signature. Either
label, small grains or non-small-grains, could be considered
correct for these areas. Hence, because of the inability to
characterize the error, they were omitted. In Montana, 10.3
percent of the labeled pixels fell in these nonresolvable
strip-fallow areas. Of these, 54 percent were labeled
non-small-grains and 46 percent were labeled small grains by the
AI's. This split is close to the 50 percent of these areas that
one would expect to be small grains.


Table XXIV.—Phase III Label Error Causes

[Percentage of total pixels labeled]

                                       State (number of blind sites)
                           North Dakota   Minnesota     Montana(a)    Colorado      Oklahoma
                           (18)           (6)           (10)          (6)           (11)
Cause of error             OM(b)  COM(c)  OM     COM    OM     COM    OM     COM    OM     COM

Abnormal signatures         4.4   0.5     2.6    0.3    1.4    0.9    2.8    --     3.3    1.4
Boundaries                  3.2    .7     4.0    1.1    1.0     .6    2.3    0.8    2.2     .8
Inadequate acquisitions     1.5   1.0     --     --      .5    --     --     --     3.0    --
Other                       2.1    .8     2.5    1.2    1.9     .6     .9    --     1.4    3.3
Total errors               11.2   3.0     9.1    2.6    4.8    2.1    6.0     .8    9.9    5.5

(a) Nonresolvable small-grains strip-fallow pixels excluded.
(b) Omission error rate.
(c) Commission error rate.

About 11 percent of the pixels labeled in Montana fell in
resolvable strip-fallow areas. The relatively low error rate for
boundaries (1.0-percent omission, 0.6-percent commission)
indicates that the analysts labeled quite accurately in these
areas. Overall, the Montana small-grains signatures were found to
be quite good. There were very few abnormal signatures and there
was good separation of the small-grains and non-small-grains
signatures. Recalling the proportion estimation error study,
neither the winter wheat nor the spring wheat blind site analysis
indicated a bias for the Montana proportion estimates.

Excluding the outlier for Oklahoma, the largest total labeling
errors in the study were for Minnesota and North Dakota. These
errors were primarily due to omission errors for abnormal
signatures and boundaries. In the spring wheat proportion
estimation error study, these were the only two states for which a
negative bias was indicated. The large errors of omission
apparently caused this underestimation of the proportion.

Figure  12  contains  an  example  displaying  the  two 
largest  sources  of  omission  errors  in  Minnesota  and 
North  Dakota.  The  blind  site  is  located  in  Grant 
County,  Minnesota.  The  pixels  identified  as  1, 2,  and 
3 are  examples  of  a border  pixel,  an  edge  pixel,  and 
an  abnormal  signature,  respectively.  (The  upper  left 
corner  of  the  grid  intersection  is  designated  as  the  ex- 
act location  of  the  pixel.) 

Pixel 1 lies on the border between a spring wheat field and a
sunflower field. From the ground-truth map, it was determined that
the pixel contained more spring wheat than sunflowers, but the
analyst labeled the pixel as non-small-grains. The more accurate
ground-truth determination is possible because the ground
observations are made at a subpixel level, one-sixth the size of a
pixel. The evaluator thought that the AI should have labeled the
pixel as small grains because close inspection of the imagery
revealed that, in the heading acquisition, the pixel was more red
than green and, in the turning acquisition, it was more green than
red.

Pixel 2 is a classic example of an edge pixel. In the heading
acquisition, the pixel is on a road. In the turning acquisition,
the pixel is in a spring wheat field. The turning acquisition was
the base acquisition for this segment. When an AI works a segment,
he selects one of the acquisitions to be the base acquisition.
This means that the pixels are to be labeled as to their location
in the base acquisition. The grid intersections of the other
acquisitions are registered to the base acquisition for labeling.
In this example, pixel 2 was labeled as non-small-grains, but
since the turning acquisition was the base acquisition, it should
have been labeled as small grains. This may have been a clerical
error.

Pixel 3 provides an example of an abnormal signature. It is green
in the heading acquisition and red in the turning acquisition.
However, this pixel lies on the edge of a small body of water. The
ground truth indicated that the wheat field came right up to the
edge of the water. The AI labeled the pixel as non-small-grains.
The evaluator thought that the AI believed the pixel to be grass
growing on the edge of the water. The evaluator determined that
the pixel was actually spring wheat, as indicated by the ground
truth, but the development of the spring wheat in this pixel had
been delayed because of excess moisture and was still in the
heading stage although the majority of the wheat in the segment
was in the turning stage.

Effects of AI, acquisition history, and bias correction on
proportion estimation error.—The Image 100 processor and data
from eight U.S. blind sites were used in an experiment wherein
each site was analyzed by three AI's to give a "raw" and a
"bias-corrected" estimate of the proportion of small grains in
each segment. The segments were of two types; namely, those having
acquisitions in all four biophases and those having only
early-season acquisitions. The segments were selected at random
from the blind sites for which detailed ground truth was
available.

The objectives of the experiment were (1) to evaluate the
performance of Procedure 1 in terms of absolute proportion
estimation error and its repeatability with different AI's, (2) to
make comparisons between "bias-corrected" and "raw" Procedure 1
estimates, and (3) to determine whether the performance was better
when acquisitions from all biostages were used than when only the
early-season acquisition was used.

The third objective could not be properly achieved because of the
small number of segments used (four of each type). It was later
estimated that to make effective comparisons of this type in a
fully nested design, one would need about 10 times as many
segments. The efficiency of the test could be improved if the same
segments were analyzed first using only early-season acquisitions
and then using all acquisitions; however, there would be potential
biasing problems in such replication if the same AI analyzed the
segment under both the early-season and the full-season
conditions. If different AI's performed the analysis, the
potentially large variability, as found in the experiment reported
here, would further increase the number of segments required.


[Figure 12: two image panels. Heading acquisition, June 23, 1977
(SW red, NW green); turning acquisition, July 29, 1977 (SW green,
NW red). Marked pixels: 1. Border pixel, spectral confusion of SW
and sunflowers. 2. Edge pixel, shifts from road (heading) to SW
(turning). 3. Abnormal signature, excess water retarded SW
development.]

FIGURE 12.—Phase III omission labeling error examples.

Table XXV shows the absolute proportion estimation error
$|\hat{X} - X|$, where $X$ is the ground-truth small-grains
proportion and $\hat{X}$ is the analyst's estimate of $X$, for the
various treatment combinations. Averages are blocked off from the
basic data; for example, the average absolute error for AI "B" on
early-season segments was 11.6 for the raw estimate and 11.8 for
the bias-corrected estimate. For the same AI, the average absolute
error on all segments was 7.9 for raw estimates and 11.1 for
bias-corrected estimates. The average absolute error for all three
AI's was 12.8 for raw early-season estimates, 6.3 for raw
full-season estimates, and 9.5 for all eight segments with raw
estimates. The grand mean was 10.0.
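The blocked averages can be reproduced directly from the per-segment entries. The sketch below uses the early-season raw values transcribed from table XXV. The helper names are illustrative and are not part of the Image 100 software.

```python
from statistics import mean

# Early-season raw absolute errors from table XXV, keyed by segment number,
# listed in analyst order (A, B, C).
EARLY_RAW = {
    1642: (16.5, 10.8, 2.0),
    1651: (11.4, 18.5, 21.3),
    1660: (9.7, 14.6, 30.3),
    1662: (8.4, 2.5, 7.0),
}


def absolute_error(estimate, truth):
    """Absolute proportion estimation error for one segment."""
    return abs(estimate - truth)


def analyst_averages(errors_by_segment):
    """Average absolute error per analyst (one value per column)."""
    return [mean(column) for column in zip(*errors_by_segment.values())]


def overall_average(errors_by_segment):
    """Average absolute error over every analyst and segment."""
    return mean(v for row in errors_by_segment.values() for v in row)


per_analyst = analyst_averages(EARLY_RAW)
print([round(v, 2) for v in per_analyst])
print(round(overall_average(EARLY_RAW), 2))
```

The per-analyst values agree with the tabulated 11.5, 11.6, and 15.2 to within rounding, and the overall value agrees with the tabulated 12.8.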

The  most  obvious  feature  of  table  XXV  is  the 
large  variability  between  AI's  and  between  segments. 
If  this  variation  is  taken  to  be  typical,  then  future  ex- 
periments should  be  designed  so  that  segments  and 
AI's  are  “crossed"  with  treatments  as  much  as  possi- 
ble; that  is,  each  segment  should  be  worked  by  each 
AI  using  each  treatment. 

Analysis  of  variance  was  used  to  test  for  the 
effects  of  AI’s,  time  (i.e.,  early  season  versus  all  ac- 
quisitions), method  (raw  versus  bias  correction),  and 
their  interactions.  The  results  led  to  the  following 
conclusions. 


1.  The  large  disparity  between  data  from  various 
AI’s  was  not  consistent  over  segments;  i.e.,  an  AI 
would  do  better  on  one  segment  than  on  another. 

2.  There  was  no  significant  difference  between 
methods;  i.e.,  the  use  of  bias  correction  just  traded 
one  random  error  for  another  of  comparable  mag- 
nitude. 

3.  Any  test  involving  acquisition  history  was  not 
significant. 

As stated earlier, these tests had extremely low power because of
insufficient numbers of segments to account for the large AI-to-AI
and segment-to-segment variability.


Summary  of  Phase  III 

The Phase III results indicate that significant improvement has
been realized over Phase I and Phase II results because of the
LACIE area estimation technology improvements. The incorporation
in Phase III of the ratios of wheat to small grains as forecast
using the econometric models proved to be much better than using
historical ratios. The new classification procedure, Procedure 1,
apparently helped increase the precision of small-grains
proportion estimates, particularly in the spring wheat area. The
increased precision in classification, together with the
achievement of the Phase III goal of a 2.3-percent sample error,
resulted for the first time in a total wheat area estimate for the
United States for which the 90/90 hypothesis could not be
rejected.


Table XXV.—Image 100—Procedure 1 Data

[$|\hat{X} - X|$ (small grains)]

Acquisition         Segment   Raw, by analyst                        Bias correction, by analyst         Overall
history                       A      B             C      Average    A      B      C      Average       average

Early season only   1642      16.5   10.8           2.0              18.9    8.7   16.8
                    1651      11.4   18.5          21.3               5.6   18.3   19.7
                    1660       9.7   14.6          30.3               8.0   11.9   19.5
                    1662       8.4    2.5           7.0               1.6    8.2    1.5
                    Average   11.5   11.6          15.2   12.8        8.5   11.8   14.4   11.6          12.2

Full season         1603       0.8    1.4           0.9               1.4    1.4    2.0
                    1614       5.2   10.6          31.7               9.7   32.9   32.6
                    1637       1.3     .3          15.1               7.2    5.0   14.0
                    1656       1.7    4.7           2.4               2.7    2.5    2.5
                    Average    2.2   (illegible)   12.5    6.3        5.3   10.5   12.8    9.5           7.9

Overall average                6.9    7.9          13.8    9.5        6.9   11.1   13.6   10.5          10.0

The expanded blind site program proved to be extremely useful for
evaluating the area estimation technology in Phase III and is
expected to be invaluable for future technology advancements. The
major sources of labeling error (abnormal signatures, boundaries,
and inadequate acquisitions) were identified through the use of
the ground data acquired and processed in Phase III. As a result,
classification procedures have already been modified to eliminate
segment estimates based on poor acquisition histories. The
abnormal signature and boundary problems are still under
investigation, and potential solutions are being studied at this
writing.


CONCLUSIONS 

The  3 years  of  area  estimation  results  in  the  U.S. 
yardstick  region  support  the  U.S.S.R.  and  Canadian 
experience.  In  Phase  II,  the  wheat  area  for  Canada 
was  grossly  underestimated.  Problems  were  ap- 
parently due  to  incorrect  ratios  of  wheat  to  small 
grains  and  omission  errors  in  small-grains  classifica- 
tion. The  omission  errors  are  thought  to  be  the  result 
of  boundary  pixels  and  abnormal  signatures.  The 
boundary  pixels  are  due  to  small  fields  (e.g.,  strip- 
fallow  cropping  practice)  and  nonhomogeneous 
fields.  The  abnormal  signatures  are  like  those  ex- 
perienced in  the  USNGP  spring  wheat  area  during 
LACIE. 

In the U.S.S.R., the fields are very large (average about 500
hectares) and the cropping practices appear to be more uniform
than in the United States. The large fields result in fewer
boundaries, and the uniform cropping practices result in fewer
abnormal signatures. This situation should result in better
proportion estimates at the segment level, according to the Phase
III blind site analyses, and hence in improved wheat area
estimates for the U.S.S.R.

For regions such as Canada with a variety of cropping practices
(e.g., irregular planting, grazing, and stripping) that result in
many boundaries and abnormal signatures and the prevalence of many
confusion crops, it is believed that improvements in the area
estimation technology are required. However, it is expected that
significant advances in the Landsat scanner and in Landsat data
processing will improve the area estimation capability for these
regions.
of the World

D. E. Phinney,a R. G. Stuff,b A. G. Houston,b E. M. Hsu,a and M. H. Trenchard a


INTRODUCTION 

The  LACIE  wheat  production  for  a specific  region 
is  calculated  as  the  product  of  the  total  area  of  wheat 
harvested  and  the  average  yield  per  unit  area  in  the 
region.  Although  Landsat  data  are  used  in  making 
area  estimates,  their  use  in  yield  estimation  either 
alone  or  in  combination  with  conventional 
meteorological  data  is  still  in  the  developmental 
stage.  The  current  LACIE  yield  model  makes  an  in- 
dependent estimate  based  on  weather  variables  ob- 
tained from  ground  reports.  These  weather  observa- 
tions are  provided  by  the  national  weather  service  in 
each  country  and  are  transmitted  internationally  by 
the  synoptic  network  of  the  World  Meteorological 
Organization  (WMO). 

The  LACIE  yield  models  for  the  United  States 
(ref.  1)  represent  the  “first  generation”  of  yield 
models  designed  for  large-area  application.  These 
models  and  their  development  are  described  in  detail 
in  the  paper  by  Strommen  et  al.  entitled  “Develop- 
ment of  LACIE  CCEA-I  Weather/Wheat  Yield 
Models.”  However,  a brief  review  will  be  given  of 
some  salient  points  which  materially  affect  the  per- 
formance of  these  models. 

The models were derived, using multiple linear regression, from
historical time series of selected weather variables and yield.
The resulting models are area specific with one model for each
region. The candidate weather variables were functions of monthly
mean air temperature and monthly total precipitation. The final
selection of parameters used in a given model was based partly on
statistical considerations and partly on agronomic interpretation
of the critical weather factors for the modeled area (see the
paper by Strommen et al.). The methodology used to develop the
models is shown schematically in figure 1.

Figures  2 and  3 show  the  modeled  areas  for  the 
United  States  and  the  U.S.S.R.  Spring  and  winter 
wheat  were  modeled  separately,  resulting  in  14 
models  for  12  areas  in  the  United  States  and  44 
models  for  33  areas  in  the  U.S.S.R. 

Data from 1932 to 1976 were used to construct a data base for U.S.
model development and evaluation. The regional yields were
aggregated from U.S. Department of Agriculture (USDA) Statistical
Reporting Service (SRS)1 crop reporting district (CRD) data. The
weather data used for model development consisted of averages of
temperature and precipitation for climatological divisions.
Weighted regional averages, based on 1973 acreage distributions
for the U.S. models, were calculated for each weather variable.
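An acreage-weighted regional average of a weather variable can be sketched as follows. The divisional values and acreages shown are hypothetical placeholders, not LACIE data, and the function name is illustrative.

```python
def weighted_regional_average(values, acreages):
    """Average of divisional values, weighted by wheat acreage in each division."""
    if len(values) != len(acreages):
        raise ValueError("values and acreages must have the same length")
    total_acreage = sum(acreages)
    return sum(v * a for v, a in zip(values, acreages)) / total_acreage


# Hypothetical monthly precipitation (inches) for three climatological
# divisions and their wheat acreage (thousands of acres).
precipitation = [2.0, 3.0, 4.0]
acreage = [100.0, 300.0, 600.0]
print(weighted_regional_average(precipitation, acreage))
```

Fixing the weights to a single year's acreage distribution keeps the weather series consistent over the historical record, at the cost of ignoring shifts in where wheat is grown.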

In foreign areas, the length of available historical records
varied greatly. Yields were modeled for political subdivisions
which correspond to the official reporting scheme for the area.
The handling of the meteorological data also varied from country
to country (see the plenary paper by Strommen et al. entitled "The
Impact of LACIE on a National Meteorological Capability").

A trend  component  of  the  year-to-year  variation 
in  yield  has  long  been  recognized  and  has  been  at- 
tributed to  technological  factors  such  as  improved 
varieties,  increased  fertilization,  and  changing 
cultural  practices.  The  LACIE  yield  models  use  a 
linear  trend  based  on  year  which  is  fit  piecewise  as 
shown  for  the  North  Dakota  spring  wheat  model  in 
figure  4. 
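One way to encode a piecewise linear trend as regression variables is sketched below. The breakpoint years, coefficients, and function names are hypothetical, because the text does not state where the North Dakota trend changes slope. Each trend term counts the years elapsed inside its own segment, so the fitted trend is continuous and can have a different slope in each segment.

```python
def trend_terms(year, segments):
    """Return one trend regressor per (start, end) segment.

    A term is 0 before its segment starts, rises by 1 per year inside the
    segment, and stays constant once the segment has ended.
    """
    return [min(max(year - start, 0), end - start) for start, end in segments]


def predict_yield(year, weather, intercept, trend_coefs, weather_coefs, segments):
    """Evaluate a linear model with piecewise trend and weather terms."""
    trend = sum(c * t for c, t in zip(trend_coefs, trend_terms(year, segments)))
    climate = sum(c * w for c, w in zip(weather_coefs, weather))
    return intercept + trend + climate


# Hypothetical breakpoints: one slope for 1932-55, another for 1955-76.
SEGMENTS = [(1932, 1955), (1955, 1976)]
print(trend_terms(1950, SEGMENTS))
print(trend_terms(1970, SEGMENTS))
print(predict_yield(1970, [1.0], 10.0, [0.1, 0.5], [2.0], SEGMENTS))
```

In a regression fit, the trend terms would simply be added as columns alongside the weather variables, and a zero slope in a segment would represent a period with no technological trend.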


1 The Statistical Reporting Service (SRS) has since become part of
the Economics, Statistics, and Cooperatives Service (ESCS).
FIGURE 1.—Center for Climatic and Environmental Assessment (CCEA) first-generation wheat yield model.


For  use  in  predicting  yields  through  the  crop 
season,  the  LACIE  yield  models  were  used  with 
coefficients  which  were  estimated  using  only  those 
weather  variables  available  up  to  the  time  of  the  esti- 
mate. 


TECHNICAL  APPROACH 


Test  of  90/90  Criterion 

The  goal  of  LACIE  was  to  predict  wheat  produc- 
tion at  harvest,  over  large  areas,  to  within  10  percent 
of  the  true  value  90  percent  of  the  time.  This  was 
referred  to  as  the  90/90  criterion. 

FIGURE 2.—Map showing boundaries for U.S. Great Plains winter and
spring wheat models as given by the CCEA. (a) Winter wheat model
boundaries. (b) Spring wheat model boundaries.

FIGURE 3.—Map showing U.S.S.R. crop regions covered by the spring
and winter wheat regression models (regions 31 and 32 are not
shown). W is winter wheat; S is spring wheat; and M is mixed
winter and spring wheat.


An evaluation of the yield models in the context of the 90/90
criterion can be carried out independently of acreage estimation
errors by using the reference standard acreage estimates together
with the yield model prediction. As shown below, the 90/90
criterion for a production estimate with both acreage and yield
errors is equivalent to a 90/93 criterion for a production
estimate with only yield errors. A 90/93 criterion specifies that
the production estimate, with no acreage errors, be within 7
percent of the true production with a probability of at least 90
percent.

FIGURE 4.—Graph of yields and modeled trend for North Dakota
spring wheat (1932-76).


The 90/90 criterion for production may be written as

$$\Pr\left(|\hat{P} - P| \le 0.1P\right) \ge 0.9$$

where $\hat{P}$ is the LACIE estimate of wheat production and $P$
is the true wheat production.

Assuming that the yield and acreage estimates are independent and
are unbiased estimates of the true yield and acreage and that the
production estimates are normally distributed, it has been shown
(ref. 2) that the probability statement can be written in terms of
the variance $\sigma_{\hat{P}}^2$ of the production estimate.

$$\Pr\left(\frac{|\hat{P} - P|}{\sigma_{\hat{P}}} \le \frac{0.1P}{\sigma_{\hat{P}}}\right) \ge 0.9$$

It can be shown that the variance of the production estimate may
be estimated from the variance $\sigma_{P^*}^2$ of a production
estimate $P^*$ made using the actual acreage and the yield
estimate.

$$\sigma_{P^*}^2 \approx \tfrac{1}{2}\,\sigma_{\hat{P}}^2$$

It then follows that

$$\Pr\left(|P^* - P| \le 0.0707P\right) \ge 0.9$$

That is, the 90/90 criterion for a production estimate with both
acreage and yield errors is equivalent to a 90/93 criterion for a
production estimate with only yield errors.
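The step from a 10-percent to a 7-percent tolerance can be spelled out as a short worked derivation. It is a sketch under the stated assumptions: unbiased, normally distributed estimates, and a yield-only estimate $P^*$ whose variance is half that of the full estimate $\hat{P}$. The value 1.645 is the two-sided 90-percent point of the standard normal distribution.

```latex
\begin{align*}
\Pr\left(|\hat{P} - P| \le 0.1P\right) \ge 0.9
  &\iff 1.645\,\sigma_{\hat{P}} \le 0.1P \\
\sigma_{P^*} = \frac{\sigma_{\hat{P}}}{\sqrt{2}}
  &\implies 1.645\,\sigma_{P^*} \le \frac{0.1P}{\sqrt{2}} \approx 0.0707P \\
  &\iff \Pr\left(|P^* - P| \le 0.0707P\right) \ge 0.9
\end{align*}
```

A tolerance of about 7 percent corresponds to being within 93 percent of the true production, which is where the "90/93" label comes from.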

A random variable $Z$ can be defined as follows, where $P^*$ is
the production estimate made using the actual acreage and the
yield estimate:

$$Z = |P^* - P| - 0.0707P$$

An indicator function $K(Z)$ is defined such that

$$K(Z) = \begin{cases} 1 & \text{if } Z \le 0 \\ 0 & \text{if } Z > 0 \end{cases}$$

The test statistic $\sum K(Z)$ can be evaluated from binomial
tables for a significance level $\alpha$ of 0.07 and a number of
samples equal to 10. For example, for a 10-year test, if
$\sum K(Z) \ge 8$, the hypothesis that the 90/90 criterion has
been supported is not rejected.
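The threshold of 8 successes in 10 years can be checked without binomial tables. The sketch below assumes that the yearly outcomes are independent and that each year meets the tolerance with probability 0.9 under the hypothesis. The function names and the sample figures are illustrative.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def count_within_tolerance(estimates, truths, tolerance=0.0707):
    """Number of years in which |estimate - truth| - tolerance * truth <= 0."""
    return sum(
        1
        for estimate, truth in zip(estimates, truths)
        if abs(estimate - truth) - tolerance * truth <= 0
    )


n_years, p_required = 10, 0.9
# Chance of seeing 7 or fewer successes when the criterion really holds.
alpha = binomial_cdf(7, n_years, p_required)
print(f"P(count <= 7) = {alpha:.4f}")

# Hypothetical estimates against a true value of 100 in each of three years.
print(count_within_tolerance([100.0, 110.0, 95.0], [100.0, 100.0, 100.0]))
```

The probability of 7 or fewer successes is about 0.07, so rejecting the hypothesis only when the count falls below 8 corresponds to the significance level quoted in the text.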

The same approach may be used to test a single model by modifying
the random variable.

$$Z = |P^* - P| - \frac{0.0707P}{\sqrt{f}}$$

where $f$ is the fraction of the total production contained in the
modeled region. Accepting the 90/90 criterion test for a single
model is equivalent to saying that the model performance is
acceptable providing that all other models which comprise the
total area are similar and that the model errors are not
correlated.

Ten-Year Tests

An evaluation of the yield models was made by obtaining 10 years
of yield predictions and corresponding prediction error estimates,
where the predictions were obtained using a "bootstrap" procedure.
In this procedure, predictions were made for a particular year,
then the resulting yield and weather for that year were used to
recalculate the model coefficients to predict for the next year.
The yield predictions were compared with the reference standard
over the 10 years at the level at which the model is developed.
The reference standard in the United States was the USDA SRS
estimates. In foreign countries, official country estimates were
used as the reference if available; otherwise, USDA Foreign
Agricultural Service (FAS) estimates were employed. These
comparisons were used to determine whether biases were indicated
and where improvements were needed.
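The sequential procedure called a "bootstrap" here is a refit-and-predict loop, not the resampling method that now usually carries that name. A minimal sketch follows. It assumes a single weather predictor and an ordinary least squares line in place of the multiple regression models, and the data are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bootstrap_predictions(weather, yields, n_test):
    """Predict each of the last n_test years from all earlier years only."""
    predictions = []
    first_test = len(yields) - n_test
    for t in range(first_test, len(yields)):
        # Coefficients are recalculated each time a year of data is added.
        intercept, slope = fit_line(weather[:t], yields[:t])
        predictions.append(intercept + slope * weather[t])
    return predictions


# Synthetic, exactly linear data: yield = 2 * weather + 1.
weather = [float(w) for w in range(1, 9)]
yields = [2.0 * w + 1.0 for w in weather]
print(bootstrap_predictions(weather, yields, 3))
```

Because each prediction uses only data from earlier years, the test mimics how the model would have performed had it been operated year by year.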

For those areas with relatively short historical records, the
predictive model was developed using data from all years except
the year to be estimated. Permutation of the test year resulted in
a set of quasi-independent estimates. This was called the
"jackknife" test.
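The leave-one-year-out scheme can be sketched in the same style. Again a single-predictor least squares line stands in for the actual yield models, and the data are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def jackknife_predictions(weather, yields):
    """Predict every year from a model fitted to all the other years."""
    predictions = []
    for held_out in range(len(yields)):
        train_x = weather[:held_out] + weather[held_out + 1:]
        train_y = yields[:held_out] + yields[held_out + 1:]
        intercept, slope = fit_line(train_x, train_y)
        predictions.append(intercept + slope * weather[held_out])
    return predictions


def root_mean_square_error(predictions, observed):
    """RMSE of the held-out predictions against the observed yields."""
    squared = [(p - o) ** 2 for p, o in zip(predictions, observed)]
    return (sum(squared) / len(squared)) ** 0.5


# Synthetic, exactly linear data: yield = 2 * weather + 1.
weather = [1.0, 2.0, 3.0, 4.0, 5.0]
yields = [3.0, 5.0, 7.0, 9.0, 11.0]
held_out_predictions = jackknife_predictions(weather, yields)
print(root_mean_square_error(held_out_predictions, yields))
```

Unlike the sequential test, each held-out prediction here draws on later years as well as earlier ones, which is why the resulting estimates are only quasi-independent.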

The  squared  prediction  errors  estimated  for  each 
of  the  10  predictions  are  compared  with  the  observed 
mean  squared  error  over  the  test  set  for  each  yield 
model  zone.  This  comparison  indicates  any  short- 
comings in  the  estimator  for  the  prediction  error. 
PHILOSOPHIC APPROACH

Models were developed and tested during LACIE for Argentina,
Australia, Brazil, Canada, India, the U.S.S.R., and the United
States. All models were subjected to historical tests. The models
for Canada, the U.S.S.R., and the United States were also used in
LACIE operations. The objective of the model evaluations was to
determine whether it was possible to provide yield estimates with
sufficient accuracy and reliability to improve predictive
abilities, particularly in foreign applications.

However, in foreign areas, it was difficult to isolate the error
sources such that an orderly development of yield technology could
be carried out. The specification of trend, the density of input
meteorological data, and the reliability of even official yield
statistics were all confounded in most foreign situations. Thus,
the U.S. Great Plains (USGP) was selected as a "yardstick" region.
By focusing the available LACIE yield evaluation resources on the
USGP, it was possible to understand more fully the strengths and
weaknesses of operational yield models.


PHASE I (1975 CROP YEAR) EVALUATION

Yield models were developed during Phase I for regions covering
the nine states of the USGP. These models were applied at the CRD
level as well as at the regional level. During evaluation of these
models, it was found that there were no significant differences
between regional predictions obtained directly from the regional
models and those obtained by applying the regional model at the
CRD level with individual CRD weather (ref. ?). Exploratory
studies indicated that models derived and applied at the
individual CRD had the potential for improved model performance
due to greater homogeneity of weather and yield data. However,
limitations in resources prevented taking advantage of the
potential inherent in modeling smaller areas.

The model estimates were aggregated to the USGP level and
evaluated using the 90/90 test criterion. The results shown in
table I indicate that the Phase I models did not support project
accuracy goals. The individual models were also evaluated for
their performance over the 10-year period. Results are shown in
table II. All models, except North Dakota and Kansas, support the
90/90 objective when projected to the USGP level.

Besides  V.ing  evaluated  against  the  90/90  cri- 
terion, mode.s  were  examined  to  determine  their 


Table I. Ten-Year Bootstrap Test for the U.S. Phase I
Yield Models Aggregated to the USGP by Year
With 90/90 Criterion Test

Year   SRS,      LACIE,    Error,(a)   Z            φ(Z)(b)
       bu/acre   bu/acre   bu/acre
1965   24.0      22.5        0.6       -29141205    1
1966   22.5      24.9       -2.4        29256745    0
1967   21.5      20.5        1.0       -20226126    1
1968   26.0      24.1        1.2       -26250526    1
1969   21.1      20.5       -2.2        11994656    0
1970   21.2      21.2        -.1       -59542901    1
1971   20.1      21.1        2.7        16716220    0
1972   29.2      29.1        -.1       -6427*092    1
1973   20.1      26.7       -5.9        146521492   0
1974   22.1      2*4        -4.6        13572*192   0

(a) Mean error = -1.0 bu/acre; RMSE = 2.77 bu/acre.
(b) Σφ(Z) = 5; reject 90/90.


Table II. Ten-Year Bootstrap Test (1965-74)
for the U.S. Phase I Yield Models
With 90/90 Criterion Test by Model Region

Model              Crop   Mean error,   RMSE,     Support
                          bu/acre       bu/acre   90/90
Montana            SW       0.4         2.40      Yes
North Dakota       SW      -2.3         4.55      No
Red River          SW      -2.6         4.69      Yes
South Dakota       SW       0           2.24      Yes
Montana            WW        .7         3.71      Yes
Badlands           WW       1.9         5.30      Yes
Nebraska           WW       2.2         4.42      Yes
Colorado           WW        .3         4.33      Yes
Kansas             WW      -2.1         7.19      No
Oklahoma           WW      -1.7         3.41      Yes
Panhandle          WW       -.4         3.29      Yes
Texas Low Plains   WW       1.4         3.0*      Yes
Total              SW      -2.0         3.51
Total              WW       -.5         3.51
Total              W       -1.0         2.77
ability to respond adequately to extreme weather conditions. Based on
subjective analysis of a series of descriptive statistics, 10 of the
12 models were judged to lack adequate sensitivity to variations in
weather variables (ref. 2).


PHASE II (1976 CROP YEAR) RESULTS


Model Modifications

As a result of the Phase I evaluation, the U.S. models were modified
before use in Phase II operations. Two major types of modifications
were implemented (ref. 3). The first change involved limiting the
allowable range of values for each meteorological variable. In the
event that the observed total precipitation exceeded the 90th
percentile of historically observed values, the total was reduced to
the 90th-percentile value. For temperatures, observations occurring
outside the historical 95th and 5th percentiles were assigned the
value at the appropriate percentile.
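The range limiting described above can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and the percentile interpolation rule (Python's "inclusive" quantile method) is an assumption, since the method actually used to compute the historical percentiles is not stated here.

```python
from statistics import quantiles


def percentile(values, pct):
    """Return the pct-th percentile (1..99) of values.

    Assumption: linear interpolation via the 'inclusive' method; the
    original percentile rule is not given in the text.
    """
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[pct - 1]


def limit_precipitation(observed, history):
    """Reduce a precipitation total to the historical 90th percentile."""
    cap = percentile(history, 90)
    return min(observed, cap)


def limit_temperature(observed, history):
    """Clamp a temperature to the historical 5th..95th percentile range."""
    low = percentile(history, 5)
    high = percentile(history, 95)
    return max(low, min(observed, high))
```

Note that precipitation is limited only from above, as in the text, while temperature is limited on both sides.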

The second modification of the yield models for Phase II was based on
an analysis of factors, including cropping practices and fertilizer
application, which are inherent in the trend term. The models for
Texas, Oklahoma, and the Texas-Oklahoma Panhandle assumed that the
slope of the trend line was zero after 1960. The remaining models
assumed that the trend did not continue to increase after 1972. The
original models had extended the trend through the prediction year.


Historical Testing

The improvement in the overall performance of the Phase II models can
be seen through the results of the evaluation tests. Table III
presents the results of an 11-year bootstrap test of yield models
aggregated to the USGP level. In contrast to Phase I, the Phase II
models supported the 90/90 goal. Both the mean error and the
root-mean-square error (RMSE) of the aggregated results were sharply
reduced. Table IV shows that all the individual models supported the
90/90 criterion. The improved performance can be seen by comparing
the results presented in table IV with those given in table II.
Kansas and North Dakota showed marked improvement.

An analysis of the relation between the estimated variance of the
prediction and the observed mean square error is shown in figure 5.
Overall, the variance estimates appeared reasonable. The models for
Kansas, Oklahoma, and Texas had greater than expected numbers of
cases which were outside the calculated 90-percent confidence
interval, a fact which suggests a probable underestimate of variance
for those models.
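One way to picture the check on the variance estimates is to count how often a prediction error falls outside its own 90-percent interval. The sketch below assumes normally distributed errors and a two-sided interval; the distribution actually used for the LACIE intervals is not stated here, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def fraction_outside_interval(errors, variances, confidence=0.90):
    """Fraction of cases whose error lies outside the confidence interval.

    errors    -- prediction errors, one per case
    variances -- estimated prediction variance for each case
    Assumption: normal errors, so the half-width is z * sqrt(variance).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    outside = sum(
        1 for e, v in zip(errors, variances) if abs(e) > z * sqrt(v)
    )
    return outside / len(errors)
```

If the variance estimates are sound, roughly 10 percent of cases should fall outside; a markedly larger fraction points to an underestimated variance.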


Table III. Eleven-Year Bootstrap Test for the U.S.
Phase II Yield Models Aggregated to the USGP
by Year With 90/90 Criterion Test

Year   SRS,      LACIE,    Error,(a)   Z            φ(Z)(b)
       bu/acre   bu/acre   bu/acre
1965   24.0      24.4       -0.2       -49025712    1
1966   22.5      24.2       -1.7        2007422     0
1967   21.7      22.2        -.5       -42796920    1
1968   26.0      24.4        1.7       -7025946     1
1969   21.4      29.2        -.9       -27911611    1
1970   21.2      26.9        1.2       -21651926    1
1971   20.1      21.1        2.6        15921400    0
1972   29.2      29.1         .2       -64047762    1
1973   20J       29.9        (         -56027279    1
1974   22.1      in         -2.4        12279019    0
1975   26.1      27.4        -.6       -64975427    1

(a) Mean error = -0.1 bu/acre; RMSE = 1.61 bu/acre.
(b) Σφ(Z) = 8; accept 90/90.


Table IV. Eleven-Year Bootstrap Test (1965-75)
for U.S. Phase II Yield Models
With 90/90 Criterion Test by Model

Model              Crop   Mean error,   RMSE,     Support
                          bu/acre       bu/acre   90/90
Montana            SW       0.7         2.16      Yes
North Dakota       SW      -2.5         2.42      Yes
Red River          SW      -2.0         2.96      Yes
South Dakota       SW       -.2         2.45      Yes
Montana            WW       1.0         2.37      Yes
Badlands           WW       1.6         5.00      Yes
Nebraska           WW       2.7         4.23      Yes
Colorado           WW       -.5         4.55      Yes
Kansas             WW       -.2         3.72      Yes
Oklahoma           WW       1.6         3.00      Yes
Panhandle          WW       1.1         3.23      Yes
Texas Low Plains   WW        .2         2.59      Yes
Total              SW      -1.6         2.70
Total              WW        .7         1.50
Total              W        -.1         1.61

FIGURE 5. Comparison of estimated and observed variances for yield
predictions. (Axis label: individual prediction errors squared.)


Significant bias, detected by a t-test on the mean error, was found
for the North Dakota, Nebraska, and Oklahoma models. This bias was
attributed to differences between the zone boundaries used for
testing (fig. 2) and those used for developing the models (ref. 3).
The areas not included in model development had relatively small
acreages of wheat but may have contributed to the observed bias. More
significant were those areas, indicated by hatching in figure 6,
which were used in the development of adjacent models. For example,
inclusion of the low yield in the Nebraska Panhandle in the
development of the Nebraska model would result in low estimates when
applied to the rest of Nebraska.


Operational use of the yield models for the USGP area during Phase II
gave extremely promising results. Table V shows a comparison of the
LACIE and SRS yield estimates for the end of season. The results are
given by state together with aggregated figures for the spring and
winter wheat regions and for the USGP. With the exception of South
Dakota, where the LACIE yield estimates did not completely capture
the full effects of a severe drought, the performance was remarkable.
Reexamining the test results for these models (table III) reveals
that there were only 2 years during the 1965-75 period in which the
Phase II accuracy was equaled or exceeded.

Table VI gives a month-by-month comparison of the LACIE and SRS yield
estimates for spring, winter, and total wheat for the USGP. As can be
seen, the SRS estimates rise steadily and converge with the
relatively constant LACIE estimates.

Historical studies of the LACIE yield models for Canada (ref. 4) and
for the U.S.S.R. (ref. 5) were conducted. Evaluation of the model
tests indicates that the 90/90 criterion was supported at the country
level. All Canadian models and all winter wheat models individually
supported 90/90. However, 7 of


FIGURE 6. Boundaries for weather data used in developing the
yield models tested in Phase II.


Table V. Phase II (1976 Crop Year) Results From
LACIE Operational Yield Models Compared With SRS
for Final Estimate

Area           Crop   SRS,      LACIE,    Error,    RD(a)
                      bu/acre   bu/acre   bu/acre
Montana        SW     29.4      27.1        2.3      -8.5
North Dakota   SW     24.7      27.0       -2.3       8.5
Minnesota      SW     32.4      30.3        2.1      -6.9
South Dakota   SW     10.9      17.2       -6.3      36.6
Montana        WW     32.0      29.9        2.1      -7.0
South Dakota   WW     18.0      31.6      -13.6      43.0
Nebraska       WW     32.0      32.7        -.7       2.1
Colorado       WW     21.5      19.6        1.9      -9.7
Kansas         WW     30.0      31.0       -1.0       3.2
Oklahoma       WW     24.0      22.6        1.4      -6.2
Texas          WW     22.0      18.7        3.3     -17.6
USGP           SW     25.3      26.2        -.9       3.4
USGP           WW     27.0      27.0        0         0
USGP           TW     26.4      26.7        -.3       1.1

(a) Relative difference = ((LACIE - SRS) / LACIE) × 100 percent.
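The error and relative-difference columns of table V can be recomputed from the SRS and LACIE yields. The sketch below takes the relative difference from the table footnote; the sign convention for the error column (SRS minus LACIE) is inferred from the tabulated values rather than stated in the text, and the function name is hypothetical.

```python
def error_and_relative_difference(srs, lacie):
    """Return (error, RD) for one row of a LACIE/SRS comparison table.

    error -- SRS minus LACIE, bu/acre (convention inferred from table V)
    RD    -- ((LACIE - SRS) / LACIE) x 100 percent, per the footnote
    Both values are rounded to one decimal place, as in the table.
    """
    error = srs - lacie
    rd = (lacie - srs) / lacie * 100.0
    return round(error, 1), round(rd, 1)


# Montana spring wheat and South Dakota spring wheat rows of table V.
print(error_and_relative_difference(29.4, 27.1))
print(error_and_relative_difference(10.9, 17.2))
```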


23  U.S.S.R.  spring  wheat  yield  models  were  judged 
inadequate. 


Operational  Testing 

Limited testing of the LACIE models for Canada and the U.S.S.R. was
carried out in an operational mode. Table VII compares the LACIE
yield estimates with those provided by the USDA FAS. The operational
test covered the Canadian prairies, representing 16 CRD's, and two
indicator regions in the U.S.S.R., covering 36 districts.

The lack of detailed regional figures in foreign areas makes
meaningful evaluation of these estimates difficult. This points to
the very real need to base the primary determination of the
capabilities of the LACIE yield models on their performance in the
USGP area, where a detailed error analysis can be carried out.


PHASE III (1977 CROP YEAR) RESULTS


Model  and  Methodology  Modifications 

For Phase III, two additional yield models were developed to expand
coverage to areas not previously modeled. Increasing wheat production
in parts of Minnesota traditionally planted to other


Table VI. Comparison of Phase II (1976 Crop Year)
LACIE and SRS Yield Estimates by Month for the USGP

[In bushels per acre]

Month        Spring wheat     Winter wheat     Total wheat
             SRS     LACIE    SRS     LACIE    SRS     LACIE
February                      19.8    27.6
March                         19.8    27.0
April                         22.7    25.9
May                           24.9    25.3
June                          24.8    26.5
July                          26.4    26.7
August       24.3    26.3     26.9    26.7     25.9    26.6
September    26.4    26.3     26.9    27.0     26.7    26.8
October      25.7    26.2     26.9    27.0     26.4    26.7
Final        25.3    26.2     27.0    27.0     26.4    26.7
crops resulted in a new model for that part of the state not
previously covered by the Red River model. In addition, a model for
south-central Texas and parts of the coastal areas was implemented.
In each of these models, a linear trend from the period 1955-75 was
used with no change in trend after 1975.

A modification  to  the  historical  testing  of  the 
models  was  implemented  to  make  the  historical  tests 
more  representative  of  the  results  that  would  have 
been  obtained  if  the  models  had  been  in  operation 
throughout  the  test  period.  On  the  assumption  that 
an  inflection  point  in  the  trend  term  would  not  be 
recognized  in  real  time,  the  trend  component  of  the 
model  was  continued  for  2 years  beyond  the  year 
that  hindsight  analysis  had  shown  to  be  a breakpoint. 
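The effect of holding the trend flat after a breakpoint, and of the "continued" trend used in the historical tests, can be sketched as below. This is an illustration under stated assumptions: a single linear trend segment, a hypothetical function name, and made-up base yield and slope values.

```python
def trend_yield(year, base_year, base_yield, slope, break_year,
                continue_years=0):
    """Trend yield with the slope set to zero after a breakpoint.

    break_year     -- last year in which hindsight says the trend rose
    continue_years -- extra years the rising trend is carried forward,
                      mimicking what would have been assumed in real
                      time (2 in the historical tests described above)
    """
    last_rising_year = break_year + continue_years
    effective_year = min(year, last_rising_year)
    return base_yield + slope * (effective_year - base_year)


# Hypothetical trend: 15.0 bu/acre in 1955, rising 0.5 bu/acre per year.
hindsight = trend_yield(1976, 1955, 15.0, 0.5, break_year=1972)
continued = trend_yield(1976, 1955, 15.0, 0.5, break_year=1972,
                        continue_years=2)
print(hindsight, continued)
```

Because the continued trend keeps rising for 2 more years, the historical test estimates for years just after a breakpoint are higher than the hindsight trend would give.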


Historical  Testing 

The results of a 10-year (1967-76) period of using this "continued"
trend procedure are shown in table

Table VII. Phase II (1976 Crop Year) Results From
LACIE Operational Yield Models for Foreign Indicator
Regions Compared With FAS Estimates

(a) U.S.S.R. winter wheat indicator region

Period         FAS,     LACIE,   Error,   RD(a)
               ql/ha    ql/ha    ql/ha
Early season   24.0     25.7     -1.7      6.6
Midseason      24.7     25.3      -.6      2.4
Harvest        27.6     24.6      3.0    -12.3

(b) U.S.S.R. spring wheat indicator region

Period         FAS,     LACIE,   Error,   RD(a)
               ql/ha    ql/ha    ql/ha
Early season   10.0     10.7     -0.7      6.5
Midseason      10.9     10.6       .3     -2.8
Harvest        11.3     10.5       .8     -7.6

(c) Canada spring wheat indicator region

Period         FAS,      LACIE,    Error,    RD(a)
               bu/acre   bu/acre   bu/acre
Early season   29.6      27.7       1.9      -6.8
Midseason      29.6      27.8       1.8      -6.5
Harvest        31.1      27.7       3.4     -13.3

(a) Relative difference = ((LACIE - FAS) / LACIE) × 100 percent.


VIII. The estimated yields for 1967, 1973, and 1974 are those
affected by the modification in testing procedures. Other small
changes from the results shown in table III stem from including the
newly modeled regions in the aggregation. The reported RMSE's
increased as a result of this procedure, reflecting a more realistic
picture of the models' true predictive abilities. The test results
also show that, based on the historical test, the Phase III yield
models supported the 90/90 criterion.

Table IX shows that all the individual models supported the 90/90
criterion. The spring wheat models as a group tended to overestimate
yield, with particular problems occurring in the North Dakota and Red
River models. Figure 7 gives the results of a contingency test for
the spring wheat models. The deviation of the actual yields from the
model trend was compared with the relative model error. An
overestimation of below-normal yields and an underestimation of
above-normal yields were found. The χ² value was equal to 33.79 with
16 degrees of freedom and is significant at the 1-percent level. The
modeled trends overall appear to be overestimates of the actual
trend, contributing to the tendency toward a positive bias for the
aggregated total spring wheat.
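For reference, the contingency test quoted above has the form of a Pearson chi-square test of independence. The sketch below is generic: the cell counts of figure 7 are not reproduced, and the critical value of 32.00 for 16 degrees of freedom at the 1-percent level is taken from standard chi-square tables rather than from this paper.

```python
def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a table.

    table -- list of rows of observed counts (rows x columns).
    Cells whose expected count is zero are skipped; a table with empty
    rows or columns should be trimmed first so the degrees of freedom
    are meaningful.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Critical value for 16 d.f. at the 1-percent level (standard tables).
CRITICAL_1PCT_16DF = 32.00
print(33.79 > CRITICAL_1PCT_16DF)
```

A 5-by-5 classification of model error against deviation from trend gives (5 - 1) × (5 - 1) = 16 degrees of freedom, which matches the value quoted in the text.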

The winter wheat models performed well as a group. The models for
the Badlands, Colorado, and Kansas showed the largest error rates.
The Kansas


Table VIII. Ten-Year Bootstrap Test
for U.S. Phase III Models With Continued Trend

[In bushels per acre]

Year   Total wheat          Spring wheat         Winter wheat
       SRS     Model        SRS     Model        SRS     Model
               error(a)             error(b)             error(c)
1967   21.6     0.9         22.9     0.3         21.0     1.1
1968   26.0    -1.4         26.1    -1.9         25.9    -1.2
1969   28.4     1.0         28.4     2.2         28.4      .5
1970   28.2    -1.6         23.5    -1.0         30.4    -1.9
1971   30.8    -2.9         30.6    -1.7         30.9    -3.7
1972   29.3     -.2         28.5     2.2         29.7    -1.5
1973   30.8     -.2         27.7      .2         32.4     -.3
1974   23.8     4.6         20.8     6.6         25.5     3.4
1975   26.8      .5         25.7      .8         27.4      .3
1976   26.4      .7         25.3     2.0         27.1     -.1

(a) Mean error, 0.1 bu/acre; RMSE, 1.90 bu/acre.
(b) Mean error, 1.0 bu/acre; RMSE, 2.56 bu/acre.
(c) Mean error, -0.4 bu/acre; RMSE, 1.84 bu/acre.
model may be of particular concern because in recent years Kansas has
accounted for nearly 40 percent of the winter wheat and 25 percent of
the total wheat production in the USGP.

For comparative purposes, the acreage and production of the newly
modeled regions were removed from the aggregation for 1976. The
resulting estimated yield for the USGP in 1976 was 26.95 bushels per
acre, which corresponds well with the 27.0 bushels per acre obtained
during Phase II operations. This is important in that there was
concern that the weather data used for model input during operations,
which was derived from analyses of the synoptic scale weather
network, was not comparable with the data used for model development
and testing, which came from a much higher density climatic
observation network. Figure 8 shows the results of a more detailed
comparison between yield estimates calculated from high- and
low-density meteorological data. On the basis of 2 years' data, it
appears that the operational handling of meteorological data
introduces no significant bias to the yield estimate.

Table IX. Ten-Year Bootstrap Test (1967-76)
for U.S. Phase III Yield Models Using Continued Trend
With 90/90 Criterion Test by Model Region

Model                   Crop   Mean error,   RMSE,     Support
                               bu/acre       bu/acre   90/90
Montana                 SW       0.6         2.18      Yes
North Dakota            SW      -1.2         2.94      Yes
Red River               SW      -1.4         3.95      Yes
Minnesota               SW       -.6         3.81      Yes
South Dakota            SW       -.8         3.00      Yes
Montana                 WW        .3         2.69      Yes
Badlands                WW        .1         4.61      Yes
Nebraska                WW       -.2         2.92      Yes
Colorado                WW        .8         3.42      Yes
Kansas                  WW        .3         3.39      Yes
Oklahoma                WW       -.1         2.21      Yes
Panhandle               WW        .5         2.69      Yes
Texas Low Plains        WW        .6         2.74      Yes
Texas Edwards Plateau   WW        .8         2.88      Yes
Texas South Central     WW       -.8         2.69      Yes
Total                   SW      -1.0         2.56
Total                   WW        .4         1.84
Total                   W        -.1         1.90

Early in Phase III, a series of tests was conducted in which the
ability of the models to predict yield before the at-harvest estimate
was evaluated. The results of these trials for Kansas are typical and
may be seen in table X. Clearly, as successive months of weather data
are added to the model, the skill of the prediction increases. The
variability associated with fitting the piecewise trend as each new
year is added to the model is evident. Since the models are
predicting weather-induced variations around the trend, the trend
specification is a significant source of potential error.


FIGURE 7. Contingency table of model error and deviation of actual
yield from trend for all spring wheat models. (Columns: percent SRS
deviation from trend, from yield below trend to yield above trend.
Rows: model error, from model underestimated to model overestimated.
Classes on both axes: < -20, -20 to -10, -10 to +10, +10 to +20,
> +20. χ² = 33.79; d.f. = 16.)


FIGURE 8. Comparison of yield estimates resulting from high- and
low-density input meteorological data for crop years 1976 and 1977
for each U.S. model. (Axes: yield estimate from low-density met data,
bu/acre, versus yield estimate from high-density met data, bu/acre.)
Operational Testing

Table XI gives the results obtained by LACIE during the Phase III
(1977) crop year. A comparison of the end-of-season yield estimates
produced by LACIE and SRS by state shows sizable errors. Significant
underestimates occurred in Montana and Minnesota for spring wheat. In
winter wheat, notable underestimates also occurred in Texas and
Oklahoma, with a corresponding overestimate in Kansas. The resulting
aggregations also underestimated the yield for spring, winter, and
total wheat. An examination of the historical tests shows that, in
contrast with Phase II, only in 2 years were the aggregated yield
estimates worse than during Phase III. Table XII shows the monthly
progression of SRS and LACIE yield estimates.

During Phase III operations, the only foreign application of the
LACIE yield models was in the U.S.S.R. Table XIII presents a
comparison of the LACIE and FAS yield estimates on a month-by-month
basis. LACIE yield estimates for total wheat yield were 11 percent
below those supplied by FAS for the at-harvest estimate.

A complete assessment of the radically different performance of the
LACIE yield models between Phase II and Phase III in the USGP is
difficult. Phase II represented a relatively normal crop year.
Although conditions were dry, the weather was fairly constant. Phase
III began dry, and the LACIE yield models place emphasis on
early-season moisture conditions. Plentiful but erratic precipitation
followed throughout the growing season. The models were unable to
reflect the adaptability of the crop, which took advantage of the
erratic but improving conditions.

As a result of the ongoing evaluation of the LACIE yield models, a
model-by-model revision was completed prior to operational use during
the LACIE Transition Year. A complete statistical reevaluation of the
weather and trend components of each model was completed, resulting
in the first complete revision in yield models for LACIE operations.
At this time, an evaluation of the revised models has not been
completed. However, a consensus seems to have emerged that the
results presented here represent the state of the art for a model of
this spatial and temporal resolution.


Table X. Results of Ten-Year Bootstrap Test
for Phase III Kansas Winter Wheat Yield Model
by Truncation

[In bushels per acre]

Year         SRS     CCEA truncation
                     Trend   Feb.    Mar.    May     June
1967         20.0    25.4    22.4    20.8    22.4    20.6
1968         26.0    24.5    23.3    22.3    24.0    24.4
1969         31.0    25.1    26.8    30.1    30.7    31.7
1970         33.0    26.9    26.9    29.1    29.3    30.0
1971         34.5    28.8    28.7    27.7    28.6    28.9
1972         33.5    30.7    29.9    28.6    29.6    29.6
1973         37.0    31.2    32.7    35.0    34.6    35.9
1974         27.5    32.1    33.4    33.6    32.2    32.8
1975         29.0    31.5    31.3    32.0    32.3    31.9
1976         30.0    31.2    29.0    29.2    30.2    30.3
Mean error           1.41    1.71    1.31    0.76    0.54
RMSE                 4.54    4.17    3.89    3.35    3.11
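The summary rows of table X can be reproduced from the yearly values, which also shows how the mean error and RMSE are defined in these tests. The sketch below uses the SRS, trend, and June-truncation columns; the error is taken as SRS minus model, a convention inferred from the tabulated means, and the 1968 June value is read as 24.4 where the scan shows "244".

```python
from math import sqrt

# Kansas winter wheat, 1967-76, bu/acre (table X).
SRS = [20.0, 26.0, 31.0, 33.0, 34.5, 33.5, 37.0, 27.5, 29.0, 30.0]
TREND = [25.4, 24.5, 25.1, 26.9, 28.8, 30.7, 31.2, 32.1, 31.5, 31.2]
JUNE = [20.6, 24.4, 31.7, 30.0, 28.9, 29.6, 35.9, 32.8, 31.9, 30.3]


def mean_error_and_rmse(observed, predicted):
    """Return (mean error, RMSE) with error = observed - predicted."""
    errors = [o - p for o, p in zip(observed, predicted)]
    mean_error = sum(errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return mean_error, rmse


for name, column in (("Trend", TREND), ("June", JUNE)):
    me, rmse = mean_error_and_rmse(SRS, column)
    print(f"{name}: mean error {me:.2f}, RMSE {rmse:.2f}")
```

The drop in RMSE from the trend-only column to the June truncation is the gain in skill that the text attributes to adding successive months of weather data.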

Table XI. Phase III (1977 Crop Year) Results From
LACIE Operational Yield Models Compared
With SRS for Final Estimates for the USGP Area

Area           Crop   SRS,      LACIE,    Error,    RD(a)
                      bu/acre   bu/acre   bu/acre
Montana        SW     22.0      18.0        4.0     -22.2
North Dakota   SW     24.9      23.1        1.8      -7.8
Minnesota      SW     39.9      32.0        7.9     -24.7
South Dakota   SW     23.5      20.8        2.7     -13.0
Montana        WW     29.0      26.5        2.5      -9.4
South Dakota   WW     25.0      27.1       -2.1       7.7
Nebraska       WW     35.0      32.0        3.0      -9.4
Colorado       WW     22.0      22.5        -.5       2.2
Kansas         WW     23.5      28.8       -5.3      18.4
Oklahoma       WW     27.0      20.0        7.0     -35.0
Texas          WW     25.0      20.3        4.7     -23.2
USGP           SW     27.1      23.4        3.7     -15.8
USGP           WW     27.7      25.6        2.1      -8.2
USGP           TW     27.5      24.9        2.6     -10.4

(a) Relative difference = ((LACIE - SRS) / LACIE) × 100 percent.
Table XII. Comparison of Phase III (1977 Crop Year) LACIE and SRS
Yield Estimates by Month for the USGP Area

[In bushels per acre]

Month       Spring wheat           Winter wheat           Total wheat
            SRS    LACIE   Error   SRS    LACIE   Error   SRS    LACIE   Error
February                           22.5   25.7
May                                28.2   25.5
June                               29.2   25.5
July        27.0   24.8            28.4   25.6            28.0   25.4
August      27.7   23.4            27.7   25.6            27.7   24.9
September   26.9   23.6            27.8   25.5            27.5   24.9
October     26.7   23.4            27.8   25.6            27.5   24.9
Final       27.1   23.4    3.7     27.7   25.6    2.1     27.5   24.9    2.6

Table XIII. Comparison of LACIE and FAS/U.S.S.R. Yield Estimates for Phase III (1977 Crop Year)


Month 

Winter  wheat 

Spring  wheat 

Total  wheat 

FAS/U.S.S.R. 

ql/ha 

. LACIE. 
ql/ha 

percent 

FASI 

USSR.. 

ql/ha 

LACIE. 

ql/ha 

RD, 

percent 

FAS/ 

U.SSR.. 

ql/ha 

LACIE. 

ql/ha 

RD. 

percent 

April 

24.3 

May 

24.1 

June 

25.6 

July 

25.9 

August 

27.0 

25.5 

-5.9 

11.0 

9.0 

-22.2 

15.2 

-5.3 

September 

28.8 

25.6 

-5.5 

mms 

-7.8 

16.1 

14.7 

-9.5 

October 

28.8 

25.6 

-5.5 

-10.2 

16.1 

14.5 

-11.0 

Final 

28.8 

25.6 

-5.5 

■i 

-10.2 

16.1 

14.5 

— 11.0 

CONCLUSIONS 


The  LACIE  yield  models  which  were  developed, 
implemented,  and  tested  during  the  three  phases  of 
this  experiment  represent  the  first  generation  of 
yield  models  designed  for  the  large-area  prediction  of 
wheat  yields.  The  models  are  capable  of  supporting 
the  stated  project  goal  of  being  within  10  percent  of 
the  actual  wheat  production  90  percent  of  the  time. 
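As a rough illustration of what "within 10 percent, 90 percent of the time" means, the sketch below counts the years in which the total wheat yield error from table VIII is within 10 percent of the SRS value. This is a deliberately simplified reading: the project goal is stated for production, not yield, and the formal 90/90 bootstrap test used in the tables is not reproduced here. The function name is hypothetical.

```python
# Total wheat, 1967-76, bu/acre, as read from table VIII.
SRS_TOTAL = [21.6, 26.0, 28.4, 28.2, 30.8, 29.3, 30.8, 23.8, 26.8, 26.4]
ERROR_TOTAL = [0.9, -1.4, 1.0, -1.6, -2.9, -0.2, -0.2, 4.6, 0.5, 0.7]


def fraction_within(reference, errors, tolerance=0.10):
    """Fraction of cases whose |error| is within tolerance * reference."""
    hits = sum(
        1 for r, e in zip(reference, errors) if abs(e) <= tolerance * r
    )
    return hits / len(errors)


print(fraction_within(SRS_TOTAL, ERROR_TOTAL))
```

Under this simplified count, only 1974 falls outside the 10-percent band.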

The limitations of these models are inherent in their nature. The
temporal resolution (1 month) limits their ability to handle the
erratic weather occurring in critical situations. The assumption that
the crop is always in the same growth stage during a given month
limits the model's ability to respond to early or late crop
development in a particular year. This was particularly apparent in
1974 when planting in the spring wheat region was up to 1 month late.
The relatively large spatial resolution of the individual models
limits the capture of localized but important episodic events.
However, the LACIE yield models have provided a valuable baseline
understanding of the problems associated with predicting yields for
large areas. In addition, these models provide a valuable benchmark
against which to compare more sophisticated models which are designed
to overcome current limitations and to provide a second generation of
predictive yield models (see the paper by Stuff et al. entitled
"Status of Yield Estimation Technology: A Review of Second-Generation
Model Development and Evaluation" and the paper by Cate et al.
entitled "The Law of the Minimum and an Application to Wheat Yield
Estimation").
Accuracy and Performance of LACIE Crop
Development Models

S. K. Woolley,* V. S. Whitehead,* R. G. Stuff,* and W. E. Crea*


INTRODUCTION 

The intent of this paper is to describe the accuracy of the crop
development model during each of the 3 years of LACIE. The estimated
and observed crop development data were compared in order to
establish a measure of confidence in the model and to identify
consistent discrepancies that would adversely affect LACIE
operations. Although the model provided reliable estimates for
various wheat-growing regions of the world, it was found that there
are still areas in need of further model improvement or development.


CROP  CALENDAR  MODEL 

Crop development models, referred to in this report as crop calendar
models, served two purposes in LACIE. First, they provided an
estimate of the stage of development to the yield model builder. This
is an important variable in the more recently developed yield models
such as the Feyerherm model (ref. 1) and the Cate-Liebig model (see
paper by Cate et al. entitled "The Law of the Minimum and an
Application to Wheat Yield Estimation"). These model forms were
developed to account for the change in response of wheat yield to
environment as the plant progresses toward maturity. By providing
growth stage estimates, the crop calendar model determined when the
coefficients or response functions in the yield model changed from
one set to another. Second, the crop calendar model output provided
an indicator to the analyst-interpreter of what the appearance of the
crop should be (i.e., at 2.3, the analyst-interpreter should begin to
detect vegetation; near 5.0, he should detect the vegetation turning
from green to gold; etc.).

(a) Lockheed Electronics Company, Houston, Texas.
(b) NASA Johnson Space Center, Houston, Texas.

The accuracies required to support these users have been difficult to
ascertain. Because the application of the crop development estimates
refers to several fields or farms (sample segment for analysis;
pseudozone for yield), it would not be practical to strive for better
accuracy in time than the equivalent spatial scatter of the crop
progression in the area of interest. The standard deviation of crop
progression within the sample segments has not been previously
determined and probably varies greatly with location and time of
year, but a period of 5 to 7 days appears to be a reasonable goal for
crop calendar accuracy.

A study was made to compare and evaluate the principal phenological
crop calendar models available for spring wheat. The models evaluated
were the growing-season degree-day, photothermal units, and
Robertson's triquadratic model (ref. 2). From these tests, the
Robertson model (ref. 3) was recommended and selected for use in
LACIE applications.

The Robertson model predicts the rate of progression of wheat through
its biological development. Daily maximum and minimum temperatures
and day length are the input variables. Day length is calculated
internally from the latitude and the date. The principal output of
the crop calendar model is a daily increment of development (DID)
through six physiological stages of growth.
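Day length can be computed from latitude and date with a standard astronomical approximation, sketched below. This is not necessarily the formula used inside the LACIE crop calendar model: it gives geometric sunrise-to-sunset duration, ignores twilight and atmospheric refraction, and uses a simple cosine approximation for solar declination. The function name is hypothetical.

```python
from datetime import date
from math import acos, cos, pi, radians, tan


def day_length_hours(latitude_deg, day):
    """Approximate day length in hours for a latitude and calendar date."""
    n = day.timetuple().tm_yday  # day of year, 1..366
    # Solar declination in radians (cosine approximation).
    declination = radians(-23.44) * cos(2.0 * pi * (n + 10) / 365.0)
    # Cosine of the sunset hour angle, clamped for polar day or night.
    x = -tan(radians(latitude_deg)) * tan(declination)
    x = max(-1.0, min(1.0, x))
    return 24.0 / pi * acos(x)


# Roughly the latitude of the Canadian prairies, near the summer solstice.
print(round(day_length_hours(50.0, date(1976, 6, 21)), 1))
```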

Robertson's model consists of the product of quadratic expressions
involving the three input variables (thus the term "triquadratic
model"). For the LACIE application, a quadratic equation was used to
calculate the DID within each of six physiological stages. The
increments are accumulated from stage to stage. Because wheat
responds differently to the environment during each physiological
stage of growth, five different equations are required.
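The accumulation of daily increments can be sketched as follows. The functional form shown, a day-length quadratic multiplied by the sum of quadratics in maximum and minimum temperature, is one common statement of Robertson's model and is given here as an assumption; the coefficient names and the values in the usage example are hypothetical placeholders, not the published coefficients. Clamping negative increments to zero is also an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class StageCoefficients:
    """Hypothetical coefficient set for one stage-to-stage interval."""
    a0: float  # base day length, hours
    a1: float
    a2: float
    b0: float  # base temperature
    b1: float
    b2: float
    d1: float
    d2: float


def daily_increment(c, day_length, t_max, t_min):
    """Daily increment of development (DID) for one day."""
    dl = day_length - c.a0
    photo = c.a1 * dl + c.a2 * dl * dl
    hi = t_max - c.b0
    lo = t_min - c.b0
    thermal = c.b1 * hi + c.b2 * hi * hi + c.d1 * lo + c.d2 * lo * lo
    # Assumption: development never runs backward.
    return max(0.0, photo * thermal)


def advance(stage, stage_coeffs, weather):
    """Accumulate DID over daily (day_length, t_max, t_min) records.

    The integer part of the running stage value selects which interval's
    equation applies; accumulation stops after the last interval.
    """
    for day_length, t_max, t_min in weather:
        index = int(stage)
        if index >= len(stage_coeffs):
            break
        stage += daily_increment(stage_coeffs[index], day_length, t_max, t_min)
    return stage


# Placeholder coefficients chosen only so that each day adds 0.5 stage.
demo = StageCoefficients(8.0, 0.25, 0.0, 0.0, 0.0625, 0.0, 0.0, 0.0)
print(advance(0.0, [demo, demo], [(10.0, 16.0, 5.0)] * 3))
```

With five coefficient sets in `stage_coeffs`, this structure corresponds to five equations covering the intervals between six stages.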


The  model  was  developed  using  only  data  from 
Canada  for  spring  wheat.  Terms  and  coefficients  are 
the  same  for  all  locations.  In  1976,  Feyerherm  (ref. 
1)  developed  a scalar  multiplier  that  was  applied  to 
the  initial  equations  between  emergence  and  heading 
and  that  reflected  the  effect  of  dormancy  on  winter 
wheat.  With  the  Feyerherm  multipliers,  the  model 
with  Robertson's  original  coefficients  produced 
fairly  reliable  estimates  of  the  heading  and  ripening 
times  for  winter  wheat.  A more  detailed  explanation 
of  the  model  appears  in  the  paper  by  Whitehead  and 
Phinney  entitled  “Growth  Stage  Estimation." 


APPLICATIONS  SUMMARY 


Phase  I Operations 

The  National  Oceanic  and  Atmospheric  Adminis- 
tration (NOAA)  had  the  responsibility  for  design, 
implementation,  and  operation  of  the  adjustable  crop 
calendar (ACC) model. Because of limited NOAA
resources, the National Aeronautics and Space Administration
(NASA) assisted in the design and implementation
of the model. In Phase I, the model
was  designed  to  be  run  at  the  crop-reporting-district 
(CRD)  level  on  the  U.S.  Great  Plains.  NASA  pro- 
vided normal  crop  calendars  for  all  CRD's  in  the  U.S. 
Great  Plains  states  (ref.  4).  The  model  then  brought 
together  the  data  base  and  current  meteorological 
data  to  generate  updates.  The  model  was  not  fully 
implemented  until  spring  1975.  For  winter  wheat  in 
each  CRD  in  the  U.S.  Great  Plains  states  where  a 
winter  wheat  sample  segment  existed,  the  U.S. 
Department of Agriculture (USDA) supplied the
Yield  Estimation  Subsystem  (YES)  with  the  actual 
date  at  which  50  percent  of  the  crop  had  begun  to 
joint.  The  model  was  started  in  spring  wheat  CRD's 
on  the  actual  date  when  50  percent  of  the  crop  had 
been  planted.  Daily  maximum  and  minimum  tem- 
peratures were  selected  for  a representative  first- 
order  weather  station  in  a particular  CRD.  If  none 
existed,  weighted  values  were  used  from  the  nearest 
neighboring  stations.  The  temperature  data  used 
were  in  punched  card  form.  The  model  was  updated 
every  2 weeks  in  the  batch  mode  on  the  IBM  370/168 
computer  facility  at  the  University  of  Missouri  by 
personnel  from  the  NOAA  Center  for  Climatic  and 
Environmental  Assessment  (CCEA)  at  Columbia, 
Missouri,  and  was  mailed  to  the  NASA  Johnson 
Space  Center  (JSC).  Seven  developmental  stages 


were  implemented  for  the  LACIE  project  for  spring 
and  winter  wheat.  These  stages  of  development  were 
identical  to  those  used  by  Robertson,  but  the  corre- 
sponding numbers  selected  by  LACIE  were  one 
greater  than  those  defined  by  Robertson.  The  stages 
of  development  and  their  corresponding  numbers  on 
the time scale were as follows:

1.0  Planted 

2.0  Emerged 

3.0  Jointed 

4.0  Headed 

5.0  Soft  dough 

6.0  Ripe 

7.0  Harvest 

The  whole  numbers  indicate  that  50  percent  of  the 
wheat  in  an  area  had  reached  that  particular  develop- 
ment stage.  Thus,  for  stage  4.0,  50  percent  of  the 
wheat  in  the  area  had  begun  to  head. 

Phase II Operations

The  NASA  assumed  the  responsibility  for  design 
and implementation in the new winter and spring
wheat  regions.  The  winter  wheat  regions  under  study 
in  Phase  II  were  expanded  to  include  the  U.S.S.R. 
and  the  People’s  Republic  of  China  (P.R.C.),  in  addi- 
tion to  the  U.S.  Great  Plains.  The  corresponding 
spring  wheat  areas  were  Canada,  the  U.S.S.R.,  the 
P.R.C.,  and  the  U.S.  Great  Plains.  NOAA  retained 
the  responsibility  for  the  operation  of  the  model,  but 
the  location  for  model  operations  was  transferred 
from  Columbia,  Missouri,  to  the  IBM  360/65  facility 
in  Washington,  D.C. 

The  model  for  winter  wheat  was  started  with  a 
normal  end-of-dormancy  restart  model  (ref.  5)  in  the 
U.S.  Great  Plains.  Corresponding  end-of-dormancy 
dates  and  development  stage  numbers  were  obtained 
from  climatic  analogs  and  were  transferred  to  crop 
calendar  stations  in  the  U.S.S.R.  and  the  P.R.C.  The 
model  for  spring  wheat  was  started  in  all  spring 
wheat  countries  using  the  planting  model  developed 
by  Feyerherm  (ref.  6).  Crop  calendar  adjustments 
were  provided  for  a weather  station  instead  of  for  an 
area,  such  as  the  CRD  usage  in  Phase  I.  These  esti- 
mates were  updated  biweekly  at  first-order  stations 
and  transmitted  to  JSC  via  a time-sharing  operation 
(TSO). 

Toward  the  end  of  Phase  II  operations,  the  loca- 
tion of  model  operation  was  transferred  from  Wash- 
ington, D.C.,  to  the  IBM  360/195  computer  facility  at 
Suitland,  Maryland.  The  data  continued  to  be 
transmitted  to  JSC  via  a TSO. 

During the 1976 growing season, Southern
Hemisphere calendars were initiated for 25 selected
stations  in  Argentina,  Australia,  and  Brazil.  The 
transfer  of  crop  calendar  estimates  was  delayed 
several  times  because  of  model  errors  and 
meteorological  data  problems  and  was  not  completed 
until  the  summer  of  1977. 


Phase  III  Operations 

Because  of  the  lack  of  success  of  a winter  wheat 
starter  model,  the  ACC  was  started  using  normal  fall 
planting  dates  in  the  U.S.  Great  Plains,  the  U.S.S.R., 
and  the  P.R.C.  The  model  estimates  were  generated 
on  the  IBM  360/195  computer  facility  at  Suitland, 
Maryland,  and  continued  to  be  transmitted  to  JSC 
via  a TSO.  The  estimates  were  updated  biweekly  and 
were  modified  by  use  of  the  Feyerherm  scalar 
multiplier  at  each  crop  calendar  station.  The 
multipliers were applied to the ACC equations between
emergence and heading to improve the
model's accuracy. In addition to the multipliers,
another control was introduced to the model to
prohibit crop calendar advancement beyond stage
2.85 before January 1 to prevent the model from predicting
jointing before spring green-up.
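
The two Phase III controls described above can be sketched as a single daily update. This is an illustration under stated assumptions, not the operational ACC code: the function and constant names are hypothetical, and the increment is taken as already computed by the stage equations.

```python
from datetime import date

STAGE_CAP_BEFORE_JAN1 = 2.85   # cap quoted in the text
EMERGENCE, HEADING = 2.0, 4.0  # LACIE stage numbers


def constrained_step(stage, increment, today, multiplier, planting_year):
    """Advance a winter wheat stage by one day with both controls applied."""
    # The scalar multiplier is applied only between emergence and heading.
    if EMERGENCE <= stage < HEADING:
        increment *= multiplier
    new_stage = stage + increment
    # Before January 1 following fall planting, do not advance past the cap.
    if today < date(planting_year + 1, 1, 1):
        new_stage = min(new_stage, max(stage, STAGE_CAP_BEFORE_JAN1))
    return new_stage


if __name__ == "__main__":
    # A fall day on which the raw increment would carry the crop past the cap.
    print(constrained_step(2.8, 0.1, date(1976, 12, 1), 1.0, 1976))
```

Applying the cap after the multiplier means the multiplier can slow development but can never push the crop past stage 2.85 before the new year.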

At  the  end  of  January  1977,  a significant  error  in 
the  LACIE  crop  calendar  algorithm  was  identified. 
The  problem  stemmed  from  an  ambiguity  in 
Robertson's  original  paper  on  the  crop  calendar, 
which  led  to  an  error  in  the  technique  used  in  the 
LACIE  model  to  eliminate  “negative”  growth.  When 
temperatures  were  very  low  or  day  lengths  short,  the 
LACIE  model  erroneously  allowed  the  development 
to  continue  during  the  emergence  to  jointing  stage. 
As  a result  of  the  model's  error,  operations  were  tem- 
porarily suspended.  After  the  ACC  program  was 
changed  to  incorporate  the  corrected  algorithm,  the 
winter  wheat  estimates  were  restarted  from  the  fall 
planting  dates  and  run  through  the  winter.  New  esti- 
mates for  the  United  States,  the  U.S.S.R.,  and  the 
P.R.C.  arrived  at  JSC  during  the  middle  of  March 
and  continued  to  be  acquired  on  terminal  via  a TSO 
on  the  scheduled  biweekly  basis  for  the  remainder  of 
the  growing  period.  The  P.R.C.  was  soon  dropped 
from  further  LACIE  Phase  III  analysis  by  a project 
directive  changing  the  scope  of  Phase  III  operations. 
Output  of  the  adjustments  continued  to  be  provided 
in  isoline  map  format. 

The  spring  wheat  starter  model  was  once  again 
used to begin operations in the spring wheat areas of


the  United  States,  the  U.S.S.R.,  and  Canada.  By  the 
middle  of  July,  it  was  obvious  that  the  model  esti- 
mates for  the  U.S.S.R.  east  of  the  Ural  Mountains 
were  running  possibly  3 weeks  ahead  of  the  develop- 
ment stages  as  determined  from  the  imagery  ac- 
quired from  Landsat.  Agricultural  practice  in  the 
U.S.S.R. is to delay planting in these “New Lands”
areas  until  the  end  of  May,  regardless  of  weather  con- 
ditions. Thus,  the  start  dates  derived  from  the  spring 
wheat  starter  model  proved  to  be  several  weeks  early. 
To  obtain  revised  estimates  for  this  spring  wheat 
region, the ACC model was operationally restarted
using  June  5 development  stage  estimates  for  15 
meteorological  stations.  The  restart  development 
stage  estimates  were  agreed  on  jointly  by  the  YES 
managers  and  the  Classification  and  Mensuration 
Subsystem  (CAMS)  team  of  U.S.S.R.  analysts.  These 
were  subjectively  obtained  from  planting  informa- 
tion contained  in  USDA  Foreign  Agricultural  Service 
(FAS)  reports  from  the  region  and  from  correspond- 
ing Landsat  imagery.  By  the  end  of  July,  the  revised 
crop  calendar  updates  were  processed  and  delivered 
to  the  users  at  JSC. 

A second  form  of  crop  calendar  output  was  made 
available  by  means  of  a gridded  format  at  the  seg- 
ment basis,  whereby  the  crop  calendars  produced  by 
NOAA were extended to the LACIE sample segments
(refs. 7 and 8).


PERFORMANCE  RESULTS 


Phase I

Approximately  every  18  days,  ground-truth  data 
were  recorded  on  forms  at  specified  intensive  test 
sites  (ITS's)  by  personnel  from  the  USDA 
Agricultural  Stabilization  and  Conservation  Service 
and  mailed  to  JSC  (fig.  1).  Only  data  from  six  winter 
wheat  and  six  spring  wheat  sites  were  available  for 
Phase  I analyses.  The  development  stage  of  wheat  at 
each  ITS  was  obtained  by  converting  the  growth 
stage  reported  at  the  fields  within  the  site  to  the 
LACIE  version  of  the  Robertson  biometeorological 
time  scale  (BMTS)  noted  in  table  I and  then  simply 
averaging  these  numbers.  Graphic  comparisons  of 
the  differences  between  the  ACC  estimate,  the 
historical calendar for that particular CRD, and the
ITS ground truth were then made. Such a comparison
is shown in figure 2 for the ITS in Finney County,
Kansas.
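
The conversion and averaging step can be sketched as follows. The code table is taken from table I as it can be read here; assigning ITS code 02 ("planted, no emergence") to BMTS 1.0 is an assumption, and the names `ITS_TO_BMTS` and `site_stage` are hypothetical.

```python
# ITS field code -> LACIE version of the Robertson BMTS (after table I).
ITS_TO_BMTS = {
    1: 1.0,   # planted
    2: 1.0,   # planted, no emergence (assumed to share stage 1.0)
    3: 2.0,   # emergence
    4: 3.0,   # tillering, prebooting, prebudding
    5: 3.5,   # booted or budded
    6: 4.0,   # beginning to head or flower
    7: 4.5,   # fully headed or flowered
    8: 5.0,   # beginning to ripen
    9: 6.0,   # ripe to mature
    10: 7.0,  # harvest
}


def site_stage(field_codes):
    """Average BMTS value over the fields reported for one ITS."""
    values = [ITS_TO_BMTS[code] for code in field_codes]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Four fields: two beginning to head, two fully headed.
    print(site_stage([6, 6, 7, 7]))
```

A site average computed this way can take fractional values between the tabulated stages, which is why it can be compared directly with the continuous ACC output.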

The first step in the analysis was to determine the
difference in days between the observed and the
computed data. Tables of delta days were constructed
using a plus sign to indicate that the model
date was earlier than the observed date (i.e., model
fast) and a minus sign to reflect a model date that


TABLE I. Robertson BMTS and Observed ITS Wheat
Phenological Stages

Stage        BMTS   ITS   Description
Planted      1.0    01    Planted
                    02    Planted, no emergence
Emergence    2.0    03    Emergence
Jointing     3.0    04    Tillering, prebooting, prebudding
             3.5    05    Booted or budded
Heading      4.0    06    Beginning to head or flower
             4.5    07    Fully headed or flowered
Soft dough   5.0    08    Beginning to ripen
Ripening     6.0    09    Ripe to mature
Harvest      7.0    10    Harvest

was later than the observed date (i.e., model slow). A
similar  comparison  was  made  of  the  historical  crop 
calendar  and  the  development  observed  at  the 
various  ITS's  in  an  effort  to  determine  whether  the 
model  estimates  provided  more  realistic  information 
than  the  normals. 

The bias at each stage was computed by the
following expression:

$$\mathrm{BIAS}_j = \frac{1}{n} \sum_{i=1}^{n} \left( \mathrm{ITS}_j - \mathrm{ACC}_j \right)$$

where $n$ = number of ITS's
$\mathrm{ITS}_j$ = date when an ITS reported that 50
percent of the crop was at stage $j$,
where $j$ = 3.0, 3.5, 4.0, 4.5, 5.0, and 6.0
$\mathrm{ACC}_j$ = date when the ACC model reached a
particular stage $j$

Similarly, by substituting the historically averaged
crop calendar for the ACC curve, one can use the expression
to determine another bias, which becomes

$$\mathrm{BIAS}_j = \frac{1}{n} \sum_{i=1}^{n} \left( \mathrm{ITS}_j - \mathrm{HIST}_j \right)$$

where $\mathrm{HIST}_j$ = date that the crop normally reached
a particular stage $j$ (from historical average).

FIGURE 2. Historical, observed, and predicted crop calendar
stages for winter wheat in Finney County, Kansas, 1974-75.

The standard deviation (SD) and the root mean
square error (RMSE) were calculated for each data
set by the following formulas:

$$\mathrm{SD}_j = \left[ \frac{1}{n-1} \sum_{i=1}^{n} \left( \mathrm{ITS}_j - \mathrm{ACC}_j - \mathrm{BIAS}_j \right)^2 \right]^{1/2}$$

$$\mathrm{RMSE}_j = \left[ \frac{1}{n} \sum_{i=1}^{n} \left( \mathrm{ITS}_j - \mathrm{ACC}_j \right)^2 \right]^{1/2}$$
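
As a worked check, the three statistics can be computed for one column of table II. The differences below are the ACC versus ITS values at heading (stage 4.0) for the six Phase I winter wheat ITS's as they can be read from that table; the n - 1 divisor in the SD is an assumption that reproduces the tabulated values, and the function name `summarize` is hypothetical.

```python
import math


def summarize(differences):
    """Return (bias, SD, RMSE) in days for a list of ITS-minus-model date differences."""
    n = len(differences)
    bias = sum(differences) / n
    # Sample standard deviation about the bias (n - 1 divisor, assumed).
    sd = math.sqrt(sum((d - bias) ** 2 for d in differences) / (n - 1))
    # Root mean square error about zero, i.e., about the observed dates.
    rmse = math.sqrt(sum(d * d for d in differences) / n)
    return bias, sd, rmse


if __name__ == "__main__":
    # Deaf Smith, Oldham, Randall, Morton, Finney, Rice (winter wheat, stage 4.0).
    heading = [3, 1, -11, -9, -13, 6]
    print([round(value, 1) for value in summarize(heading)])
```

With these inputs, the rounded results agree with the bias, SD, and RMSE listed for winter wheat at heading, which supports the reading of the sign convention as ITS date minus model date.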


The  computed  bias  could  have  been  due  to  several 
factors,  such  as  model  errors,  differences  in  the  stage 
definitions  from  one  locale  to  another,  and  observer 
errors.  The  values  of  the  bias,  SD,  and  RMSE  for 
both  model  estimates  versus  ITS  observed  data  and 
historical data versus ITS data are given in table II.
The  biases  and  SD's  for  winter  and  spring  wheat 
were  plotted  for  the  crop  development  stages  of 
heading  and  soft  dough  and  are  shown  in  figures  3 
through  6.  The  crop  calendar  results  at  heading  were 
chosen  for  closer  examination  because,  at  that  partic- 
ular development  stage,  very  little  ambiguity  about 
its  definition  existed  among  ground  observers.  The 
analyses at soft dough were also scrutinized more
closely because the general color-infrared characteristics
of wheat change from red to orange shortly
after  reaching  this  stage  and  are  readily  distinguish- 
able on  Landsat  imagery. 


TABLE II. Comparison of LACIE Phase I ACC and Historical CRD Calendars With Observed
Development Stages in the U.S. 1974-75 Winter and 1975 Spring Wheat ITS's

                                 ACC vs. ITS, days                Historical vs. ITS, days
ITS                         Jointing     Heading  Soft  Ripe    Jointing     Heading  Soft  Ripe
(county, state)                                  dough                               dough
                           3.0   3.5   4.0   4.5   5.0   6.0   3.0   3.5   4.0   4.5   5.0   6.0

Winter wheat
Deaf Smith, Tex.            --   (a)     3     3     7    -3     5    10    15    18    14    12
Oldham, Tex.                --   (a)     1     0     4    -3    -2     5    12    14    13     9
Randall, Tex.               --    -4   -11    -8     0     0     0     6     9    11    10     8
Morton, Kans.               --   -12    -9    -8    -2     3    19    10     1     5    11     3
Finney, Kans.               --   -23   -13   -10     0     4     1     2     3     5     8     6
Rice, Kans.                 --   (a)     6     5     7     0   -16   -11    -4     2     9     7
Bias, days                  --    --  -3.8  -3.0   2.7   0.2   1.2   3.7   6.0   9.2  10.8   7.5
SD, days                    --    --   8.1   6.5   3.9   2.9  11.3   7.8   7.2   6.2   2.3   3.0
RMSE, days                  --    --   8.3   6.6   4.4   2.7  10.4   8.0   8.9  10.8  11.0   8.0

Spring wheat
Burke, N. Dak.               2     1    -1     1     3     8    23    18    12     9     8    11
Williams, N. Dak.          -13   -10    -5    -2     0     6     7     8     7     8     4     9
Hill, Mont.                  5     2     1     2     2    -2    23    10     3     3     7     9
Liberty, Mont.               8     8     8     8     9    17    26    14    10    11    13    27
Toole, Mont.                -6    -8   -10     0     8    25    12    -1    -8     0    13    40
Polk, Minn.                 -1    -3    -5    -4     0     3    15     6     0     5    10    14
Bias, days                -0.8  -1.7  -2.0   0.8   3.7   9.5  17.7   9.2   4.0   6.0   9.2  18.3
SD, days                   7.7   6.7   6.2   4.1   3.9   9.9   7.5   6.6   7.4   4.1   3.6  12.6
RMSE, days                 7.1   6.4   6.0   3.9   5.1  12.4  18.9  11.0   7.8   7.1   9.7  21.6

(a) ACC data not available.

From the limited data set, it can be seen that the
model development estimates were closer to those
observed at the ITS's than were the historical (normal)
curves. This statement held true at each
development stage for both spring and winter wheat.
At soft dough, the model estimates were significantly
better (at the 1-percent level) than the historical
calendars for both winter and spring wheat.

The model for winter wheat averaged about 4 days
behind the ITS's at heading, was 3 days ahead at soft
dough, and then approached ground truth at ripe. A
similar trend was noted for spring wheat, except that
the model values were almost 10 days early at ripe
because of the large differences at Liberty and Toole,
Montana.

Much of the further testing and evaluation of the
ACC originally scheduled for fall 1975 and winter
1976 was delayed because resources were diverted to
the development of a winter wheat restart model and
a spring wheat starter model and the support of testing,
evaluation, and implementation of yield models.

Phase II

The  performance  of  the  ACC  model  was 
monitored  for  the  United  States,  the  U.S.S.R., 
Canada,  and  the  P.R.C.  This  tracking  effort  enabled 
YES  to  evaluate  the  reliability  of  the  ACC  estimates 
as  an  in-house  tool  for  quality  control  and  served  as 
an indicator to flag regions for which the estimates
varied widely from those observed using ground-truth
or Landsat imagery data.

The ground-truth ITS network was expanded in
Phase II to include 26 ITS's in the United States and
an additional 11 ITS's in Canada (figs. 7 and 8). The
same analysis procedure used in Phase I was continued
in Phase II. Comparisons were made
throughout the growing season between ITS ground
observations and both the crop calendar adjustments
and the historical calendars. A sample plot of both
the ACC estimate and the historical curve versus the
development stages as reported in the Ellis, Kansas,
ITS ground-truth report is given in figure 9. Tables III
through V summarize these comparisons and present
the model bias (in days), the SD's, and the
RMSE's at various stages of development at these 37
winter and spring wheat sites in the United States
and Canada. The sign convention remains the same
as that used in Phase I.

From these data, it can be observed that for winter
wheat, the model estimates were closer to the ITS
data at jointing (-3 days compared to the historical

FIGURE 5. Comparison of U.S. winter wheat ITS observations
with CRD historical calendars and LACIE adjustments at soft
dough.

FIGURE 9. Observed and predicted crop calendar stages for
winter wheat in Ellis County, Kansas, 1975-76.

TABLE III. Comparison of LACIE Phase II ACC and Historical CRD Calendars With Observed
Development Stages in the 1975-76 U.S. Winter Wheat ITS's

                                 ACC vs. ITS, days                Historical vs. ITS, days
ITS                         Jointing     Heading  Soft  Ripe    Jointing     Heading  Soft  Ripe
(county, state)                                  dough                               dough
                           3.0   3.5   4.0   4.5   5.0   6.0   3.0   3.5   4.0   4.5   5.0   6.0

Deaf Smith, Tex.            -4   -14   -18   -11    -4    -7   -10     0     7    15    17     8
Oldham, Tex.                14     0    -3     3    -1    -5    10    18    21    31    22    11
Randall, Tex.               -4   -10   -16   -10    -2    -7    -9     0     7    14    17     8
Finney, Kans.               -3    -8   -16    -7    -6    -9    -9     7     1     9     9     0
Rice, Kans.                (a)   (a)   -17   -12    -5    -3   (a)   (a)    -8    -1     5     2
Ellis, Kans.               (a)   -15   -17   -12    -7    -3    -2    -2     0     3     6     5
Saline, Kans.               -6   -16   -18   -14    -6    -3   -24   -15    -9    -2     4     2
Morton, Kans.                0    -1    -4     0     0     1     0     9    11    15    17    15
Boone, Ind.                  1    -9   -11    -6    -8    -8   -17   -14    -9    -5   -12   -10
Madison, Ind.                0    -9   -11    -6     4     4   -17   -11    -7    -3     2     2
Shelby, Ind.                24     4    -9   -14   -13    -7     2    -3    -7   -12   -17   -10
Bannock, Idaho               8     3    -3    -8     0    -1    11    12     7    -4    -8   -20
Franklin, Idaho            (a)     6    -4   -11     6     4   (a)    15     8    -5    -3   -11
Oneida, Idaho              (a)     9     0    -5    -3    12   (a)    18    12     3    -8    -7
Whitman (1), Wash.           0    25    14    12    27    30    -3    36    26    18    28    20
Whitman (2), Wash.          -2     0    -8     2    16    11    -2    11     5    -2    13     2
Whitman (3), Wash.           0     8     6     7    27    29    -5    20    17     9    28    21
Hill, Mont. (b)            -18     0    -1    -6     5     7   -24    -7   -16   -16    -5     0
Liberty, Mont. (b)         -22     0     1     7     6    -4   -25    -6    -6     1     1     0
Toole, Mont. (b)           -24    -9   -14   -14    -7   -12   -23    -6   -15   -15    -4     1
Hand (1), S. Dak. (b)      -12     0    -6    -8    -3    -6   -20     2    -7    -6    -3   -13
Hand (2), S. Dak. (b)       -8   -10    -4     0    -4    -6   -17    -9    -6    -3    -6   -13
Bias, days                -3.1  -2.2  -7.2  -5.1   1.0   0.8  -9.7   3.6   1.5   2.0   4.7   0.6
SD, days                  11.7   9.8   8.5   7.6  10.4  11.2  11.4  13.1  11.5  11.5  12.8  10.8
RMSE, days                11.8   9.8  11.0   9.0  10.3  11.0  14.7  13.3  11.4  11.4  11.3  10.6

(a) ITS data not available.
(b) Acquired as spring wheat ITS but contained numerous winter wheat fields.


mean of -10 days) and at soft dough (1 day versus 5
days), whereas the historical crop calendars were significantly
more accurate (at the 5-percent level) at
heading (1 day versus -7 days). The winter wheat
model estimates were approximately 2 to 2.5 weeks
slow at heading in the Central and Southern U.S.
Great Plains states but were generally less than 1
week off at ripe. At the other 11 ITS's outside Texas,
Kansas, and Indiana, the magnitude of the difference
between the model and ITS dates at heading was usually
less than 1 week. Generally, except for Whitman County,
Washington, the model estimates were behind those
observed at the ITS's. The Whitman County ITS's


received  heavy  rains,  which  apparently  tended  to 
slow  the  development  rate  of  the  wheat  plant. 

Tables IV and V show that the spring wheat
model generally ran fast, except for Hand County,
South Dakota; Dawson Creek, British Columbia; and
Olds, Alberta. The U.S. and Canadian spring wheat
ITS's were divided into separate groups for analysis.
Figures 4, 6, 10, and 11 present the model's accuracy
results at heading and at soft dough.

The  spring  wheat  starter  model  was  the  mecha- 
nism by  which  the  crop  calendar  model  was  begun  in 
the  spring  wheat  regions.  Naturally,  the  accuracy  of 
the  planting  dates  to  a great  extent  determined  the 

TABLE IV. Comparison of LACIE Phase II ACC and Historical CRD Calendars With Observed
Development Stages in the 1976 U.S. Spring Wheat ITS's


IIS 

(coinin',  aaiei 

4<V 

iv.  /rv  lAn* 

ttoiomal  i'.v.  ITS.  lAnv 

Jointing 

Heiiitmg 

Soli 

dough 

Ripe 

Jointing 

Heading 

Soli 
< lough 

Ripe 

tt) 

3,5 

4,0 

4.5 

50 

no 

30 

.35 

4.0 

45 

.SO 

no 

lltuul  tD.S  Dak 

II 

-II 

-15 

-13 

-7 

-Id 

-7 

-17 

-22 

- Ift 

-10 

-15 

Hand  (2),S,  Dak 

-u 

-II 

-15 

-17 

-20 

-24 

-d 

-17 

-21 

-19 

-18 

-20 

Burke.  N Dak 

d 

IR 

15 

14 

34 

V» 

• 4 

8 

12 

10 

d 

12 

7 

Divide.  N.  Dak. 

10 

R 

7 

10 

22 

24 

8 

7 

ft 

7 

13 

d 

Williams.  N.  Dak 

d 

10 

.t 

,t 

l.t 

R 

d 

ft 

0 

-1 

4 

-5 

Hill,  Mom 

-l.t 

- 14 

-5 

0 

II 

12 

-21 

-27 

-23 

-1ft 

-4 

1 

Liberty,  Mont, 

0 

7 

1 

- j 

1 

14 

-10 

-4 

-15 

-15 

-12 

4 

Toole,  Mont 

17 

l.t 

R 

13 

17 

25 

10 

4 

-3 

2 

7 

14 

IVIk,  Minn. 

(a) 

(a) 

ta> 

-13 

3 

2 

ta> 

(a) 

(a) 

-II 

5 

5 

Bias,  days 

10 

7 5 

-0.1 

-0.4 

7 1 

7.1 

-15 

-4  5 

-8.5 

-6.4 

-0  5 

0 

SD.  days 

17  0 

17  5 

108 

113 

14.3 

17  d 

ll  7 

142 

13  3 

108 

11  1 

II  5 

KMSt  .days 

II  : 

17  0 

101 

10  7 

15  3 

184 

III 

140 

15.1 

12.0 

105 

105 

TABLE V. Comparison of LACIE Phase II ACC and Historical CRD Calendars With Observed
Development Stages in the 1976 Canadian Spring Wheat ITS's


It'S  ■!('('  iv  //X,  */t m Historical  \ % //X  days 


l (<>««.  pnonnel 

Join  ling 

HcaJn ig 

Son 

.hnigh 

Ripe 

Jointing 

Heading 

Soli 

tfangh 

Ripe 

70 

4.0 

45 

.SO 

6.0 

30 

3 5 

4.0 

45 

50 

no 

lort  Saskatchewan. 

14 

N 

-3 

-7 

_ A 

3 

7 

4 

0 

-8 

-8 

-d 

Alberta 

Olds.  Alberta 

8 

— > 

- g 

14 

- 12 

8 

-10 

— 7 

-ll 

1 ethbrivlge.  Alberta 

17 

8 

<» 

0 

12 

15 

10 

II 

n 

12 

Id 

18 

Dawson  Creek. 

-*) 

12 

17 

- 14 

-12 

-l 

5 

3 

8 

5 

British  Columbia 

Sionv  Mountain, 

II 

10 

s 

1 

d 

ft 

13 

10 

3 

1 

d 

- 1 

Manitoba 

Starhuck.  Manitoba 

...  > 

g 

ft 

> 

8 

4 

0 

d 

3 

1 

8 

-i 

Allona,  Manitoba 

-1 

i 

0 

i 

„ 8 

. ^ 

7 

4 

1 

0 

- 4 

Deltsle.  Saskatchewan 

g 

7 

jt 

1 

tai 

12 

14 

14 

13 

12 

(a) 

Swift  Current. 

6 

3 

• 8 

8 

(5 

-d 

-3 

-5 

8 

d 

-7 

Saskatchewan 

Torquay , 

3 

{5 

4 

i 

8 

-II 

7 

d 

ft 

3 

0 

d 

Saskatchewan 

Mellon, 

II 

10 

5 

i 

d 

ft 

3 

- 3 

-3 

3 

3 

-1 

Saskatchewan 

Bias,  days 

5ft 

7ft 

•05 

l 5 

18 

0 3 

40 

6 2 

2 8 

7 3 

4ft 

12 

SD.  days 

*» 

ft  ft 

68 

80 

1 7 

d 3 

ft  8 

5 5 

58 

7 1 

8 3 

8.7 

KMSL,  days 

8 2 

ft  8 

ft  .5 

7.7 

7ft 

88 

7 ft 

8 1 

6 2 

7.2 

d.2 

84 

FIGURE 10. Comparison of Canadian spring wheat ITS observations
with CRD historical calendars and LACIE adjustments
at heading.


reliability of the model's performance. For winter
wheat,  dormancy  had  a tendency  to  neutralize  the 
effect  of  early  or  late  plantings  on  the  crop  calendar 
model  estimates  because  fields  in  the  same  area  tend 
to  emerge  from  dormancy  at  the  same  time. 
However,  the  application  of  incorrect  spring  wheat 
planting  dates  to  the  ACC  model  will  lead  to  errors 
that  tend  to  propagate  throughout  the  remainder  of 
the  growing  season.  In  the  United  States,  the  starter 
model  predicted  seeding  in  North  Dakota  about  10  to 
15 days earlier than was reported at the three ITS's,
whereas  the  starter  model  predicted  planting  in 
South  Dakota  as  much  as  15  days  behind  the  ground- 
observed  planting  dates.  The  erroneous  model-gener- 
ated planting  dates  for  these  two  states  that  were 
used  to  begin  the  ACC  model  thus  introduced  errors 
into  the  model  estimates  that  did  indeed  remain 
through  the  ripe  stage. 

The  severe  drought  at  the  two  Hand  County, 
South  Dakota,  ITS’s  forced  the  wheat  to  mature  more 
rapidly  and  probably  explains  the  model’s  lack  of 
response  to  the  real  situation.  Figure  12  illustrates 
some of the inadequacies associated with the LACIE
use  of  the  model;  namely,  that  the  model  contains  no 
moisture  variable  and  the  temperatures  input  to  the 
model  do  not  represent  true  canopy  temperatures. 

In  the  U.S.  spring  wheat  region,  the  model  per- 
formed better  overall  than  the  historical  crop  calen- 
dars at  the  heading  stage,  whereas  the  reverse  was 
true  at  the  soft  dough  and  ripe  stages. 


In Canada, the model performed well: biases
varied from -1.5 to 7.6 days over the development
stages being examined, and RMSE's ranged from 6.5 to
8.8 days. However, the historical crop calendars also
proved to be fairly reliable, with biases ranging from
1.2 to 6.2 days and RMSE's from 6.2 to 9.2 days. For
Phase II, no significant difference was noted between
the model-versus-ITS results and the
historical-versus-ITS results.

In  the  winter  of  1976-77,  a joint  study  was  con- 
ducted by  YES  and  CAMS  personnel  to  assess  the 
feasibility  of  using  CAMS  crop  calendar  evaluations 
from  Landsat  imagery  to  update  and  correct  the 
YES-generated ACC's. Phase II imagery containing
emergence  and  soft  dough  information  for  the  state 
of  Kansas  was  used  for  the  test  area.  The  results  of 
the  study  indicated  that  in  Kansas  for  that  particular 
growing  season,  use  of  CAMS  start  information  did 
not  significantly  improve  the  YES-generated  calen- 
dars (ref.  9). 

No  ground-truth  reports  were  available  in  the 
U.S.S.R.  and  the  P.R.C.,  but  the  ACC  estimates  at 
sample  segment  locations  were  compared  to  the  cor- 
responding growth  stage  numbers  as  reported  by  the 
analysts on the CAMS evaluation forms. Some
minor  discrepancies  were  noted  in  the  far  south- 
eastern winter  wheat  areas  of  the  U.S.S.R.,  but  the 
ACC  model  performed  satisfactorily  for  the  major 
wheat areas of the U.S.S.R. In the P.R.C., where
wheat fields are often very small, it was extremely
difficult to estimate development stage numbers
from Landsat imagery. As a result, no major effort
was focused on accuracy of the model in the P.R.C.


Phase III

The tracking effort of comparing both the ACC
model estimates and the historical values with those
development stages reported at the ITS's was continued
for Phase III. The ground truth consisted of
data from 22 ITS's throughout the United States and
the same 11 Canadian ITS's used in Phase II. The
results of these comparisons for U.S. winter and
spring wheat and Canadian spring wheat are given in
tables VI to VIII and in figures 3 to 6, 10, and 11.


FIGURE 12. Crop calendar error for Phase II spring wheat
intensive test sites.


TABLE VI. Comparison of LACIE Phase III ACC and Historical CRD Calendars With Observed
Development Stages in the 1976-77 U.S. Winter Wheat ITS's

TABLE VII. Comparison of LACIE Phase III ACC and Historical CRD Calendars With Observed
Development Stages in the 1977 U.S. Spring Wheat ITS's

                                 ACC vs. ITS, days                Historical vs. ITS, days
ITS                         Jointing     Heading  Soft  Ripe    Jointing     Heading  Soft  Ripe
(county, state)                                  dough                               dough
                           3.0   3.5   4.0   4.5   5.0   6.0   3.0   3.5   4.0   4.5   5.0   6.0

Hand (1), S. Dak.          -10    -5    -2    -8     1     5   -11   -17   -20   -15    -8    -3
Hand (2), S. Dak.          -10    -8    -2    -3    -3    -3   -10   -19   -19   -14   -11   -10
Burke, N. Dak.             (a)   (a)   (a)   (a)    22    21   (a)   (a)   (a)   (a)     8     2
Williams, N. Dak.            0     5     2     4    12    10    -6    -4   -10    -9    -3    -8
Hill, Mont.                 10    12     6     6    15    14   -10   -14   -22   -20    -9    -5
Liberty, Mont.              19    22    19    11    27    34     3     0    -7    -8    11    20
Toole, Mont.                 2     0    -1     6    12    17    -6   -11   -17    -8     4    15
Polk, Minn.                 -7    -5    -2     6     8     5   -19   -21   -20    -5     0    -3
Bias, days                 0.6   3.0   2.9   3.1  11.8  12.9  -8.4 -12.3 -16.4 -11.3  -1.0   1.0
SD, days                  10.9  10.8   7.7   6.4  10.0  11.4   6.7   7.8   5.7   5.2   8.2  10.9
RMSE, days                10.1  10.5   7.7   6.7  15.0  16.7  10.4  14.3  17.3  12.3   7.7  10.2

(a) ITS data not available.


Table VIII. Comparison of LACIE Phase III ACC and Historical CRD Calendars With
Observed Development Stages in the 1977 Canadian Spring Wheat ITS's

Development stages: 3.0, jointing; 4.0, heading; 5.0, soft dough; 6.0, ripe.

ITS (town, province)                    ACC vs. ITS, days                  Historical vs. ITS, days
                                  3.0   3.5   4.0   4.5   5.0   6.0      3.0   3.5   4.0   4.5   5.0   6.0

Fort Saskatchewan, Alberta         -1     0    -7   -11     4   -19       -9    -6    -6    -6     4    -4
Olds, Alberta                      10     7     4     3    14   (a)        0     2     2     1    23   (a)
Lethbridge, Alberta                12    13    10     9     7     0       -2     0    -3     0     2   -10
Dawson Creek, British Columbia     -3     2    -3    -5    -6     4        0     4     5     7     8     5
Stony Mt., Manitoba                 6     3     1     2     3     4       -3    -7    -9    -5    -2    -5
Starbuck, Manitoba                  4     0    -3    -3     0     5       -4    -8   -12   -10    -6    -8
Altona, Manitoba                    3    -1    -8    -9    -6    -6        0    -8   -14   -12   -11   -17
Delisle, Saskatchewan              11     5     0    10     8    10        7     5     2    14    13    14
Swift Current, Saskatchewan         9     5    -4     7     4     0        2    -4   -10     4     3     0
Torquay, Saskatchewan               7     3    -2    -2     1     6        0    -4    -8    -6    -3    -7
Melfort, Saskatchewan               9     9     7     6    12    -7        0     0     0     0     6    -3
Bias, days                        6.1   4.2  -0.5   0.6   3.7  -0.3     -0.8  -2.4  -4.8  -1.2   3.4  -3.5
SD, days                          4.9   4.2   5.6   7.2   6.4   8.4      4.0   4.8   6.4   7.7   9.3   8.5
RMSE, days                        7.7   5.8   5.4   6.9   7.2   8.0      3.9   5.1   7.8   7.4   9.5   8.8

(a) ITS data not available.
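
The summary rows of table VIII can be reproduced from the site rows. The following is a minimal Python sketch, assuming (the paper does not state this) that the bias is the mean of the tabulated site differences, the SD is the sample standard deviation with an n - 1 denominator, and the RMSE is the root mean square of the differences. The function name `summarize` is illustrative only.

```python
import math
import statistics


def summarize(differences):
    """Return (bias, sd, rmse) in days for a column of site differences."""
    bias = statistics.mean(differences)
    # Sample standard deviation (n - 1 denominator); this choice is an assumption.
    sd = statistics.stdev(differences)
    rmse = math.sqrt(sum(d * d for d in differences) / len(differences))
    return bias, sd, rmse


# ACC vs. ITS differences at stage 3.0 for the 11 Canadian sites in table VIII.
acc_stage_3_0 = [-1, 10, 12, -3, 6, 4, 3, 11, 9, 7, 9]
bias, sd, rmse = summarize(acc_stage_3_0)
print(round(bias, 1), round(sd, 1), round(rmse, 1))  # 6.1 4.9 7.7
```

Under these assumptions the three values match the tabulated bias, SD, and RMSE for that column, which suggests the relation RMSE^2 = bias^2 + ((n - 1)/n) SD^2 links the three summary rows.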
From these results, it appears that increased accuracy was
obtained in generating winter wheat estimates using the scalar
multipliers. At heading, the average model estimates were less
than 2 days behind the values observed at the winter wheat ITS's,
and the SD was about 6 days. At 14 of the 17 sites, the ACC values
differed from the ground-observed estimates by less than 1 week.
At this same development stage, the corresponding historical
calendars averaged about 3 days slower than the ITS's, with an SD
of 9 days. Overall for winter wheat, the magnitude of the model
versus ITS biases and SD's differed from the historical versus
ITS by 2 days.

In the U.S. spring wheat region, the average ACC estimates were
ahead of the ground-truth estimates for the entire development of
the plant. The difference between the two estimates was smallest
at jointing (0.6 day) and generally increased as the crop
progressed toward maturity, reaching almost 13 days at ripe. For
the historical versus ITS comparison, the average historical
values were approximately 8 days later than the ITS at jointing,
fell further behind at heading (16 days), and then approached the
ground-observed values at soft dough and ripe. At heading, the
historical estimates were significantly different from 0 at the
1-percent level, whereas at soft dough, the model's ACC's were
significantly different from 0 at the 5-percent level.

During Phase III, the effects of the extended drought in the
northern intermountain and western regions were still being felt
in Montana and to a lesser degree in North Dakota. Planting was
delayed at numerous fields in these two states, especially at the
Liberty, Montana, ITS. The spring wheat starter model did not
account for these deferred plantings and thus generated early
planting dates for these states. An abundance of rain fell in July
after the wheat had headed, and the moisture tended to slow the
crop's actual development. The model did not respond to this
slower development rate; thus, in Montana and North Dakota, it
advanced still further ahead of the ITS values as the crop
proceeded toward the ripe stage. At the same time, the combined
historical calendars, which had averaged some 16 days behind
ground truth at the heading stage, were within a day of the
observed value at the ripe stage.

In Canadian spring wheat areas, the average model estimates were
ahead of ground observations by 4 to 6 days at jointing and by 4
days at soft dough. There was little difference, on the average,
between the observed and the predicted values at heading and
ripe. SD's varied from 4 days at midjointing to 8 days at ripe.
The historical calendars also proved to be close to the
ground-observed values, with average differences at the various
stages ranging from 5 days behind (at heading) to 3 days ahead
(at soft dough). Corresponding SD's varied from 4 days at
jointing to 9 days at soft dough. At the 5-percent level, the
normal calendar at the development stage of heading was
significantly different from 0.

Variations in the wheat development observed within the Phase III
ITS's were computed according to the following equation:

\[
\text{Average SD} = \frac{\sum_i n_i s_i}{\sum_i n_i}
\]

where n_i = number of fields within the ith ITS
      s_i = standard deviation of fields within the ith ITS

These SD's were computed at several development stages for the
winter wheat ITS's, U.S. spring wheat ITS's, and their Canadian
counterparts. These results are summarized in table IX. The
deviations were smaller for the winter wheat sites than for the
spring wheat sites, thus reflecting the neutralizing effect of
dormancy on variations due to early and late plantings. For
spring wheat, the variations of development within an ITS because
of early and late planted fields generally continue to increase
through the crop season. The approximate number of days
associated with each stage's deviation is enclosed in
parentheses. Thus, at heading, for example, the average SD within
winter wheat ITS's was about 6 days, whereas for spring wheat
ITS's, it was approximately 9 days.


Test  of  Applicability 
to  Foreign  Areas 

Because the density of meteorological input data within a region
affects both the reliability and the variability of the model's
results, NOAA personnel performed studies to determine the
percentage of meteorological stations in foreign areas for which
6-hourly observations were received regularly. It was found that
observations from many of the stations


Table IX. 1976-77 Phase III Average Standard Deviations of Wheat
Development Observed Within ITS's

ITS                          Mean development stage (a)
                          Jointing    Heading   Soft dough     Ripe
                             3.0        4.0        5.0         6.0

Winter wheat                  -       0.24 (6)   0.19 (4)    0.29 (4)
U.S. spring wheat          0.13 (3)   0.35 (9)   0.22 (5)    0.39 (6)
Canadian spring wheat      0.28 (6)   0.34 (9)   0.36 (7)    0.58 (9)

(a) Standard deviation in units of a stage; approximate number of
days associated with stage deviation given in parentheses.


previously selected for use in ACC operations were not received
at the National Meteorological Center (NMC), whereas others were
acquired only sporadically. NOAA's recommendation to use every
available station in the vicinity of the wheat areas that
reported regularly was followed for model operations.

As  noted  early  in  this  paper,  Robertson's  crop 
calendar  is  a spring  wheat  model  developed  from 
Canadian  data.  In  LACIE  operations,  the  model  was 
extended  to  estimate  the  development  of  winter 
wheat  in  various  parts  of  the  world.  Feyerherm 
multipliers  were  computed  for  all  winter  wheat  crop 
calendar  stations  in  the  United  States,  the  U.S.S.R., 
the  P.R.C.,  India,  Argentina,  Australia,  and  Brazil 
(ref.  10). 

Studies were conducted to assess the applicability of the model
and the scalar multipliers for the winter wheat grown in the
southern latitudes, where the dwarf/semidwarf varieties are
grown. Most of these varieties are not sensitive to day length,
require no vernalization period, and may require warmer
temperatures. The majority of the wheat grown in India is
actually of a high-yield dwarf variety, and, except in far
northern Kashmir, there is essentially no dormancy period. From
the tests on India data, it was observed that the crop calendar
did not advance because of the day-length factor in the
jointing-to-heading equation. In India, day lengths during the
winter wheat season typically dip below 11 hours, which is the
threshold value for crop advancement in this stage of the model.
It was found that use of the multipliers in India produced a
distorted crop calendar and that the calendar was not compressed
enough to reflect an estimate of actual growing conditions. The
model's estimates without the multipliers were better but still
were totally inadequate. The results of the study clearly
indicated that the ACC model was not valid for use in India
(refs. 11 and 12).
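
The 11-hour day-length threshold mentioned above can be illustrated with a rough calculation. The sketch below is not the ACC model's own computation: it assumes a standard geometric sunrise-to-sunset formula with Cooper's approximation for solar declination, and the latitudes (28.6 N for the north Indian plains, 52 N for the Canadian prairies) are illustrative choices that do not come from the paper. The model's definition of day length may differ, for example by including twilight.

```python
import math

THRESHOLD_HOURS = 11.0  # threshold quoted in the text for the jointing-to-heading stage


def day_length_hours(latitude_deg, day_of_year):
    """Approximate sunrise-to-sunset day length in hours (no refraction or twilight)."""
    # Cooper's approximation for solar declination, in degrees.
    declination = 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))
    x = -math.tan(math.radians(latitude_deg)) * math.tan(math.radians(declination))
    x = max(-1.0, min(1.0, x))  # clamp to handle polar day or polar night
    return 24.0 / math.pi * math.acos(x)


# Assumed latitudes and dates, chosen only for illustration.
india_midwinter = day_length_hours(28.6, 355)   # about December 21
prairie_midsummer = day_length_hours(52.0, 172)  # about June 21

print(f"28.6 N, midwinter: {india_midwinter:.1f} h, "
      f"below threshold: {india_midwinter < THRESHOLD_HOURS}")
print(f"52.0 N, midsummer: {prairie_midsummer:.1f} h, "
      f"below threshold: {prairie_midsummer < THRESHOLD_HOURS}")
```

Under these assumptions, a midwinter day at the assumed Indian latitude is a little over 10 hours, below the threshold, while a midsummer day at the assumed Canadian latitude is well above it. This is consistent with a model calibrated on Canadian spring wheat stalling when applied to a crop grown through the Indian winter.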

A similar situation existed in the three Southern Hemisphere
countries under study, where the wheat grown has no dormancy
requirements for reproductive maturity. In essence, a spring-type
wheat is grown during the winter months, with 150- to 200-day
growing seasons. The ACC model was run for the 1977 crop year for
Argentina, Australia, and Brazil without multipliers. No
ground-truth data were available, but feedback from the CAMS
analysts on their development stage estimates as determined from
the Landsat imagery indicated that the ACC estimates for
Australia were not reliable.

Only spotty and limited ground-truth data existed for the
U.S.S.R. Feedback from the CAMS analysts was the primary means of
determining the reliability of the model's estimates. Scalar
multipliers were used in determining Phase III winter wheat
predictions for the U.S.S.R. For the most part, no major
discrepancies were observed between the model's results and the
analysts' estimates. In spring wheat areas, as long as the
planting or starter dates were realistic, the ACC performance was
good. Problems did exist in the U.S.S.R.'s New Lands area east of
the Ural Mountains during Phase III, but the discrepancy resulted
from erroneous start dates generated by the spring wheat starter
model.


CONCLUSIONS  AND  RECOMMENDATIONS 

First, it should be restated that the ground-truth data set
available for evaluating the accuracy of the ACC was very
limited. At its peak, the network included reports from only 18
U.S. winter wheat sites, 9 U.S. spring wheat sites, and 11
Canadian spring wheat sites. Even with this sparse data set,
certain trends were noticed during evaluation of the model's
accuracy and certain conclusions were made.

For winter wheat, for each of the three growing seasons under
study, the average model heading date varied between 2 and 7 days
behind the date observed. A comparison of these results with
those for soft dough shows that the model ran fast between the
stages of heading and soft dough, where the average model date
occurred 1 to [illegible] days before the ground-observed date.

It  was  also  found  that,  for  winter  wheat,  the 
model’s  performance  was  best  for  Phase  III 
(1976-77),  the  first  time  that  the  Feyerherm  scalar 
multipliers  were  incorporated  into  the  operational 
model. 

It is also apparent that the model performed well for the
Canadian spring wheat regions. This was expected since the
model's coefficients were derived using phenological data for the
Marquis variety of spring wheat grown in Canada.

The limited assessment of the model estimates at heading, soft
dough, and ripe in the ITS's showed that, overall, the model's
estimates provided more accurate information than that available
from the historical normals and generally met the accuracy goal
of being within 5 to 7 days of the ground observations.

The validity of planting dates and the lack of a moisture term in
the model were observed to be important factors in the model's
accuracy. Droughts in South Dakota in 1976 and in North Dakota
and Montana early in the 1977 growing season were not reflected
in the model's estimates. Also, some erroneous planting dates
were generated by the spring wheat starter model during the
growing seasons, and these errors tended to propagate throughout
the seasons. From these results, it is concluded that an
improvement in crop calendar capability is needed in the area of
model startup dates. Some type of moisture term also needs to be
incorporated into the model to account for periods of moisture
excess or stress. A recent attempt has been made to incorporate a
moisture variable in the Robertson model form. The only addition
to the data was daily precipitation statistics. The results to
date are encouraging. Details of both these efforts are given in
the paper by Seeley et al. entitled "Prediction of Wheat
Phenological Development: A State-of-the-Art Review."

Models that will adequately predict development stages for the
dwarf and semidwarf varieties grown in the southern latitudes
also need to be developed. As the technology progresses from a
single-crop program, such as LACIE, to a multicrop effort, the
need for accurate crop calendars to aid in the identification and
separation of crops will be greatly increased. However, for some
of the crops under consideration, reliable development stage data
do not exist over enough stages and geographical areas to allow
development of a Robertson-type model. Therefore, crop calendar
models using remotely sensed data from Landsat and meteorological
satellites should be considered as prime data input and further
developed and refined. For many of the crops, yield models will
require as inputs the biological stage of development of the
plant. This will be particularly true for the early warning
techniques.

Early indications are that the remotely acquired data provide a
new source of information on crop development independent of the
meteorological inputs used now. This should be pursued not as a
separate approach but as a combined approach (spectral with
meteorological) to crop development estimation.


INTRODUCTION

The LACIE has been a highly technical effort to estimate wheat
production in major producing regions of the world by using
satellite-derived data in combination with meteorological and
historical data. The Accuracy Assessment Team, the Research,
Test, and Evaluation Group, and the Information Evaluation Group
within LACIE evaluated LACIE from a technical standpoint, often
recommending new or modified technologies to improve overall
performance. The specific questions addressed were "How good are
the estimates?" and "What technology needs to be improved?"
However, a crucial question has not been adequately addressed:
"Are there sufficient benefits to justify implementation of a
satellite-based crop information system?" In order to make this
judgment, an economic evaluation is needed.

During the course of LACIE, an interagency Economic Evaluation
Planning Team was established to develop and monitor an economic
evaluation of LACIE. Drawing upon the team's experience and other
benefit estimation studies, this paper first discusses a concept
for valuing crop information, considering the more usual
approaches, a recommended integrated approach, and problems of
implementation. This is followed by a review of what has been
done in the economic evaluation of LACIE-type information. The
various studies of benefits are reviewed, and the costs of the
existing and proposed systems are considered. Finally, a method
and approach proposed for further studies is reported.


CONCEPT  FOR  VALUING  INFORMATION 

The prime benefit derived from improved information is the
ability to make improved decisions (those which affect buying,
selling, investing, or setting government economic programs) with
increased accuracy, in a more timely manner, or with more
certainty.

Some decisions affected by crop information lie within the
commodity markets and others lie outside. Market-related
decisions are made by the domestic or international grain trade,
U.S. producers and consumers, and foreign producers and
consumers. Nonmarket decisions are reflected in government
policy, the administration of government programs, and agreements
with other nations. Some decisionmakers use crop production
information directly, such as a federal official assessing the
need to restrict wheat production during the next crop year. Some
decisions and impacts are indirect; for example, consumers
observe an increase in the price of bread when the price of wheat
rises. Decisionmakers who use crop information also react to
information concerning many other factors, including the state of
the economy, money supply, trade policies, and pending
legislation. All these factors are sources of uncertainty.
Improved decisionmaking may require that one or all types of
information be improved.

Crop production data have a number of properties that could be
affected by a system incorporating satellite data collection and
analysis. Accuracy is important, but it is conditioned by (1)
when data are available in the crop season, (2) geographic
location and detail, (3) comprehensiveness and continuity, and
(4) the reliability of the estimates, including perceived
objectivity.


Methods  of  Valuing  Information 

The methods of valuing information seem to fall into two general
categories. The "global modeling approach" takes the form of
simulations or econometric models that estimate benefits derived
from information by relevant sectors of the economy. Hayami and
Peterson (refs. 1 and 2) used the measurement of consumer and
producer surplus to assess the value of crop information. This
has been elaborated by others, most recently by ECON, Inc. (refs.
3 to 7). Another global modeling approach uses decision analysis
or decision theory to estimate the impact of information on the
decision process and to assign a value to that impact. Decision
analysis was developed by theorists such as Marschak (refs. 8 to
11), Howard (refs. 12 and 13), and Agnew (refs. 14 and 15).
Although decision analysis usually reflects the value of
information to an individual firm rather than to society, this
technique has been used to estimate the aggregate value of
information (refs. 15 and 16).

The second major method of valuing information is quite
pragmatic, studying specific user groups. The Panel on
Methodology for Statistical Priorities proposed this approach to
estimate benefits attributable to data packages and programs
(ref. 17). Savage (ref. 18) has summarized and criticized the
Panel's proposal. Eisgruber¹ has warned that surveys of users,
which frequently are a part of user studies, are notorious for
their nearsightedness. Social scientists currently supporting the
user approach include Hoos (refs. 19 and 20), Duncan (ref. 21),
and Sharp (ref. 22). The pragmatic user-oriented approach may
quantify estimates of benefits to specific user groups through
the methods of the global modelers, but it recognizes the
impossibility or impracticability of quantifying benefits to
other user groups, such as researchers. For these, expert opinion
is used to make qualitative assessments.


Integrated  Methodology 

Miller (refs. 23 and 24) has recommended an integrated
methodology with one framework for market users of crop
information and a second for nonmarket users (fig. 1). He
emphasizes empirical estimates of information and decision
models, especially for market users of crop information. For
market users, the methodology comprises four models or
components.


¹Personal correspondence from Ludwig Eisgruber, Oregon State
University, to Forrest G. Hall, NASA Johnson Space Center, Jan.
20, 1971.
1. An information model relates the subset of information used in
the decision process to the total set of information available
and accommodates the fact that decisionmakers do not fully
believe in or act upon specific forecast information. Other
factors, such as alternative sources of information with their
perceived costs and reliability, are also considered.

2. A decision model describes the decisionmaking process,
considering both the information model and a system of economic
rewards.

3. A traditional economic model, based on the assumption of
profit maximization, represents behavioral characteristics
involved in a specific kind of decision process.

4. An information valuation model estimates benefits in either of
two ways. The first estimation procedure utilizes net social
benefits from consumer and producer surplus. The alternative
technique estimates the size of changes in user incomes, a
methodology frequently used as a basis for benefit-cost ratios.

For the second group, nonmarket users, Miller proposes a
basically qualitative assessment. The first subgroup of nonmarket
users is concerned with the policymaking functions of the Federal
Government and may represent the most important of all uses
(refs. 17 and 25 to 28). Examples of frequently addressed policy
issues are commodity trade negotiations and grain embargoes. Even
if the presidential and executive decisionmaking process is
unavailable, research into the uses of information for such
decisions would provide insight for evaluation.

A second subgroup of nonmarket users is concerned with the
administration of legislated federal programs. Estimation of the
value of information for such activity is more feasible than for
policymaking because budgets have been assigned and
administrative latitude is relatively known.

The final major nonmarket user of information is the research
community. Since the value of information as an input to research
is highly correlated with the ultimate value of the research, the
value of neither is readily predictable. The value of either
information or research is ascribable only insofar as it affects
buying, selling, investing, or human life.

As the benefits of global information generated from
LACIE-developed technology are evaluated, elements of technology
assessment emerge. The value of LACIE may be heavily influenced
by the learning experience and the development of an advanced
technology in the future. Thus, the value of information should
be assessed for differing levels of technology at future time
periods, with the present value computed using an appropriate
discounting procedure (fig. 1).
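
The discounting step can be sketched as follows. This is a minimal Python illustration of a standard present-value calculation, not a procedure taken from the paper; the benefit stream and the 10-percent discount rate are hypothetical values chosen only to show the arithmetic.

```python
def present_value(benefits_by_year, discount_rate):
    """Discount benefits received at the end of years 1, 2, ... back to the present."""
    return sum(
        benefit / (1.0 + discount_rate) ** year
        for year, benefit in enumerate(benefits_by_year, start=1)
    )


# Hypothetical stream: 100 units of benefit in each of three future years.
hypothetical_benefits = [100.0, 100.0, 100.0]
pv = present_value(hypothetical_benefits, discount_rate=0.10)
print(round(pv, 2))  # 248.69
```

Benefits expected from more advanced technology in later years would enter as larger entries later in the list, and the discount rate then determines how much weight those later benefits carry.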


Problems of Implementation

Since there is no generally accepted methodology for estimating
the value of information, appropriate methodology must be
developed and tested. The relationships between information,
decisions, and market structure are crucial to the development of
this methodology (ref. 29). Future decisions concerning
implementation of a satellite-based crop information system
require better assessments of the need for information, which a
satellite-based system can provide. In addition, expected
investment and operational costs must be estimated. Economic
analysis of the value of information can assist in making these
decisions. However, the usefulness of the analyses is influenced
by practical considerations such as budget restrictions and the
"client's" acceptance of a new information source.

Thus, a major problem is measuring the quality of information and
relating that quality to user requirements. There are several
attributes of high-quality information. Authors have listed such
factors as objectivity, accuracy, reliability, adequacy,
continuity, comprehensiveness, geographic detail, timeliness,
availability, relevance, and believability (refs. 17 and 30).
Some of these attributes are difficult or impossible to quantify,
yet improvement in one characteristic could result in an increase
in value. Because of the difficulty in quantification, most
studies have limited their evaluations to measuring those
characteristics of information quality that have the highest
potential for generating economic value.

Other problems encountered in estimating the value of information
include the following.

1. Extending the results of an analysis based on current system
performance to a system using untested technology.

2. Anticipating political and economic conditions that change
requirements for information and its value.

3. Assessing the effect of production changes on prices and
market receipts, which in turn may make timely and accurate crop
forecasts more valuable.

4. Extending results of analyses of information value that assume
competitive conditions to situations dominated by government and
large commercial organizations.


BENEFITS  AND  APPLICATIONS  OF 
IMPROVED  CROP  CONDITION 
INFORMATION 

The U.S. wheat crop is so large and the associated transactions
are so great that modest improvements in the marketing system
could have large aggregate benefits. For example, an increase of
one cent per bushel, resulting from a price increase or an
efficiency improvement, had an aggregate value to U.S. farmers of
$21 million in 1975 and nearly $18 million in 1974. Exports in
1975 were 1.2 billion bushels; thus, one cent per bushel could
have amounted to a difference in returns to the United States
from the rest of the world of $12 million. In 1975, the cost of
moving wheat from the farms to the docks for export was $873
million; small per-unit efficiencies in storing and shipping
could have been large in total. Are there ways for
satellite-based improvements in foreign wheat production
information to affect decisions concerning planting, harvesting,
buying, selling, or investing that would exceed the cost of the
improvements? Although no final analyses have been made, there
are several studies which add to our understanding of the
question.
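
The per-bushel arithmetic above is simple enough to check directly. The following Python sketch uses only the figures quoted in the text; the function names are illustrative, and the production quantities it prints are implied by the quoted dollar values rather than stated in the paper.

```python
def aggregate_value_dollars(bushels, cents_per_bushel=1):
    """Total dollar value of a per-bushel change applied to a quantity of wheat."""
    return bushels * cents_per_bushel / 100


def implied_bushels(dollars, cents_per_bushel=1):
    """Quantity of wheat implied by a total value at a given per-bushel change."""
    return dollars * 100 / cents_per_bushel


exports_1975 = 1_200_000_000  # bushels, as quoted in the text
print(aggregate_value_dollars(exports_1975))  # 12000000.0, i.e. $12 million

# Quantities implied by the quoted farm-level values of one cent per bushel.
print(implied_bushels(21_000_000))  # 2100000000.0 bushels for 1975
print(implied_bushels(18_000_000))  # 1800000000.0 bushels for 1974 ("nearly")
```

On the same basis, the $873 million cost of moving wheat to the docks in 1975 spread over 1.2 billion exported bushels works out to roughly 73 cents per bushel, which gives a sense of scale for "small per-unit efficiencies."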
ECON Studies

Since 1972, ECON, Inc., of Princeton, New Jersey, has developed
benefit-cost or economic evaluations of Earth resources satellite
systems for NASA. In developing their estimates, ECON has made
two key assumptions: (1) at-harvest crop estimates are within 10
percent of true production 9 years out of 10 (the LACIE 90/90
criterion), and (2) the United States would be operating within a
perfectly competitive international market. Under these
assumptions, ECON has estimated that benefits to the United
States of improved foreign production information would be in the
neighborhood of $300 million annually. They estimate that about
$240 million would be from improved wheat forecasts.
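
The 90/90 criterion as described above can be expressed as a simple check. This is a hedged Python sketch, not LACIE's formal statistical test: it counts the fraction of years in which an estimate falls within 10 percent of true production. The function name and the sample figures are hypothetical.

```python
def meets_90_90(estimates, actuals, tolerance=0.10, required_fraction=0.90):
    """True if estimates fall within `tolerance` of actuals in enough years."""
    hits = sum(
        abs(estimate - actual) <= tolerance * actual
        for estimate, actual in zip(estimates, actuals)
    )
    return hits / len(actuals) >= required_fraction


# Hypothetical 10-year record: true production fixed at 100 units each year.
actuals = [100.0] * 10
estimates = [95.0, 104.0, 109.0, 91.0, 100.0, 102.0, 98.0, 93.0, 107.0, 120.0]

print(meets_90_90(estimates, actuals))  # True: 9 of 10 years are within 10 percent
```

In this hypothetical record only the last year misses the 10-percent band, so the criterion is just met; a second miss would fail it.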

One of the ECON approaches was essentially descriptive or
positive, used econometric techniques, and was sometimes known as
ECON's production model (ref. 31). A second approach was
partially normative and has been referred to as ECON's
distribution model (refs. 4 and 32). Their present work is an
extension of the distribution benefits model and is referred to
as the integrated model (ref. 4). The integrated model is
currently used to estimate both production and distribution
benefits.

The integrated model, a stochastic dynamic decision model,² has
its roots in the work of Hayami and Peterson (refs. 1 and 2). The
model assumes perfect competition and imperfect foresight. It
uses dynamic programming to solve for the production,
consumption, and trade (with uncertain information) that would
maximize producer and consumer surplus. The global activity of
crop production and distribution is treated as a process in which
rates of planting, consumption, and exports are decision control
variables. Estimates of supply in the two producing units, the
United States and the rest of the world, are resultant state
variables. The value of improved information is obtained by
comparing producer and consumer surplus resulting from
alternative information systems. ECON assumed that all
participants operated in a perfectly competitive world wheat
market and used production forecasts equivalent to those
published by the Commonwealth Secretariat of the United Kingdom.
A second analysis assumed that participants shifted completely to
the more accurate


²John Andrews, "A Stochastic Dynamic Decision Model of the Value
of Improved Public Crop Information (Wheat)." Unpublished report,
ECON, Inc., Princeton, New Jersey, 1977.


information represented by the LACIE 90/90 criterion.

According to ECON, the principal benefit to the
United States from improved foreign crop information
would be from selling larger quantities to the
rest of the world in those months in which prices
were higher. ECON's model maximizes returns to
the global society by optimizing stocks, U.S. exports,
and U.S. production. Inventories in the United States
increase with improved information, while inventories
in the rest of the world decrease.2 Preliminary
results show that at the end of the model's base year,
buffer stocks or inventories would average 2.1
million metric tons (MMT) in the United States, 31.8
MMT in the rest of the world, and 1.2 MMT in transit.
Under the improved information system, this
would change to 13.3 MMT in the United States, 15.6
MMT in the rest of the world, and 1.0 MMT in transit
at year's end. Note that total buffer stocks would
decrease if improved information were available.
Total annual exports would remain the same, but export
revenues would be much higher under the improved
system because export sales would occur at
higher price levels. The higher export revenues
would be somewhat offset by increased storage costs,
including interest costs.
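
The statement that total buffer stocks would decrease can be checked directly from the year-end figures quoted above. The following is a minimal arithmetic sketch; the dictionary keys and function name are illustrative only and are not part of the ECON model.

```python
# Year-end buffer stocks in million metric tons (MMT), as quoted in the text.
baseline = {"united_states": 2.1, "rest_of_world": 31.8, "in_transit": 1.2}
improved = {"united_states": 13.3, "rest_of_world": 15.6, "in_transit": 1.0}


def total_stocks(stocks):
    """Sum year-end stocks over all holding locations, rounded to 0.1 MMT."""
    return round(sum(stocks.values()), 1)


baseline_total = total_stocks(baseline)
improved_total = total_stocks(improved)
change = round(improved_total - baseline_total, 1)

print(f"baseline total: {baseline_total} MMT")
print(f"improved total: {improved_total} MMT")
print(f"change:         {change} MMT")
```

Under these figures the U.S. share of stocks rises while the worldwide total falls by roughly 5 MMT.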

ECON shows that trade benefits from improved
crop information sift down eventually to producers
and consumers. Some benefits would result from adjustment
in production. In addition to benefits to the
United States, benefits to the rest of the world result
from decreased inventory costs roughly equal to 10
percent of the U.S. benefits.


Futures  Group  Study 

In the spring of 1976, the U.S. Department of
Agriculture (USDA) contracted with the Futures
Group, Inc., of Glastonbury, Connecticut, to study
the use and usefulness of improved wheat information
to the USDA (refs. 33 and 34). USDA officials
were interviewed to determine how improvements
in foreign wheat production information would
affect program decisions. The use of improved information
for broad policy direction and the need to provide
improved information to the public were not
pursued. The 30 respondents consisted of nearly
equal numbers of program managers, analysts, and
those performing both functions. Major findings of
the study are discussed in the following paragraphs.

The principal uses of improved foreign crop information
by the USDA would be in the management of
export programs, especially PL-480, and in negotiations
and adjustments within bilateral agreements.
Improved information would also be used to support
decisions concerning international wheat reserves or
embargoes.

The need for wheat production data varies between
programs. For some programs, current-season
data cannot be obtained early enough to significantly
affect program decisions. For other programs, improved
wheat information would be important provided
the improvements were significant. Under
some circumstances, program decisions have been
delayed until better crop data became available. Examples
of commodity program decisions that could
be affected by more up-to-date global information
are summarized in table I.

Under supply and demand conditions current at
the time of the study, programs relating to wheat production
controls or income supports were not being
applied. Therefore, foreign wheat production information
was not needed to support most USDA program
decisions.

The possibility of large worldwide wheat supplies
and an associated weak export demand or of small
supplies and a strong export demand could significantly
affect the need for better information about
world wheat production. The occurrence of short
wheat supplies would be more disruptive and make
good early production information especially useful.

Many variables, other than foreign production,
are involved in projecting U.S. exports. These variables
include prices, political attitudes and actions of
foreign governments, grain carryover, livestock
numbers and feed use, availability of foreign exchange,
and transportation problems. For factors
other than production, information usually can be
improved only indirectly, if at all, with satellite-based
data.

Department officials expressed the desire to have
more accurate and timely information. They thought
that if improved information were fed into the
marketing system, the operation of the market would
be improved. Furthermore, they anticipated the
possible recurring need for agricultural price or supply
programs.


Table I. Dates and Time Flexibility for USDA Wheat Programs Decisions

Decision                            Legislative or required                  Is decision subject to change
                                    decision date                            with new information?

1. Wheat national program acreage   By August 15 of previous crop            NPA may be adjusted based on
   (NPA)                            year (a)                                 new information

2. "Set-aside" acreage              By August 1 of previous crop             By clear precedent, could reduce
                                    year (a)                                 set-aside required but not increase it

3. Price support level              No legal date; best done by planting     Clear precedent not to reduce amount
                                    time but seldom announced that early

4. Disposal of Commodity Credit     No legal date; cannot sell CCC           Limited by desire to minimize effect
   Corporation (CCC) stocks         inventory at less than 150 percent of    on market
                                    current loan rate when reserve is in
                                    effect

5. CCC aid for export sales         No legal date; made when there is U.S.   Full flexibility
                                    buildup of price-depressing stocks

6. Wheat available for PL-480       No legal date; announce plan by          Full flexibility
                                    October 1

7. Wheat reserve                    No legal date; producer-held wheat       Limited by policy created
                                    reserves of 300 to 700 million bushels
                                    are required

(a) For example: for the 1979 wheat crop, which is planted in the fall of 1978 and spring of 1979, the decision must be announced by August 15, 1978.


Overview of the U.S. Wheat Industry


An overview of the U.S. wheat industry made in
mid-1977 by the Economics, Statistics, and Cooperatives
Service (ESCS) provides some necessary background
information concerning how crop information
is used. These uses are the basis for its value
(ref. 35). In itself, the study provides no expression
of the value of information.

The study traces the flow of wheat between
various sectors of the wheat industry. Accompanying
the physical flows are decision flows. Decision
flows may travel parallel to the physical flows, or
separately, as in the case of the futures market. Both
physical and decision flows are summarized in figure
2; the figure also indicates the number of participant
locations in each sector. Researchers identified two
sectors as key wheat-market information users. The
first is the large integrated export firms. The second
is the terminal markets. Information concerning a
third sector, the Federal Government with its regulations
and policies, is also critical to wheat decisions.

Timeliness of information was identified as of
major importance for decisions based on wheat information.
Decisions affected by wheat information
include how much to plant, sell, feed, process, and
consume; when to buy, sell, ship, and store; where to
buy, sell, load, or ship; what to plant, what quality of
product to buy or sell, and what transportation to
use; how much to pay for land, where to build, and
the size of facilities for storage and processing (ref.
35, p. 34). Deficiencies in information may lead to
delay, waste, and other inefficiencies in the wheat industry
and the general economy.


[Figure 2: flow diagram. Legible sector labels include farm (500 000),
USDA, country elevator, subterminal elevator (no estimate), merchant,
terminal elevator, flour miller, futures market, wholesaler (no estimate),
retailer (no estimate), domestic (200 000 000), and importer (no estimate).
The legend distinguishes flows from decisions.]

FIGURE 2. Major wheat flow by sector, flow of merchandising
decisions, and number of participant locations by sector.


Evaluation of the Oilseed and Products
Program

During 1976, the information functions of the
Foreign Agricultural Service (FAS) oilseed and products
program were evaluated (ref. 36). Although conducted
totally apart from LACIE, this study has clear
implications concerning the value of improved
wheat information. The actual extent of those implications
must be judged in the light of the
similarities of information use and needs of both
oilseed and food grain information users and according
to the nature of what is being compared.

A mail survey was made of the subscribers to the
FAS oilseeds and products publications. Subscribers
from government and international organizations
were not included and are not considered below. The
original population of about 1800 subscribers was
categorized into three subpopulations: (1) private
trade; (2) executives of firms; and (3) media, farm
and trade associations, and educational institutions.
About 42 percent of the subscribers were sent questionnaires;
69 percent of those returned usable data.

The composition of the subscriber list gives a
rough index of where the interest lies for crop production
information. Subscribers to the oilseeds and
products publications may be categorized as follows.


Subscriber                      Percent

Trade/broker                    15
Manufacturing                   16
Processing                      10
Import/export                   9
Consulting service              8
Media                           8
Educational institutions        6
Farming                         4
Banking/finance                 3
Trade associations              3
Farm associations               2
Transportation                  2
Storage/handling                1
Other                           8


Two findings of the oilseed and products study are
of particular interest when evaluating the need for
wheat information. Crop production information
was clearly the top-priority need of respondents.
Subscribers were asked to designate their 3 highest
priority information needs from a list of 11 topics.
Seventy-nine percent of the private trade group
ranked crop production as their highest priority information
need, and 49 percent ranked information
on consumption as their second highest priority. Executives
designated information on crop production
as their highest priority (65 percent) and information
on exports and imports as their second priority (41
percent).

Timeliness was the attribute of FAS information
which received the most criticism. Although 73 percent
of the private trade audience ranked all other attributes
either good or excellent, 45 percent ranked
timeliness as fair or poor. Among the executive
group, all information attributes except timeliness
were ranked good or excellent by 82 percent of the
respondents. Timeliness was ranked fair or poor by
58 percent. It should be noted that the question of
timeliness concerned all information from the program
and did not distinguish crop production information
separately.

The subgroup composed of the media, associations,
and educational institutions acts as an intermediary
in information transfer. Its members would not need
foreign crop information for decisions in their own
organizations but would use it in their reports to
commodity decisionmakers. Sixty-one percent of
this subgroup placed production as their most important
information need; 34 percent classified timeliness
of information in available reports as fair or
good.


COSTS  OF  CROP  INFORMATION  SYSTEMS 

The big question concerning a crop information
system resting on LACIE-developed technology
must be whether the benefits outweigh the costs
sufficiently to warrant further investigation or use of
the technology in an operational mode.

Two sets of cost estimates have been developed.
Estimates of a satellite-based system were made in
1976 and updated in 1977. To give perspective, costs
of the current system were also estimated in 1977.

Considerable caution must be exercised when
considering these figures. The two are not directly
comparable. The present USDA system primarily assembles,
weighs, and disseminates information
developed and paid for by others. On the other hand,
the satellite-based system would be an entirely new
data source that would provide additional information
to the current system. The products of a
satellite-based system would have different
geographic comparability, different statistical properties,
different timeliness features, perhaps different
believability, and perhaps different uses. Thus, direct
comparisons of the costs of the two systems without
simultaneous comparisons of product quality and associated
benefits are likely to be misleading. An
analysis of the uses and benefits of the present
system and of a satellite-based system has yet to be
made.


Costs of Satellite-Based Systems

The USDA/LACIE has developed cost estimates
for satellite-based approaches that could provide better
and faster information on important world crops.
Two alternative systems were considered: one for a
single crop and one for several crops. Each would
produce repetitive area, yield, and production
forecasts throughout the season in countries with 95
percent or more of total production; either could provide
periodic updates for areas of current critical interest.
Projections of the total USDA capital investment
required range from about $10 million for a
single-crop system to about $29 million for coverage
of eight major crops (table II). These estimates are
cumulative over 10 years. Estimated annual operating
costs range from about $5 million to about $9
million. Other systems could be defined, with each
exhibiting associated differences in output, costs,
and benefits.


Table II. Projected Costs of USDA Remote-Sensing-Based
Crop Assessment System

[Millions of dollars (a)]

Item                          Range of costs
                              One crop        Multicrop

Investments
  Hardware                    5.6             19.0
  Software                    2.3             7.4
  Data base                   .1              .1
  Conversion                  .03             .03
  Relocation expenses         1.0             2.0
  Other                       .8              .9
  Total (cumulative)          8.9 to 10.8     19.4 to 30.0

Operations
  Personnel                   2.4             4.6
  Administrative              .7              1.3
  ADP services                .5              1.2
  Facilities                  .2              .4
  Research and development    .3              .4
  Support services            .6              .7
  Total (yearly average)      1.3 to 5.1      6.0 to 9.4

(a) All costs based on constant (1977) dollars with the exception of salaries, which are
increased 5 percent annually in accord with past trend, cf. OMB Circular No. A-94,
revised Mar. 27, 1972.

The cost estimates were made using a model that
separates costs into capital investments and operating
costs. The costs of LACIE research and development
are not included; however, USDA costs associated
with the application, development, and test
phases following LACIE are included. The costs of
satellite data collection and a ground processing
system are included only as an annual payment for
Landsat products.

Costs of the Present System

The present USDA foreign crop information
system reports on crop production, trade, stocks,
consumption, governmental policies in 110 countries,
and prices (ref. 37). Total costs for FY 1977 for
this system were approximately $19 900 000 (table
III). Of that amount, FAS and ESCS production estimates
for all crops in the seven LACIE countries accounted
for slightly less than $700 000. Wheat estimates
alone cost approximately $165 000.

The principal sources of foreign crop information
in USDA are the 98 agricultural attaches and assistant
attaches assigned to foreign countries by FAS.
Sixty-one percent of the cost of developing wheat
production estimates for the seven LACIE countries
is derived from the FAS agricultural attache program.
The remainder is divided among different FAS
and ESCS analyst units in Washington and wheat
team trips to the U.S.S.R.

Primary responsibility for estimating foreign crop
production lies with the FAS. However, ESCS
analysts play a major role in developing estimates for
the U.S.S.R. and the People's Republic of China. Extra
effort is expended by the USDA to estimate crop
production in those countries that do not release
their information as freely as others.

The FAS and the ESCS made some very tentative
cost projections of the present USDA foreign crop
information system. They estimated that the costs of
crop forecasts for the seven LACIE countries under
the current system would increase about $83 000 by
1981 for all crops, of which $17 000 would be for
wheat. By 1986, additional increases of $24 000 for all
crops ($3000 for wheat) were anticipated. Total costs
of the current system for the seven countries for the
1977-86 period were expected to be $8.9 million for
all crops, of which $2.1 million would be for wheat
(table IV).


Table III. Cost of Current USDA Foreign Crop Production Estimates (a)

USDA division                 Total budget for      Production estimates for
                              all countries (b)     7 LACIE countries
                                                    Major crops     Wheat only

FAS attache program           $11 811 000           $503 400        $100 400
FAS Washington analysis (c)   4 797 000             95 900          18 700
ESCS-FDCD (d) analysis        3 137 000             55 300          12 300
Other (e)                     159 000               35 000          33 900 (f)
Total                         $19 904 000           $689 600        $165 300

(a) Cost estimates include overhead.
(b) Total cost of USDA foreign crop information system, which includes considerably more than production estimation; see text.
(c) Foreign Commodity Analysis program area.
(d) Economics, Statistics, and Cooperatives Service, Foreign Demand and Competition Division.
(e) For spring wheat team and winter wheat team trips to the U.S.S.R., plus costs of Task Force on U.S.S.R. Grain Situation not already
counted above. Total budget is for U.S.-U.S.S.R. Secretariat, which paid most costs of the wheat teams but none of the Task Force.
(f) $32 ?00 was for wheat teams traveling to the U.S.S.R.


Table IV. Projected Costs of Inputs of Current USDA Foreign Crop
Production Estimation, 7 Foreign LACIE Countries, FY 1977-86

[Thousands of dollars (a)]

                            1977        1981        1986        1977-86

Major crops
  FAS attache input         $503.4      $629.5      $770.1      $6382.6
  FAS Washington analysis   95.9        111.4       135.5       1144.0
  ESCS-FDCD input           55.3        100.0       122.0       950.6
  Other                     35.0        37.5        41.5        380.4
  Total                     $689.6      $878.4      $1069.1     $8857.6

Wheat only
  FAS attache input         $100.4      $125.8      $152.2      $1270.3
  FAS Washington analysis   18.7        21.7        26.4        222.9
  ESCS-FDCD input           12.3        22.3        27.1        211.4
  Other                     33.9        36.2        39.9        366.7
  Total                     $165.3      $206.0      $245.6      $2071.3

(a) All costs are based on constant dollars with the exception of salaries, which are increased 5 percent annually
in accord with past trend, cf. OMB Circular No. A-94, revised Mar. 27, 1972.


FUTURE PROGRAM OF ECONOMIC
EVALUATION

An approach for economic evaluation has been
recommended. The approach is pragmatic, is
oriented toward information users, and utilizes
proved economic methodology whenever possible
(ref. 38). Five tasks have been specified to cover the
key economic questions concerning the potential
value of satellite-developed estimates. The LACIE
experience in terms of expected performance was
considered when tasks were formulated. The objectives
of these five tasks are as follows.

1. Appraise the usefulness of improved global
wheat production information to major user groups.

2. Refine and extend available models to develop
quantitative estimates of the expected value to the
United States from improved wheat production estimates.

3. Evaluate the relationship between the structure
of the international grain trade and the existence of
different levels of public foreign crop information.

4. Develop and quantify, where possible, the relationship
between evolving technology and the
quality of information derived from its application;
especially examine how the planned and expected
improvements in the Landsat observing system and
LACIE-developed methodology will improve the
quality of wheat production information.

5. Update cost projections for a USDA crop
forecasting and condition assessment system based
on LACIE-developed techniques.

As mentioned earlier, two general schools of
thought seem to prevail in the economic evaluation
of crop information: a global modeling approach and
a pragmatic user-oriented approach. Elements of
each have been selected for the recommended program.
Task 1 is designed to define the problem setting
by analyzing individual user groups. Benefits
would be assessed and measured without the
necessarily restrictive assumptions of quantitative
modeling. Both market and nonmarket users would
be considered. Task 2 would quantitatively assess the
benefits of improved information on foreign crop
production estimates. No new, integrated theoretical
modeling would be attempted; instead, known
available models would be adapted to estimate
benefits. The results of Tasks 1 and 2 will complement
each other. It is recognized that the resources
required to fully accomplish the five tasks are quite
high. Tasks 1 and 2 would be given first priority.

Task 3 is concerned with structural impacts of information.
The models available to estimate the
value of information systems assume a perfectly
competitive market structure, but the markets to be
assessed appear oligopolistic (i.e., contain few actors).
Furthermore, the effect of improved information
may be to alter the distribution of income
among countries or groups. Some of this effect is exacerbated
by an oligopolistic market.

Task 4 recognizes that the technology to develop
information is not static and that the effect of evolving
technology on the information produced is not a
simple correlation. This task directly considers the
role of an evolving technology in improving the
quality of information. The answers to the questions
addressed by Tasks 3 and 4 should expand the interpretation
of Tasks 1 and 2.

Finally, the benefits of information derived from
satellite-based crop estimates must be weighed
against the cost of their production. Task 5 provides
for updating the costs of a satellite-based global crop
forecasting and crop condition assessment system.


SUMMARY  AND  CONCLUSIONS 

Decisions need to be made concerning the extent
to which USDA should incorporate satellite-based
crop estimation techniques into its global crop information
system. The only current study to estimate
total benefits to the United States from
satellite-based foreign wheat forecasts suggests
benefits to be about $240 million annually. Costs of a
satellite-assisted system are estimated to include investments
of about $10 to $29 million cumulative
over a 10-year period, with an annual operating cost
of about $5 to $9 million. This system would be a
new source of information and would provide information
not now available.
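
As a rough illustration of how the benefit and cost figures above might be set side by side, the sketch below computes an undiscounted 10-year benefit-cost ratio. It is not a method from the studies cited. It assumes, purely for illustration, that the $240 million annual benefit accrues in every year, that it applies equally to the single-crop and eight-crop systems even though the estimate refers to wheat forecasts, and that no discounting is applied. The function and dictionary names are hypothetical.

```python
# Figures in millions of dollars, taken from the text; everything else is assumed.
ANNUAL_BENEFIT = 240.0   # estimated annual U.S. benefit from improved wheat forecasts
YEARS = 10               # period over which investments are cumulative

SYSTEMS = {
    "single-crop": {"investment": 10.0, "annual_operating": 5.0},
    "eight-crop": {"investment": 29.0, "annual_operating": 9.0},
}


def ten_year_cost(investment, annual_operating, years=YEARS):
    """Undiscounted cost: cumulative investment plus operating cost each year."""
    return investment + annual_operating * years


def benefit_cost_ratio(investment, annual_operating, years=YEARS):
    """Undiscounted benefits over the period divided by undiscounted costs."""
    return (ANNUAL_BENEFIT * years) / ten_year_cost(investment, annual_operating, years)


for name, costs in SYSTEMS.items():
    cost = ten_year_cost(**costs)
    ratio = benefit_cost_ratio(**costs)
    print(f"{name}: 10-year cost ${cost:.0f} million, benefit-cost ratio {ratio:.1f}")
```

As cautioned in the cost discussion, such a ratio says nothing about product quality or about how much of the benefit a given system would actually realize.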

The present USDA foreign crop information
system is essentially a crop intelligence system that
covers about 110 countries. It generates some new
data, but primarily it assembles, analyzes, and disseminates
preexisting information. In addition to reporting
crop production, the current system generates
information on trade, stocks, consumption,
policies, and prices. The total annual cost for the
system is about $20 million. Of the $20 million, FAS
and ESCS estimates show that it takes about $0.69
million to make production estimates for major
crops in the seven foreign LACIE countries; to estimate
wheat only in the seven countries requires
about $0.17 million annually.

The principal USDA use of improved foreign crop
information would be to support the management of
export programs and export policy decisions. The occurrence
of a short wheat supply, either domestic or
foreign, would especially increase the importance of
such improved wheat information.

A recent study of the FAS oilseed and products
program has direct implications for LACIE-type
wheat information. Of all the information FAS supplied
in the oilseed and products publications, crop
production was clearly ranked by subscribers as their
top-priority information need. Lack of timeliness
was the attribute of FAS information they criticized
the most. Presumably, LACIE's strengths would be
in these two areas.

Economic evaluation of LACIE is ongoing. Both
present and future studies will emphasize benefits of
forecasts of foreign crops, wrestle with conceptual
problems, and come up with descriptions and estimates
that will help decisionmakers determine
whether or not improvements in information from
satellite-based technology are worth the costs.


Supporting Research and Technology


FOREWORD 

The purpose of the Supporting Research program
in LACIE is to provide the technology improvements
required to make LACIE (and subsequent
crop inventory experiments like LACIE) successful.
It is an applied program motivated by problems
that have surfaced in the LACIE large-scale experiments.
Although the program has provided some
solutions of the "quick fix" variety, most of the
research effort is directed to problem solving on a
time scale of 1 to 3 years. And, since this period
spans the duration of LACIE, a large portion of the
program is aimed at providing improvements for
later remote-sensing applications of this type. It is
proper, therefore, to view LACIE as a series of large-scale
experiments in which existing technology has
been evaluated and as a series of studies which can
form the basis for improvements in future applications.

Many institutions have contributed and are still
contributing to the Supporting Research program. A
complete list of these institutions and of the major
disciplines they represent is given in table I. Besides
contributing by performing research, members of
these institutions have also contributed in at least
two other ways. First, they established a certain base
technology which formed the foundation of many of
the concepts used in the beginning of LACIE. Secondly,
they participated in periodic reviews of
LACIE. These reviews were meant to be critical examinations
of the functioning of the experiment to
ensure that sound methods were being used. To a
large extent, this contribution has had a reciprocating
effect, for it has provided the institutional participants
with a good realization of the actual problems
encountered in an experiment of this size. Thus, it
has had an enriching effect on the overall quality of
research.

The research that is discussed in this section is
categorized by functional areas that relate to the major
elements within LACIE. These areas are (1)
machine-processing methods for segment crop area
estimation, (2) manual image interpretation, (3)
yield estimation and crop calendar modeling, (4)
sampling and aggregation, and (5) field research.


Table I. Participants in LACIE Supporting Research
and Their Areas of Contribution

Participant                                   Study area

Area estimation
  Colorado State University                   Canopy modeling
  Environmental Research Institute of         Physics, sensors, modeling,
    Michigan (ERIM)                             pattern recognition
  International Business Machines, Inc.       Statistical design, problem solving
  Oregon State University                     Agricultural economics
  Pan American University                     Canopy modeling
  Purdue University, Laboratory for           Agriculture, pattern recognition
    Applications of Remote Sensing (LARS)
  Rice University                             Computation, mathematics
  University of California at Berkeley        Image and data interpretation, sampling
  University of Houston                       Mathematics
  University of Missouri                      Agricultural economics

Yield/crop calendar modeling and estimation
  Clemson University                          Crop physiology
  Development Planning & Research Associates  Crop modeling
  Earth Satellite Corporation                 Yield modeling
  Fort Lewis College                          Crop modeling
  Kansas State University                     Crop physiology, yield modeling
  NOAA CCEA (a)                               Yield modeling, meteorological data
  Prairie View A&M University                 Agriculture
  University of Wisconsin                     Yield modeling
  USDA SEA (b)                                Yield and winterkill modeling

Sampling/aggregation and production estimation
  South Dakota State University               Stratification
  Texas A & M University                      Sampling, statistics, mathematics, agriculture
  TRW, Inc.                                   Error modeling
  University of Texas at Dallas               Statistics

(a) National Oceanic and Atmospheric Administration Center for Climatic and
Environmental Assessment.
(b) U.S. Department of Agriculture Science and Education Administration.

Under  category  ((),  two  main  themes— accuracy 
and  efficiency— ere  motivating  factors.  In  statistical 
terms,  accuracy  deals  with  the  bias  and  variance  of 
an  estimator.  Thus,  methods  that  can  estimate  the 
correct  crop  acreage  within  a segment  on  the  average 
(i.e.,  the  expected  value  of  the  estimator  is  the  true 
value)  and  methods  that  consistently  (i.e.,  with  tow 
variance)  produce  good  answers  are  sought. 
Although  low  bias  is  often  a standard  requirement 
for any estimator, in LACIE it came to the
foreground  early  as  a potential  problem  area.  The 
reason  is  that  LACIE  began  with  ideas  that  were 
basically  directed  not  toward  making  an  inventory 
but  rather  toward  land  use  mapping.  Indeed,  the  con- 
cept of  image  classification  is  to  identify  land  areas  in 
accordance  with  some  given  generic  category  rather 
than  to  estimate  the  amount  of  material  (e.g.,  crop 
acreage)  in  that  category.  The  problem  with  these 
land  use  mapping  methods  in  inventory  applications 
is  that  generally  the  classification  process  makes  oc- 
casional mistakes,  and  it  can  be  shown  that  this 
classification  error  will  result  in  an  estimate  that  is 
different  from  the  true  answer  no  matter  how  many 
pixels  are  classified.  In  other  words,  classification  er- 
rors will  in  general  not  average  out  in  any  given 
classification  of  a large  area  where  the  classification 
parameters  are  not  locally  adjusted.  There  are  a num- 
ber of  approaches  that  use  classification  as  one  step 
and  subsequently  correct  for  errors,  and  there  are 
methods  that  do  not  use  classification  at  all.  All  these 
approaches  lead  to  unbiased  or  asymptotically  un- 
biased estimators.  The  performance  (as  measured, 
for  example,  by  the  variance  in  the  results)  of  those 
that  use  classification  improves  as  classification  error 
is  reduced.  And,  since  classification  has  continued  to 
play  a dominant  role  in  LACIE,  improved  classifica- 
tion methods  have  continued  to  be  a research  theme. 

Efficiency,  as  used  here,  relates  to  processing 
methods  that  require  a minimum  of  manual  inter- 
vention. In  terms  of  classification,  this  implies 
methods  that  call  for  only  a minimal  amount  of 
machine  classifier  training.  Throughout  the  course  of 
LACIE, there has been a belief that machine processing
can play a dominant role in area estimation. Yet,
in  the  large-scale  experiments,  training  samples  need 
to  be  drawn  from  each  segment  that  is  classified. 
Hence,  a large  portion  of  the  research  effort  has  been 
directed toward solving this efficiency problem, a
problem that has come to be called the signature
extension problem.

Manual  image  interpretation  has  continued  to 
play  a key  role  throughout  LACIE  since  the  use  of 
ground  observation  data,  especially  over  foreign 
areas,  was  ruled  out.  Even  though  improvements  in 
machine-processing  methods  have  been  sought,  the 
requirement  to  be  able  to  adapt  to  anomalous  situa- 
tions implies  that  a certain  amount  of  manual  in- 
terpretation will  be  necessary.  Interpretation  of 
Landsat  imagery  indeed  presents  a challenging  prob- 
lem. Unlike  high-resolution  aircraft  image  data,  for 
example, a Landsat image captures very little textural
information related to an agricultural crop.
Because  texture  is  so  important  in  human  recogni- 
tion processes,  conventional  photointerpretation 
methods  thus  have  to  be  rethought.  The  central 
research  goal  has  been  formulating  human  decision- 
making processes  that  are  predicated  on  an  under- 
standing of  crop  development  properties  and  deter- 
mining how  those  properties  appear  in  Landsat  data 
under  a variety  of  environmental  circumstances.  The 
design  of  more  informative  displays,  better  use  of  an- 
cillary data,  and  development  of  methods  based  on  a 
question-and-answer  scheme  are  some  of  the 
research  projects  that  have  contributed  in  this  area. 

Yield  estimation  and  crop  calendar  development 
were  originally  approached  from  a regression  model- 
ing point  of  view.  For  example,  in  yield  estimation, 
records  of  yields  and  associated  weather  variables 
were  used  that  dated  back  as  far  as  40  years. 
Although  moderately  successful,  the  approaches  had 
some  serious  drawbacks.  First  of  all,  the  general  ap- 
proach was  developed  over  the  United  States,  where 
data  records  of  this  kind  are  available.  This  is  not  the 
case  in  many  foreign  areas.  Hence,  even  if  a satisfac- 
tory model  were  developed  in  the  United  States,  the 
chance  of  obtaining  the  same  data  records  with 
which  to  derive  model  coefficients  in  foreign  areas  is 
small.  Moreover,  the  model  developed  with  coeffi- 
cients derived  from  U.S.  data  would  probably  not 
work  well  elsewhere.  Secondly,  the  regression  ap- 
proaches worked  well  when  average  weather  was  ex- 
perienced during  the  year.  However,  unusual 
weather  caused  unacceptably  large  yield  prediction 
errors. Hence, in yield research, the thrust has been
to build a yield model that is responsive to weather
variation and that is "transportable"; i.e., a model
that  can  be  applied  in  many  different  areas  besides 
the  one  for  which  it  was  developed. 

The  original  crop  calendar  model  used  in  LACIE 
was a version of the Robertson spring wheat model
in which crop stage development rates were predicted
from daily maximum and minimum temperatures
and day length. Although the model works
well for spring wheat when a proper starting time is
known, it will not predict well in the case of winter
wheat  because  winter  wheat,  unlike  spring  wheat, 
enters  a period  of  dormancy,  a stage  that  is  not 
accounted  for  in  the  Robertson  model.  The  bulk  of 
the  research,  then,  has  been  to  develop  methods  for 
estimating  planting  dates,  in  order  to  properly  start 
the  models,  and  for  estimating  the  length  of 
dormancy. 

The  topic  of  sampling  and  aggregation  deals  with 
the  development  of  areal  sampling  strategies  and 
with  methods  for  combining  crop  area  and  yield  esti- 
mates in  order  to  estimate  production,  for  example, 
at  a country  level.  Here,  the  emphasis  has  been  to  in- 
crease the  efficiency  of  the  sampling;  i.e.,  to  be  able 
to  estimate  crop  area  to  a given  variance  with  fewer 
samples.  The  original  sampling  strategies  in  LACIE 
were  developed  using  historical  information  on  crop 
proportions  in  a political  division  (e.g.,  counties  in 
the  United  States);  however,  in  foreign  areas,  such 
data  are  seldom  available.  Thus,  the  need  to  improve 
sampling  has  led  naturally  to  proposals  for  using 
Landsat  data  to  develop  rough  estimates  of  crop 
acreage  on  which  to  stratify  or  to  look  for  correlates 
of crop acreage which can be used as stratification
variables. 

The  field  research  program  was  conceived  as  an 
area  of  fundamental  research  into  the  character  and 


controlling  factors  of  spectral  radiation  patterns  of 
crops  and  soils.  For  3 years,  high-resolution  spectral 
measurements,  supported  by  intensive  agronomic 
observations, were made in controlled experimental
plots  at  an  agriculture  experiment  station  in  Kansas 
and  at  one  in  North  Dakota.  At  nearby  test  sites  in 
commercial  production  areas  and  at  a third  experi- 
mental site  in  South  Dakota,  similar  spectral 
measurements  were  taken  by  airplane-  and  helicop- 
ter-borne spectral  sensors,  also  supported  by 
agronomic  observations.  All  data  acquisition  was 
scheduled  to  coincide  as  nearly  as  possible  with 
Landsat  overpasses.  The  field  research  data  have 
been integrated into the LACIE research data base
and have been applied to studies directed at current
LACIE problems, some of the results of which are
reported  in  this  Supporting  Research  results  section. 
The  field  research  data  will  continue  to  be  used  in 
studies  of  crop  and  soil  radiation  patterns. 

There  are  two  types  of  papers  contained  in  this 
section.  One  is  a review  paper,  which  is  intended  to 
integrate  the  research  approaches  taken  so  that  the 
reader  can  understand  the  main  concepts  without 
having  to  read  through  considerable  technical  detail. 
The  other  papers  provide  that  technical  detail  and 
are  intended  to  appeal  to  readers  who  are  perhaps  in- 
terested in  pursuing  research  topics  of  this  type  on 
their  own.  Still  more  detail  is  provided  in  the 
references  cited.  Many  of  these  references  are 
research  reports  that  have  been  compiled  during 
LACIE  and  are  available  to  the  reader  on  request. 
Methods for Segment Wheat Area Estimation

R. P. Heydorn,a M. C. Trichel,a and J. D. Erickson a


INTRODUCTION 

The classification and mensuration approach used
in  the  large-scale  experiment  studies  within  LACIE 
was  designed  to  provide  wheat  proportion  estimates 
for  each  segment.1  Such  proportions  were  provided 
by  integrated  manual  and  machine  processing,  which 
is discussed in the paper by Heydorn et al. entitled
"Classification and Mensuration of LACIE Segments."
Two fundamentally different designs were
attempted in the 3 years of LACIE. The first design,
which was implemented during the first 2 years, was
one in which the analyst was required to sample a
color-infrared (CIR) Landsat image and pick examples
of wheatfields and of nonwheat areas. The
Landsat  reflectance  values  contained  in  these  exam- 
ples were  then  used  to  estimate  classification 
parameters.  These  examples  will  subsequently  be 
referred  to  as  “training  data”  and  the  function  of 
estimating  classification  parameters  will  be  called 
“training  the  classifier.”  Once  the  classifier  was 
trained,  every  picture  element  (pixel)  in  the  segment 
was  classified  as  wheat  or  nonwheat.  The  wheat 
acreage  estimate  for  the  segment  was  then  simply  the 
total  number  of  pixels  classified  as  wheat. 

The  large-scale  experiment  results  indicated  that 
this  approach  would  not  support  LACIE  goals.2 
There  were  a number  of  difficulties.  Basically,  the 
classification  performance  was  erratic.  Given  enough 
time,  an  analyst  could  generally  rework  the  segment 


aNASA Johnson Space Center, Houston, Texas.

1The term "segment" refers to the 5- by 6-nautical-mile primary
sampling unit as discussed in the paper by Feiveson et al.
entitled "LACIE Sampling Design."

2The stated objective in LACIE was to estimate wheat production
at a country level in compliance with the 90/90 criterion
(see the plenary paper by MacDonald and Hall entitled "LACIE:
An Experiment in Global Crop Forecasting"). The segment estimates
therefore had to support that criterion.


by  alternately  choosing  training  fields  and  reclassify- 
ing until,  by  qualitative  judgment,  results  were 
satisfactory.  In  short,  it  was  a very  “arty”  process. 
Because  of  the  number  of  segments  that  needed  to  be 
processed in a given period of time, the analyst was
limited  in  his  exposure  to  any  one  segment.  In  the 
allotted  time,  it  was  very  difficult  to  obtain  an  accept- 
able multitemporal  classification;  consequently, 
almost  all  classifications  were  of  the  unitemporal 
variety.3 

The  root  of  the  problem,  however,  was  probably 
the  fact  that  an  analyst  was  required  to  examine  a 
CIR  image  and  discern  from  observations  of  colors 
all the spectral variety required to train a complex
statistically  oriented  classification  algorithm.  The 
mental transformation from color to statistical concept,
such as the "number of normal distributions required
to fit the data" or "appropriate sample sizes
required  to  estimate  distribution  means  and 
covariances”  is  indeed  a difficult  if  not  an  impossible 
chore. 

The  difficulties  with  this  design  motivated  the 
development  of  a second  design  called  Procedure  1 
(see the paper by Heydorn et al.). Procedure 1
relieved  the  analyst  of  nonlabeling  functions  and  re- 
quired only  that  he  label  accurately  given  pixels 
(called  dots)  randomly  selected  from  the  image. 
Through  the  use  of  a clustering  mechanism,  these 
dots, in part, served as training data. The clustering
provided  both  the  number  of  normal  distributions  to 
fit  the  data  and  the  estimates  of  the  classification 
parameters. In addition, Procedure 1 allowed the
analyst  to  correct  for  classification  errors  without 
reclassifying  a segment.  This  correction  process, 
when  viewed  statistically,  is  simply  a stratified  area 
estimate  in  which  the  classification  is  treated  as  a 


3Multitemporal classification means that Landsat measurements
for multiple passes are concatenated to effect a multivariate
classification. Unitemporal classification means that just one
pass of Landsat data is used.
stratification of the segment into potential wheat and
nonwheat strata.

To  implement  Procedure  1,  it  was  necessary  to 
design  a suitable  clustering  algorithm.  However,  a 
year’s  experience  with  the  operation  of  Procedure  1 
indicated  that  a clustering  problem  still  existed  and 
this  precipitated  further  research.  This  time,  some- 
what more  sophisticated  approaches  were  taken 
which  considered  spatial  as  well  as  spectral  proper- 
ties of  Landsat  data. 

Other  research  that  was  incorporated  into  the  Pro- 
cedure 1 design  dealt  with  the  transformation  of 
Landsat  data  to  enhance  properties  related  to  crop 
growth.  Specifically,  a transformation  was  developed 
for  converting  the  Landsat  channels  to  variables  that 
quantified  soil  brightness,  the  green  development  of 
the  crop  canopy,  and  senescence  or  yellowing  of  the 
crop  canopy.  This  transformation  was  used  to 
develop  numerical  displays  for  the  analyst  (only  the 
soil  brightness  and  greenness  variables  were  used)  to 
allow  him  to  track  crop  growth  and  relate  it  to  a crop 
calendar.  In  addition,  it  allowed  him  to  view  spectral 
space  and  deduce  clustering  properties  within  the 
measurement. 

Besides  the  problems  arising  from  Phases  I and  II 
of  LACIE  which  stimulated  research  leading  to  Pro- 
cedure 1 and  other  developments,  certain  problems 
were  known  to  exist  before  LACIE.  One  of  these  was 
that  whenever  an  area  estimate  is  obtained  using  an 
error-prone  classification  process,  errors  in  classifica- 
tion can  introduce  bias  into  the  estimate.  This  known 
problem  motivated  research  into  the  development  of 
area  estimation  methods  that  do  not  depend  on 
classification  or  that  attempt  to  remove  the  bias 
caused  by  classification.  In  fact,  one  of  the  concepts 
considered  in  the  early  research  was  later  incorpor- 
ated into  the  stratified  area  estimation  concept  used 
in  Procedure  1.  Another  problem  that  was  suspected 
to  exist  before  LACIE  and  that  was  confirmed  as  a 
problem  after  the  start  of  the  experiment  dealt  with 
the  efficient  use  of  machine  processing  in  such  an  ap- 
plication. Simply  stated,  the  problem  is  one  of  being 
able  to  classify  large  areas  accurately  with  only  a 
minimal  amount  of  training  data.  The  concept  came 
to  be  known  as  “signature  extension.”  In  LACIE, 
training  data  are  required  for  each  segment  to  be 
classified.  An  example  of  signature  extension  would 
be  a situation  in  which  only  one  of  five  segments  is 
used  to  obtain  training  data  (with  the  implication 
that  the  “signatures”  of  wheat  and  nonwheat  are  ob- 
tainable from  that  segment)  and  all  five  are  subse- 
quently classified.  In  fact,  such  an  attempt  was  made 


in  the  first  year  of  LACIE  and  it  resulted  in  failure. 
Several  approaches  to  the  problem  were  subse- 
quently considered.  The  final  approach,  which  was 
in  the  developmental  stage  at  the  end  of  the  third 
year  of  LACIE,  was  based  on  statistical  sampling 
concepts. 

This  paper  is  intended  to  be  an  overview  of  the 
major  research  that  was  conducted  during  the  3 years 
of  LACIE  to  solve  problems  associated  with  segment 
wheat  area  estimation.  Papers  detailing  the  mathe- 
matical notions  are  referenced.  The  research  topics 
that  have  been  alluded  to  previously  and  that  will  be 
covered in the following sections are Proportion
Estimation, Clustering, Feature Extraction, and Signature
Extension.


PROPORTION  ESTIMATION 

Proportion  estimation  considers  methods  that 
estimate  the  areal  proportion  of  a crop  type  in  a seg- 
ment. The  central  idea  is  to  obtain  estimators  that  are 
unbiased  or  at  least  are  asymptotically  unbiased. 
“Asymptotically  unbiased”  means  that  as  the  sample 
size  gets  larger,  the  bias  of  the  estimates  gets  smaller. 
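
As a hedged restatement of the two terms, and not in the authors' own notation: write $\hat{\alpha}_n$ for an estimate of the true proportion $\alpha$ computed from a sample of size $n$. Both symbols are assumptions introduced here for illustration.

```latex
\operatorname{Bias}(\hat{\alpha}_n) = E[\hat{\alpha}_n] - \alpha ,
\qquad
\text{unbiased: } \operatorname{Bias}(\hat{\alpha}_n) = 0 \ \text{for every } n ,
\qquad
\text{asymptotically unbiased: } \lim_{n \to \infty} \operatorname{Bias}(\hat{\alpha}_n) = 0 .
```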

If  crop  types  could  be  uniquely  identified  from  the 
Landsat  spectral  measurements,  then  unbiased 
estimation  of  crop  proportions  would  be  relatively 
simple.  An  approach  would  be  to  classify  each  pixel 
into  its  correct  crop  type,  tabulate  these  classifica- 
tions, and  divide  each  tabulation  by  the  total  number 
of  pixels  to  obtain  the  estimate.  The  problem  is  that 
crop types are not uniquely identifiable from Landsat
imagery, or, if they are, an error-free identifica-
tion method  has  not  been  found.  Consequently,  one 
must  develop  methods  that  account  for  these  errors. 
Fortunately,  statistical  methods  have  been  proposed 
to  deal  with  this  problem.  Some  of  these  methods 
were  known  before  the  start  of  LACIE;  others  were 
developed  during  the  course  of  the  project. 

To  obtain  an  intuitive  feel  for  the  nature  of  the 
problems  and  of  possible  solutions,  consider  the 
diagrammatic explanation illustrated in figure 1. In
this  figure,  weighted  wheat  and  nonwheat  probability 
densities  or  likelihoods  are  illustrated.  The  weighting 
consists  of  the  actual  proportions  of  crop  acreage  in  a 
segment.  Imagine  that  a maximum-likelihood 
classification  procedure  is  to  be  used  to  calculate  the 
proportions  of  crop  types  in  a segment.  (A  max- 
imum-likelihood classifier  is  used  in  LACIE.  For  dis- 
cussion, see  the  paper  by  Heydorn  et  al.)  Such  a 
classifier  would  partition  the  Landsat  segment  into 
FIGURE 1. Diagrammatic explanation of proportion estimation.


two  groups.  One  group  would  contain  all  measure- 
ments for  which  the  weighted  likelihood  of  being 
wheat  was  greater  than  that  of  being  nonwheat;  in 
the  other  group,  the  reverse  situation  would  hold. 
The  point  that  defines  this  partition  is  labeled 
“weighted  maximum-likelihood  decision  point”  in 
the  figure.  If  decisions  are  made  in  this  way,  errors  of 
omission  and  commission  will  occur.  If  these  omis- 
sion and  commission  errors  do  not  balance  out,  the 
final  tabulation  will  be  in  error.  This  error  is  called  a 
bias. 
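
The weighted decision rule described for figure 1 can be sketched in a few lines. This is a minimal illustration, assuming a single measurement channel and normal class densities; the function names and the (mean, standard deviation) pairs are hypothetical and are not LACIE parameters.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def is_wheat(x, alpha, wheat, nonwheat):
    """Weighted maximum-likelihood decision for one measurement x.

    alpha is the wheat proportion used as the weight; wheat and nonwheat
    are hypothetical (mean, std) pairs describing the two class densities.
    """
    weighted_wheat = alpha * normal_pdf(x, *wheat)
    weighted_nonwheat = (1.0 - alpha) * normal_pdf(x, *nonwheat)
    return weighted_wheat > weighted_nonwheat


# With equal weights and equal spreads, the decision point lies midway
# between the two class means (here at x = 1.0).
print(is_wheat(0.9, 0.5, (0.0, 1.0), (2.0, 1.0)))
print(is_wheat(1.1, 0.5, (0.0, 1.0), (2.0, 1.0)))
```

Wheat measurements that fall on the nonwheat side of the decision point are the omission errors, and nonwheat measurements that fall on the wheat side are the commission errors.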

One  approach  to  obtaining  an  unbiased  estimate  is 
to  estimate  the  amount  of  error  involved  and  in 
some  fashion  remove  it  from  the  final  answer. 
Another  approach  is  based  on  the  observation  that 
the  mixture  density,  which  is  simply  the  probability 
density  associated  with  each  pixel  of  Landsat 
measurements  without  consideration  of  crop  type,4 
is  (under  appropriate  assumptions)  a unique  mixture 
of  the  densities  associated  with  each  crop.  When  the 
crop  type  densities  are  known,  the  exact  proportions 
in  which  they  are  mixed  to  form  the  mixture  density 
can  be  computed. 
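
The first approach can be illustrated for the two-class case. The sketch below assumes that the omission rate (wheat called nonwheat) and the commission rate (nonwheat called wheat) are known exactly, which in practice they are not; the function names are hypothetical. It shows that the expected pixel-count proportion does not depend on how many pixels are classified, and that the error can be removed by inverting the two-class error relation.

```python
def expected_pixel_count_proportion(alpha, omission, commission):
    """Expected fraction of pixels classified as wheat.

    alpha: true wheat proportion.
    omission: probability that a wheat pixel is classified nonwheat.
    commission: probability that a nonwheat pixel is classified wheat.
    The number of pixels does not appear, so the bias does not average out.
    """
    return alpha * (1.0 - omission) + (1.0 - alpha) * commission


def corrected_proportion(counted, omission, commission):
    """Invert the relation above to recover the wheat proportion."""
    denominator = 1.0 - omission - commission
    if denominator == 0.0:
        raise ValueError("error rates make the proportion unidentifiable")
    return (counted - commission) / denominator


counted = expected_pixel_count_proportion(0.3, omission=0.2, commission=0.1)
print(counted)                                  # differs from the true 0.3
print(corrected_proportion(counted, 0.2, 0.1))  # recovers about 0.3
```

The bias here is commission-weighted nonwheat minus omission-weighted wheat; it vanishes only when those two quantities happen to balance.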

A number of methods fall under the two approaches.
The ones that appeared to fit the LACIE
application  best  are  mentioned  in  table  I.  Detailed 
description  of  each  method  can  be  found  in  the  paper 
by  Feiveson  entitled  “Estimating  Crop  Proportions 
From  Remotely  Sensed  Data."  To  illustrate  the  ideas 


involved,  consider  two  of  the  methods  mentioned  in 
table  I. 

First  consider  the  so-called  “CDF  mixture 
method."  The  basic  model  is 


$$F = \alpha F_1 + (1 - \alpha) F_2$$

where $F$ = marginal cumulative distribution function (CDF) corresponding to the mixture density$^5$
      $F_1$ = CDF corresponding to the wheat density
      $F_2$ = CDF corresponding to the nonwheat density
      $\alpha$ = proportion of wheat area in the segment

Given that $F_1$ and $F_2$ can be estimated through
some sort of a sampling process, in which, for example,
an analyst would pick and label examples of
wheat and nonwheat pixels from a Landsat CIR image,
an estimate of $\alpha$, say $\hat{\alpha}$, would be the value of $\alpha$
that minimizes

$$\left\| \hat{F}(x) - \alpha \hat{F}_1(x) - (1 - \alpha) \hat{F}_2(x) \right\|^2$$

subject to the constraint that $0 \le \alpha \le 1$. Here,

$$\hat{F}_1(x) = [\hat{F}_1(x_1), \hat{F}_1(x_2), \ldots, \hat{F}_1(x_n)]$$

$$\hat{F}_2(x) = [\hat{F}_2(x_1), \hat{F}_2(x_2), \ldots, \hat{F}_2(x_n)]$$

where $x_i$ = Landsat measurement for the $i$th channel,
$i = 1, 2, \ldots, n$. (The symbol "^" denotes "estimate of.")

In this approach, $F$ can easily be determined to a
high  degree  of  accuracy  by  considering  as  many  pix- 
els in  the  segment  as  is  deemed  necessary,  since  crop- 
type  labeling  information  is  not  required.  On  the 
other  hand,  since  a knowledge  of  crop  types  is  re- 


4In more elementary terms, the mixture density can be
viewed  as  that  obtained  by  simply  histograming  the  Landsat 
measurements,  where  a histogram  value  (probability  density)  is 
the  frequency  with  which  a given  pixel  occurs  in  the  scene. 


5The marginal cumulative distribution function (or CDF) is
the indefinite integral of the density function. That is,
$F(x) = \int_{-\infty}^{x} f(y)\,dy$, where $f$ is the density of one
channel of Landsat measurements.
Table I. Proportion Estimation Methods

Method                        Description                                         Responsible institution
----------------------------  --------------------------------------------------  ----------------------------------
Inverting the confusion       Estimate the omission/commission error matrix and   University of Texas at Dallas
matrix                        use it to correct for bias
Maximum-likelihood estimate   Assume normal component densities and maximize      University of Texas at Dallas
of proportion                 the likelihood of the mixture distribution with
                              respect to mixing proportions
Method of moments             Estimate the proportion of component moments in     Texas A&M University (TAMU)
                              the mixture moments
CDF mixture method            Estimate the proportion of component marginal       University of Texas at Dallas
                              cumulative distribution functions (CDF's) in the
                              mixture marginal CDF's
"BIN" method                  Same as CDF mixture method except density           Lockheed Electronics Company (LEC)
                              histograms used in place of CDF's
Posterior probability         Treat classification as a small-grains/             NASA Johnson Space Center (JSC)
                              non-small-grains stratification and estimate
                              small-grains proportion from a stratified random
                              sample

quired to estimate $F_1$ and $F_2$, their estimates require
considerably more effort. Consequently, the estimates
of $F_1$ and $F_2$ are generally made up of considerably
fewer samples than is the estimate of $F$.
The variance of $\hat{\alpha}$ is dominated by the variance of $\hat{F}_1$
and $\hat{F}_2$. Hence, in practice, $\hat{\alpha}$ is not expected to be
exactly $\alpha$, the true proportion.
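
The constrained minimization in the CDF mixture method has a simple closed form when the CDF estimates are treated as plain vectors. The following is a minimal sketch, not the implementation evaluated in LACIE: the function names are hypothetical, the CDFs are assumed to be evaluated at a common list of points, and the unconstrained least-squares solution is simply clipped to the interval from 0 to 1, which is valid because the objective is a convex quadratic in one variable.

```python
from bisect import bisect_right


def empirical_cdf(samples, points):
    """Fraction of samples less than or equal to each evaluation point."""
    ordered = sorted(samples)
    n = len(ordered)
    return [bisect_right(ordered, p) / n for p in points]


def cdf_mixture_proportion(f_mix, f_wheat, f_nonwheat):
    """Least-squares wheat proportion from CDF values at common points.

    Minimizes the squared norm of f_mix - a*f_wheat - (1 - a)*f_nonwheat
    over a in [0, 1].
    """
    diff = [w - n for w, n in zip(f_wheat, f_nonwheat)]
    resid = [m - n for m, n in zip(f_mix, f_nonwheat)]
    denominator = sum(d * d for d in diff)
    if denominator == 0.0:
        raise ValueError("wheat and nonwheat CDFs are identical")
    a = sum(r * d for r, d in zip(resid, diff)) / denominator
    return min(1.0, max(0.0, a))


# Hypothetical CDF values at three evaluation points.
wheat = [0.1, 0.5, 0.9]
nonwheat = [0.6, 0.8, 1.0]
mixture = [0.3 * w + 0.7 * n for w, n in zip(wheat, nonwheat)]
print(cdf_mixture_proportion(mixture, wheat, nonwheat))  # close to 0.3
```

In use, the mixture CDF would come from many unlabeled pixels while the two class CDFs would come from a much smaller labeled sample, which is why the class CDFs dominate the variance of the result.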

Consider the method called "posterior probability"
in table I. This method is the one used in
Procedure  1.  The  basic  idea  here  is  to  correct  for 
classification  error  through  a second  sampling  of  the 
segment.  More  appropriately,  however,  the  method 
can  be  considered  as  a two-stage  sampling  process.  In 
the  first  stage,  the  analyst  samples  the  segment  to  ob- 
tain a machine  classification  of  wheat  and  nonwheat. 
The  resulting  classification  map  is  treated  as  a 
stratification  of  the  segment  into  a potential  wheat 
stratum  and  a potential  nonwheat  stratum.  The 
analyst  then  samples  again  and  uses  the  sample  and 
the  stratification  to  complete  a stratified  area  esti- 
mate. In  the  second  sampling,  the  analyst  can  allo- 
cate his  sample  in  proportion  to  stratum  sizes.  This 
procedure  is  called  probability  proportional  to  size 
(PPS) sampling.6 The proportion estimate takes the
form

$$\hat{P} = \hat{\pi}_{11}\,\theta + \hat{\pi}_{10}\,(1 - \theta)$$

where $\hat{\pi}_{11}$ = $\widehat{\Pr}$(analyst decides pixel is wheat | pixel classified wheat)
      $\hat{\pi}_{10}$ = $\widehat{\Pr}$(analyst decides pixel is wheat | pixel classified nonwheat)
      $\theta$ = $\Pr$(pixel classified wheat)

The estimators $\hat{\pi}_{11}$ and $\hat{\pi}_{10}$ are estimators of
conditional probabilities. These conditional probabilities
are often called posterior probabilities. The term $\theta$ is
the marginal probability of the classifier's calling a
pixel wheat. It is obtained simply by tabulating the
classification results.

Notice that the posterior probabilities, the $\hat{\pi}$'s, can
be viewed as corrective terms to the machine estimate,
$\theta$, of the true proportion. Although the
posterior probabilities are not directly related to the
omission and commission errors illustrated in figure
1, these are in a sense the inverse probabilities associated
with those errors.

As  with  the  other  proportion  estimation  methods, 
the  estimators  associated  with  analyst  interpretation, 
the $\hat{\pi}$'s, are costly to obtain and are generally obtained
through only limited sampling. The $\hat{P}$ determined
by this method, however, can be shown to be
unbiased  and  has  an  associated  variance  that  is  lower 


6This is one option available in Procedure 1. A second option,
called poststratified sampling, which was the one used in LACIE,
is explained in the paper by Heydorn et al.
than the variance that could be obtained through the
use  of  a second  sampling  without  the  benefit  of 
machine  classification. 
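
The stratified estimate can be sketched as follows. This is an illustration under stated assumptions rather than the Procedure 1 code: the function name is hypothetical, the classifier's wheat fraction is treated as known exactly, and the variance is the usual plug-in binomial approximation for a stratified sample with no finite-population correction.

```python
def posterior_probability_estimate(theta, wheat_stratum_labels, nonwheat_stratum_labels):
    """Stratified estimate of the wheat proportion in a segment.

    theta: fraction of pixels the classifier called wheat.
    wheat_stratum_labels: analyst labels (True = wheat) for dots sampled
        among pixels classified wheat.
    nonwheat_stratum_labels: the same for pixels classified nonwheat.
    Returns (estimate, approximate variance).
    """
    n1 = len(wheat_stratum_labels)
    n0 = len(nonwheat_stratum_labels)
    pi11 = sum(wheat_stratum_labels) / n1      # wheat rate in the wheat stratum
    pi10 = sum(nonwheat_stratum_labels) / n0   # wheat rate in the nonwheat stratum
    estimate = pi11 * theta + pi10 * (1.0 - theta)
    variance = (theta ** 2) * pi11 * (1.0 - pi11) / n1 \
        + ((1.0 - theta) ** 2) * pi10 * (1.0 - pi10) / n0
    return estimate, variance


# Hypothetical segment: 40 percent of pixels classified wheat, 10 dots per stratum.
labels_wheat = [True] * 8 + [False] * 2
labels_nonwheat = [True] * 1 + [False] * 9
print(posterior_probability_estimate(0.4, labels_wheat, labels_nonwheat))
```

The better the classifier, the closer the two within-stratum rates are to 1 and 0, and the smaller both variance terms become, which is the sense in which classification improves on sampling alone.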

Each  of  the  methods  listed  in  table  I was  evalu- 
ated on  typical  LACIE  segment  data.  All  labeling, 
however,  was  done  from  ground-truth  information 
and  not  from  analyst  interpretation.  The  result 
showed  that  on  Landsat  data,  no  one  method  had  a 
clear-cut advantage (see the paper by Feiveson entitled
"Estimating Crop Proportions From Remotely
Sensed  Data”).  The  posterior  probability  approach, 
however,  more  easily  fits  into  an  approach  in  which 
some  form  of  classification  is  used,  as  was  the  case  in 
the  first  two  phases  of  LACIE.  Since  there  was  a con- 
siderable body  of  knowledge  about  classification  ap- 
proaches, use  of  a classification  approach  was  con- 
tinued in  the  second  LACIE  design. 


CLUSTERING 

Clustering  was  used  in  LACIE  as  a means  of  auto- 
matically estimating  parameters  required  for  max- 
imum-likelihood classification.  In  particular,  the 
LACIE  classification  algorithm  was  based  on  the 
assumption  that  each  crop  type  could  be  statistically 
modeled  as  a linear  combination  of  normal  distribu- 
tions. By  first  clustering  a segment,  it  is  possible  to 
associate  each  cluster  with  a crop  type  by  analyst  in- 
terpretation methods  and  thereby  to  associate  a 
cluster  with  a crop-type  distribution.  The  means  and 
covariances  obtained  by  using  every  sample  in  a 
given  cluster  can  serve  as  estimates  of  the  required 
crop-type  means  and  covariances.  Admittedly,  these 
estimates  have  some  undesirable  properties;7 
however,  on  the  whole,  the  approach  is  feasible,8  as 
was  demonstrated  in  the  large-scale  experiments 
with Procedure 1.
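
The per-cluster statistics described above can be sketched in plain Python. This is a minimal illustration with hypothetical function names; it assumes that each pixel is a tuple of channel values, that cluster membership is already available as a parallel list of identifiers, and that the unbiased (n - 1) divisor is wanted for the covariance.

```python
def mean_and_covariance(points):
    """Sample mean vector and covariance matrix of one cluster's pixels."""
    n = len(points)
    d = len(points[0])
    mu = [sum(p[j] for p in points) / n for j in range(d)]
    cov = [
        [sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]
    return mu, cov


def cluster_statistics(pixels, cluster_ids):
    """Map each cluster identifier to the (mean, covariance) of its pixels."""
    members = {}
    for pixel, cid in zip(pixels, cluster_ids):
        members.setdefault(cid, []).append(pixel)
    # A covariance needs at least two pixels, so singleton clusters are skipped.
    return {cid: mean_and_covariance(pts) for cid, pts in members.items() if len(pts) > 1}


pixels = [(1, 2), (3, 4), (5, 6), (20, 21), (22, 25)]
print(cluster_statistics(pixels, ["A", "A", "A", "B", "B"]))
```

Because a cluster only contains the pixels that fell nearest its center, these statistics describe a truncated sample, which is the source of the bias noted in footnote 7.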

Before the design of Procedure 1, research was
started  to  develop  clustering  algorithms  that  would 


7Points from a cluster are in fact samples from a truncated
distribution. This means that the estimates made on the basis of this
truncated distribution of the mean and covariance of the untruncated
distribution are likely to be biased.

8This statement is not intended to address the notion of using
clustering alone as a means of classification. This concept is
considered in the paper by Kauth and Richardson entitled "Signature
Extension Methods in Crop Area Estimation."


function  well  in  this  application.  The  first  approach 
was  simply  to  take  an  available  cluster  algorithm9 
which  clustered  point  spectral  data  and  refine  it  so 
that it would produce acceptable clusters without requiring
continual adjustment of parameters. This approach
was moderately successful, as judged by the
results  obtained  with  Procedure  1. 

To  develop  an  improved  algorithm,  two  funda- 
mentally different  approaches  were  taken.  One  was 
simply  to  use  point-clustering  ideas  but  add  addi- 
tional information  regarding  the  spatial  properties  of 
Landsat  agricultural  data.  For  the  most  part,  this  ad- 
ditional information  was  associated  with  the 
agricultural  field  structure  in  the  scene.  Algorithms 
of  this  type  will  be  referred  to  as  spatial  clustering 
algorithms.  The  other  approach  is  related  to  the  con- 
cepts explained  in  the  previous  section  on  proportion 
estimation.  That  is,  the  mixture  density  is  resolved 
into  its  component  probability  densities  in  the  hope 
that  these  component  densities  can  be  associated 
with  unique  crop  types. 

Table  II  is  a list  of  the  clustering  algorithms  that 
were  investigated  in  LACIE.  Detailed  explanations 
of  each  algorithm  can  be  found  in  the  references 
cited  in  that  table.  A synopsis  of  the  main  ideas 
follows. 


ISOCLS 

The  Iterative  Self-Organizing  Clustering  System 
(ISOCLS)  is  fundamentally  a point-clustering 
algorithm  in  that  each  pixel  of  Landsat  measure- 
ments is  grouped  with  some  other  pixel  without 
regard  to  the  coordinate  (spatial)  location  of  that  pix- 
el in  the  image.  A set  of  starting  points  (or  seed 
points)  is  given  and  each  pixel  is  assigned  to  the 
closest (in the $l_1$ or "city-block" metric) seed. This
operation  forms  the  initial  set  of  clusters.  Means  and 
standard  deviations  are  estimated  for  all  clusters  in 
this  initial  set.  Next,  a sequence  of  operations  is  per- 
formed in  which  clusters  having  too  large  a standard 
deviation  in  any  given  coordinate  direction  are  split 
to  form  two  smaller  ones,  clusters  whose  means  are 
too close together are joined, and clusters that are too
small are combined with larger ones. Following this
operation, the mean of each cluster is recomputed.
Pixels can be reassigned to the nearest mean and the
process repeated as many times as desired. The
algorithm can therefore compute the number of
clusters in a given application.


**An algorithm called ISOCLS (Iterative Self-Organizing
Clustering System) was available at the Johnson Space Center. It
was the one used in Procedure 1.


Table II.— Clustering Algorithms

Clustering    Type of       Responsible institution
algorithm     clustering

ISOCLS(a)     Spectral      LEC
BCLUST(b)     Spectral      Environmental Research Institute of Michigan (ERIM)
ECHO(c)       Spatial       Laboratory for Applications of Remote Sensing
AMOEBA(d)     Spatial       TAMU
CLASSY(e)     Spectral      JSC (NRC) and LEC
UHMLE(f)      Spectral      University of Houston

(a) E. P. Kan, “The JSC Clustering Program ISOCLS and Its Applications,”
LEC-0483, Lockheed Electronics Co. (Houston), July 1973.

(b) R. J. Kauth and W. Richardson, “Signature Extension Methods in Crop Area
Estimation,” LACIE Symposium.

(c) R. L. Kettig and D. A. Landgrebe, “Classification of Multispectral Image
Data by Extraction and Classification of Homogeneous Objects,” IEEE Trans.
Geoscience Electronics, vol. GE-14, no. 1, Jan. 1976, pp. 19-26.

(d) Bryant, “On the Clustering of Multidimensional Pictorial Data,” LACIE
Symposium.

(e) R. K. Lennington and M. E. Rassbach, “CLASSY— An Adaptive Maximum
Likelihood Clustering Algorithm,” LACIE Symposium.

(f) W. A. Coberly and C. L. Wiginton, “UHMLE— Program Description User Guide,”
Rep. 48, Dept. of Math., Univ. of Houston, Oct. 1975.
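The ISOCLS steps described in this subsection (nearest-seed assignment in the city-block metric, splitting, joining, and discarding small clusters) can be illustrated with a minimal sketch. It is not the JSC program: the function names, the thresholds `max_std`, `min_dist`, and `min_size`, and the rule of splitting a cluster one standard deviation either side of its mean are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def l1(a, b):
    """City-block (l1) distance between two measurement vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def assign(pixels, centers):
    """Group each pixel with its nearest center; image location is ignored."""
    groups = [[] for _ in centers]
    for p in pixels:
        nearest = min(range(len(centers)), key=lambda i: l1(p, centers[i]))
        groups[nearest].append(p)
    return [g for g in groups if g]

def centroid(group):
    return tuple(mean(channel) for channel in zip(*group))

def isocls_like(pixels, seeds, max_std=2.0, min_dist=1.0, min_size=2, passes=5):
    """Hypothetical split/join loop in the spirit of ISOCLS (thresholds assumed)."""
    centers = list(seeds)
    for _ in range(passes):
        groups = assign(pixels, centers)
        # Drop clusters that are too small; their pixels join larger
        # clusters at the next reassignment.
        groups = [g for g in groups if len(g) >= min_size] or groups
        candidates = []
        for g in groups:
            m = centroid(g)
            stds = [pstdev(channel) for channel in zip(*g)]
            worst = max(range(len(stds)), key=stds.__getitem__)
            if stds[worst] > max_std:
                # Split along the coordinate with the largest spread.
                lo, hi = list(m), list(m)
                lo[worst] -= stds[worst]
                hi[worst] += stds[worst]
                candidates += [tuple(lo), tuple(hi)]
            else:
                candidates.append(m)
        # Join clusters whose means are too close together.
        centers = []
        for c in candidates:
            for i, kept in enumerate(centers):
                if l1(c, kept) < min_dist:
                    centers[i] = tuple((x + y) / 2 for x, y in zip(c, kept))
                    break
            else:
                centers.append(c)
    groups = assign(pixels, centers)
    return [centroid(g) for g in groups], groups
```

Started from a single seed on data that form two tight groups, the loop splits the initial cluster and settles on two clusters, which is the sense in which the number of clusters is computed rather than specified.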


BCLUST 

The  BCLUST  or  “blob-clustering”  algorithm  in- 
cludes spatial  information  in  the  clustering  process 
by  augmenting  each  vector  of  Landsat  channels  with 
two  location  coordinates;  i.e.,  the  line  and  point 
numbers  of  the  pixel.  These  augmented  vectors  are 
clustered  into  groups  called  “blobs”  that  are  intended 
to  be  representative  of  agricultural  fields.  This  "blob- 
bing” is  done  by  comparing  a given  augmented  vec- 
tor with  the  means  of  established  blobs.  The  vector  is 
assigned  to  the  closest  blob  provided  its  distance  is 
less  than  a given  threshold.  If  the  distance  is  greater 
than  that,  then  the  vector  is  designated  as  the  mean 
of  a new  blob.  After  a new  vector  has  been  added  to  a 
blob,  a new  blob  mean  is  computed  using  all  vectors 
in  that  blob.  When  all  vectors  have  been  assigned  to 
blobs,  then  a similar  procedure  is  used  to  build 


clusters  from  blobs.  During  this  phase  of  the 
algorithm,  clusters  are  formed  by  grouping  the 
means  of  blobs.  A blob  mean  is  defined  as  the 
arithmetic  average  of  the  vectors  for  all  the  pixels  in 
that  blob.  The  spatial  coordinates  are  deleted  from 
the  vectors  during  this  calculation. 
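The two phases of BCLUST described above can be sketched as follows. This is an illustrative outline only: the source does not state the distance measure or the threshold values, so Euclidean distance and the names `threshold_cluster`, `bclust_like`, `blob_threshold`, and `cluster_threshold` are assumptions.

```python
import math

def nearest(v, means):
    """Return (distance, index) of the mean closest to v (Euclidean, assumed)."""
    return min((math.dist(v, m), i) for i, m in enumerate(means))

def threshold_cluster(vectors, threshold):
    """Assign each vector to the closest group if within threshold, else start a new group."""
    groups, means = [], []
    for v in vectors:
        d, k = nearest(v, means) if means else (math.inf, -1)
        if d < threshold:
            groups[k].append(v)
            # Recompute the group mean from all of its vectors.
            means[k] = tuple(sum(c) / len(groups[k]) for c in zip(*groups[k]))
        else:
            groups.append([v])
            means.append(tuple(v))
    return groups, means

def bclust_like(pixels, blob_threshold, cluster_threshold):
    """pixels: iterable of (line, point, spectral_vector) triples."""
    # Phase 1: augment each spectral vector with its line and point numbers.
    augmented = [tuple(spec) + (line, point) for line, point, spec in pixels]
    blobs, blob_means = threshold_cluster(augmented, blob_threshold)
    # Phase 2: delete the spatial coordinates and cluster the blob means.
    spectral_means = [m[:-2] for m in blob_means]
    clusters, _ = threshold_cluster(spectral_means, cluster_threshold)
    return blobs, clusters
```

With two spatially separate fields of the same spectral value and a third field of a different value, the first phase yields three blobs and the second phase merges the two spectrally identical blobs into one cluster.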


ECHO 

Rather  than  operate  on  one  pixel  at  a time  as  do 
the  ISOCLS  and  BCLUST  algorithms,  the  algorithm 
ECHO  (Extraction  and  Classification  of  Homo- 
geneous Objects)  operates  on  four  pixels  at  a time, 
using  a “region-growing”  concept.  The  hope  is  that 
regions  (or  clusters)  will  resemble  agricultural  fields. 
Groups of four pixels, or superpixels, are tested for
homogeneity.  Nonhomogeneous  superpixels  are 
judged  to  be  on  field  boundaries  and  are  initially  ex- 
cluded. The  homogeneous  superpixels  are  con- 
sidered to  be  field  interior  pixels.  Next,  field  super- 
pixels  are  combined  into  regions.  Two  contiguous 
superpixels  are  grouped  if  they  pass  a statistical 
“goodness  of  fit”  test  (that  is,  a test  is  made  to  deter- 
mine whether  the  pixels  in  two  superpixels  come 
from  the  same  distribution).  If  two  superpixels  are 
grouped,  then  any  subsequent  comparisons  are  made 
between  contiguous  superpixels  and  the  existing 
group  or  cluster.  In  this  way,  clusters  grow  until  the 
test  fails.  Finally,  the  boundary  pixels  are  assigned 
using  maximum-likelihood  classification. 


AMOEBA 

AMOEBA  is  also  designed  around  spatial  cluster- 
ing principles.  Here,  however,  a region-growing  con- 
cept is  not  applied.  Rather,  a set  of  heuristically 
derived  rules  is  first  employed  to  single  out  boundary 
pixels  and  all  points  judged  very  near  these  bound- 
aries. The  complement  of  all  such  points  makes  up 
another  set  of  pixels  called  “patches.”  The  basic  hope 
is  that  these  patches  closely  resemble  agricultural 
fields  or  at  least  homogeneous  portions  of  fields.  Pix- 
els are  then  clustered  by  grouping  with  a set  of  patch 
means  using  the  same  comparative  logic  (nearest 
neighbor)  as  does  the  first  group  operation  in 
ISOCLS.  Clusters  are  then  tested  for  purity  using  a 
misclustering criterion that considers a clustering error
to have occurred if, for a given pair of pixels, (1)
the pixels came from the same real class but are
clustered differently or (2) the pixels came from
different real classes and are in the same cluster. The
error given by (1) can be tested by assuming that all
pixels in the same patch come from the same real
class. This assumption is reasonable if a patch represents
one agricultural field (assuming that a field can
contain only one crop type). The examination of the
error given by (2) is not as straightforward. Basically,
the idea is that two pixels that are spectrally distant
ought to belong to different real classes. After the
misclustering criterion has been computed, the
cluster with the most errors is eliminated. The pixels
in the deleted cluster are combined with the existing
clusters, again using nearest neighbor logic. The
misclustering criterion is again computed and the
process is repeated. The final clustering is that which
gives the lowest value for the misclustering criterion.


CLASSY 

The  CLASSY  algorithm  attempts  to  fit  the  total 
mixture  density  (where  mixture  density  is  as  defined 
in  the  section  on  proportion  estimation)  of  the  seg- 
ment pixels  by  normal  distribution  density  func- 
tions. The  number  of  distributions,  the  mean  and 
covariance  matrix  of  each,  and  the  mixing  propor- 
tions are  estimated.  Initially,  the  algorithm  attempts 
to  fit  one  normal  distribution  to  the  mixture  density 
by  using  iterative  maximum-likelihood  estimation 
procedures  to  estimate  the  mean  and  covariance 
matrix.  On  the  basis  of  higher  order  moments,  a test 
is  made  to  decide  on  the  goodness  of  the  fit.  If  the 
test  fails,  two  normal  distributions  are  tried.  The 
parameters  of  the  two  new  distributions  are  obtained 
by  fitting  to  the  first  through  the  fourth  moments  of 
the  original  parent  cluster.  This  splitting  operation 
may  be  repeated.  Periodically  during  the  computa- 
tions, the  choice  to  split  is  reexamined,  and,  as  a 
result,  the  parent  cluster  may  be  restored.  When  the 
choice  is  made  to  fit  with  more  than  one  normal  dis- 
tribution, the  mixing  proportions  (prior  prob- 
abilities) are  available,  as  they  have  been  estimated 
using  maximum-likelihood  estimation  methods. 
Thus,  in  an  inventory  application  (like  LACIE), 
CLASSY  will  estimate  crop-type  proportions  pro- 
vided a crop-type  label  is  assigned  (through  analyst 
interpretation  or  otherwise)  to  each  cluster. 


UHMLE 

The  University  of  Houston  Maximum-Likelihood 


Estimation  (UHMLE)  algorithm  is  similar  to 
CLASSY  except  that  the  number  of  component  dis- 
tributions to  fit  the  mixture  must  be  specified.  No 
splitting  or  joining  of  the  clusters  is  attempted.  Max- 
imum-likelihood iteration  is  used  in  estimating  the 
proportions  and  the  mean  and  covariance  matrix  of 
each  cluster. 
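The fixed-component fit that UHMLE performs can be illustrated with a short sketch. The source does not give UHMLE's update equations, so the code below uses the standard EM form of maximum-likelihood iteration for a normal mixture, and it is restricted to one spectral channel (a scalar variance instead of a covariance matrix) to stay short. The names `normal_pdf` and `fit_mixture` and the starting values are assumptions.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_mixture(data, means, n_iter=100):
    """Fit a 1-D normal mixture with a fixed number of components (len(means))."""
    m = len(means)
    grand = sum(data) / len(data)
    var0 = sum((x - grand) ** 2 for x in data) / len(data)
    means, variances, props = list(means), [var0] * m, [1.0 / m] * m
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in data:
            w = [props[k] * normal_pdf(x, means[k], variances[k]) for k in range(m)]
            total = sum(w)
            resp.append([wk / total for wk in w])
        # M-step: re-estimate mixing proportions, means, and variances.
        for k in range(m):
            nk = sum(r[k] for r in resp)
            props[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-9)  # floor avoids a degenerate component
    return props, means, variances
```

The estimated `props` are the mixing proportions; once an analyst labels each component with a crop type, they serve directly as crop-proportion estimates, as noted for CLASSY. No splitting or joining is attempted, so the number of components stays as specified.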


Evaluation 

Extensive  testing  of  the  algorithms  other  than 
ISOCLS  has  not  yet  been  done.  (For  details  of  tests 
done  on  ISOCLS,  see  the  paper  by  Wheeler  et  al.  en- 
titled “An  Evaluation  of  Procedure  1.”)  However,  it 
is  reasonable  to  believe  that  either  the  spatial  variety 
or  the  distributional  variety  (e.g.,  CLASSY)  would 
offer  significant  improvement  just  as  a basic 
classifier  of  Landsat  data.  Since  agricultural  fields  are 
very  likely  to  contain  the  same  crop  type  (at  least, 
this  is  normally  the  case  in  the  United  States),  the 
spatial  algorithms  should  clear  up  a substantial 
amount  of  spectral  confusion.  The  distributional 
variety  is  theoretically  the  ideal  type  of  algorithm  for 
an  inventory  application.  Indeed,  these  algorithms 
can  directly  estimate  the  proportion  of  a crop  type 
using  principles  similar  to  those  discussed  in  the  first 
section  of  this  paper.  It  remains  to  be  seen  whether 
or  not  the  assumptions  related  to  this  theory  hold  in 
real  applications  and  whether  or  not  the  algorithms 
are  sufficient  to  withstand  violations  of  the  assump- 
tions. 


FEATURE  EXTRACTION 

In  the  initial  phases  of  LACIE,  research  in  feature 
extraction  topics  was  pursued  along  traditional  lines 
found  in  the  pattern  recognition  literature;  that  is, 
transformations  of  Landsat  data  were  sought  for  con- 
verting the  Landsat  channel  data  to  a new  set  of 
variables,  called  features,  which  preserved  some 
desirable  statistical  property.  These  properties  were 
expressed  in  terms  of  criterion  functionals  (e.g., 
Bhattacharyya  coefficient  or  divergence)  related  to 
the  probability  of  correct  classification.  In  each  case, 
a transformation  was  developed  which  would  best 
preserve  the  probability  of  correct  classification. 

The  motivation  for  this  research  was  to  reduce  the 
number  of  Landsat  measurements  that  must  be  con- 
sidered in  clustering  and  classification  operations.  In 
a multitemporal  application,  as  many  as  16  Landsat 
channel measurements make up a pattern vector.
Thus,  a multitemporal  clustering  application  can  re- 
quire a prohibitive  amount  of  time  if  done  on  con- 
ventional computers.  However,  later  in  LACIE, 
parallel  processor  computing  devices  were  purchased 
and,  thus,  the  motivation  for  further  research 
vanished. 

Two  accomplishments,  however,  stand  out  in  this 
research.  One  was  that  a method  was  derived  for 
computing  the  linear  transformation  that  converts 
the  Landsat  measurement  variables  to  a single  varia- 
ble in  such  a way  as  to  best  preserve  the  probability 
of  correct  classification.  That  is  to  say,  if  X is  a vector 
of  random  variables  of  Landsat  measurements,  a 
vector b is found so that Y = b'X is a new random
variable  for  which  the  probability  of  correct 
classification  obtained  using  Y in  place  of  X is  as 
close  as  possible  to  that  obtained  using  X.  The  other 
accomplishment  is  related  to  the  efficiency  of  com- 
puting features.  Basically,  a method  was  developed 
for  computing  the  best  linear  transformation  (in 
terms  of  a given  criterion  functional)  by  constructing 
the  transformation  from  a set  of  Householder 
transformations.  For  details  of  these  two  ac- 
complishments, see  the  paper  by  Decell  and  Guse- 
man  entitled  “Linear  Feature  Selection  With  Ap- 
plications." 

In  later  phases  of  LACIE,  a different  concept  was 
developed  for  deriving  features.  Rather  than  using 
statistical  criteria  for  their  definition,  criteria  related 
to  crop  growth  were  proposed.  Here,  the  basic  plan 
was  not  to  minimize  (or  maximize)  some  functional, 
as  was  explained  previously,  but  rather  to  develop 
transformations  heuristically.  Table  III  contains 
descriptions  of  the  major  transformations.  Of  those 
listed,  the  one  that  transforms  the  Landsat  measure- 
ments to  brightness,  greenness,  yellowness  varia- 
bles— the  “tasseled-cap"  transformation — has 
received  the  most  attention. 

Figure  2 illustrates  the  basic  ideas  involved.  The 
figure  shows  that  the  Landsat  spectral  values  in  an 
agricultural  scene,  containing  vegetation  at  different 
stages  of  development,  will  occupy  a region  (in  the 
Landsat  measurement  space)  that  resembles  (ac- 
cording to  Kauth  et  al.  in  the  paper  entitled  “Feature 
Extraction  Applied  to  Agricultural  Crops  as  Seen  by 
Landsat”)  a tasseled  cap.  The  “sweatband”  part  of 
the  hat  contains  what  is  called  the  “plane  of  soils" 
and  is  essentially  the  region  describing  soil  bright- 
ness. The  brighter  the  soil,  the  larger  the  radius.  As  a 
crop  develops  and  covers  more  of  the  soil  with  green 
vegetation,  the  spectral  values  move  upward  along 


the  greenness  axis.  Once  the  crop  begins  to  yellow 
(or  otherwise  lose  its  green  appearance),  a compo- 
nent of  motion  along  the  yellowness  axis  is  estab- 
lished. 

The  axes  corresponding  to  this  linear  transforma- 
tion were  derived  largely  by  empirical  means 
through  examination  of  Landsat  data  and  data  on 
soil  color.  While  an  underlying  theory  has  as  yet  not 
been  found,  it  appears  that  the  transformation  does 
bring  out  crop  growth  properties  within  Landsat 
data. 
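Several of the simpler vegetation indices listed in Table III can be written directly as functions of the channel 2 and channel 4 measurements. The sketch below uses the coefficients as read from that table (stated there to apply to Landsat-2 data); the function names are hypothetical, and the tasseled-cap and RVI expressions are omitted because their coefficients are not legible in the table.

```python
import math

def tvi(x2, x4):
    """Transformed Vegetation Index: sqrt((X4 - X2)/(X4 + X2) + 1/2)."""
    # Raises ValueError if the normalized difference is below -1/2.
    return math.sqrt((x4 - x2) / (x4 + x2) + 0.5)

def dvi(x2, x4):
    """Differenced Vegetative Index: 2.4*X4 - X2 (near zero for bare soil)."""
    return 2.4 * x4 - x2

def avi(x2, x4):
    """Ashburn Vegetative Index: 2*X4 - X2, clipped at zero."""
    return max(2.0 * x4 - x2, 0.0)

def pvi(x2, x4):
    """Perpendicular Vegetative Index: distance from the soil brightness line."""
    s2 = 0.851 * x2 + 0.355 * x4  # soil-line point, channel 2
    s4 = 0.355 * x2 + 0.148 * x4  # soil-line point, channel 4
    return math.hypot(s2 - x2, s4 - x4)
```

For a pixel lying on the soil line implied by the DVI constant (X2 = 2.4 X4), both `dvi` and `pvi` return values close to zero, which is the behavior the table's comments describe for soils.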


SIGNATURE  EXTENSION 

It  would  be  desirable  in  an  inventory  application 
such  as  LACIE  to  automate  the  process  as  much  as 
possible  so  that  only  a few  individuals  (analysts) 
could  inventory  an  entire  country.  In  LACIE,  it  was 
believed  that  this  automation  could  be  accomplished 
by  computer  analysis  methods  whereby  an  analyst 
could  examine  a small  amount  of  Landsat  data  and 
thereby  train  a computer  to  recognize  wheat  over 
some  large  area. 

Initially,  the  concept  was  implemented  and  tested 
in  a very  rudimentary  way.  Within  a group  of  LACIE 
segments,  one  was  selected  for  obtaining  the  training 
data,  and  then  the  training  segment  and  its  four 
nearest  neighbors  were  classified  on  the  basis  of  that 
training.  It  was  soon  discovered  that  regional  effects, 
such  as  soil  background  or  variations  in  cropping 
practices,  as  well  as  atmospheric  effects  modified 
spectral  appearance  to  the  point  where  such  an  ap- 
proach failed. 

The  second  major  attempt  sought  to  develop 
methods  to  remove  the  atmospheric  and  possibly  the 
regional  effects.  Some  success  was  achieved  in 
developing  a method  that  adjusted  for  atmospheric 
haze.  However,  when  other  perturbations  not  due  to 
haze  were  present,  as  is  almost  always  the  case,  these 
methods  also  failed.  (See  the  paper  by  Minter  en- 
titled “Methods  of  Extending  Crop  Signatures  From 
One  Area  to  Another"  for  a detailed  discussion  of 
the  approaches  considered.) 

One  of  these  approaches  is  illustrated  in  figure  3. 
Essentially,  the  idea  was  to  pair  two  segments  and 
normalize  one  with  respect  to  the  other  by  a cluster 
matching  process.  The  belief  was  that  crop  types 
would  produce  a unique  cluster  pattern  and  that  if 
the  same  crop  types  existed  in  two  segments,  the 
cluster  patterns  would  only  be  shifted,  one  with 
respect  to  the  other,  by  the  operation  of  some 
Table III.— Transformations on Landsat Data(a) That Enhance Crop Growth Characteristics(b)

In the entries below, $X_i$ is the ith Landsat channel measurement, i = 1, 2, 3, 4.

Tasseled-Cap Transformation
  Description: linear transformation of $X_1, \ldots, X_4$ to the variables B, G, Y, and N
    [coefficients not legible in the source]
  Comments: B is intended to be a soil brightness variable; G is intended to be a
    variable that measures green biomass development (i.e., a vegetative index
    variable); Y is intended to be a variable that measures crop “yellowing”; N is
    called “non-such” since few or no crop development characteristics are measured
    by this variable.

Transformed Vegetation Index (TVI)
  Description: $\mathrm{TVI} = \sqrt{\dfrac{X_4 - X_2}{X_4 + X_2} + \dfrac{1}{2}}$
  Comments: Channel 4 minus channel 2 appears to be related to green development.
    The sum of these channels is used as a normalizing factor and the 1/2 is simply
    a constant to ensure that ...

Transformed Vegetative Index (TVI6)
  Description: $\mathrm{TVI6} = \sqrt{\dfrac{X_3 - X_2}{X_3 + X_2} + \dfrac{1}{2}}$
  Comments: TVI6 merely substitutes channel 3 for channel 4.

Differenced Vegetative Index (DVI)
  Description: $\mathrm{DVI} = 2.4X_4 - X_2$
  Comments: DVI again measures a difference between channels 4 and 2. The constant
    2.4 is intended to adjust the index for soil brightness; i.e., soils should give
    a DVI value of very near zero.

Ashburn Vegetative Index (AVI)
  Description: $\mathrm{AVI} = 2X_4 - X_2$ if $2X_4 - X_2 \ge 0$; $\mathrm{AVI} = 0$ if $2X_4 - X_2 < 0$
  Comments: AVI is very similar to DVI without the soil line adjustment.

Ratioed Vegetative Index (RVI)
  Description: [expression not legible in the source]
  Comments: RVI has some properties similar to those of TVI. For soils, this may be
    roughly constant at a value of about 2.4.

Perpendicular Vegetative Index (PVI)
  Description: $\mathrm{PVI} = \sqrt{(S_2 - X_2)^2 + (S_4 - X_4)^2}$,
    where $S_2 = 0.851X_2 + 0.355X_4$ and $S_4 = 0.355X_2 + 0.148X_4$
  Comments: PVI is intended to measure the green development that occurs along the
    perpendicular to the soil brightness line. $S_2$ and $S_4$ are intended to
    measure the soil reflectance.

Perpendicular Vegetative Index (PVI6)
  Description: $\mathrm{PVI6} = \sqrt{(S_2 - X_2)^2 + (S_3 - X_3)^2}$,
    where $S_2 = -0.498 + 0.543X_2 + 0.498X_3$ and $S_3 = 2.734 + 0.498X_2 + 0.457X_3$
  Comments: PVI6 is similar to PVI with channel 4 replaced by channel 3.

(a) Transformation coefficients apply to Landsat-2 data.

(b) See A. J. Richardson and C. L. Wiegand, “Distinguishing Vegetation From Soil
Background Information,” Photogram. Eng. & Remote Sens., vol. 43, no. 12, Dec. 1977,
pp. 1541-1552.
unknown affine transformation.10 The transformation
could be obtained by attempting to “overlay”
the cluster pattern from one segment onto the next,
as shown in figure 3. Several algorithms were
developed for implementing this concept. The first
was called MASC and subsequent refinements were
called CROP-A, ROOSTER, and OSCAR.11
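The “overlay” idea can be illustrated with a deliberately simple sketch. It is not MASC or any of its refinements, whose internals are not described here: it assumes the clusters of the two segments have already been paired, and it restricts the affine transformation to a separate gain and offset per channel (a diagonal matrix), fitted by least squares to the paired cluster means. All function names are hypothetical.

```python
def fit_channel(xs, ys):
    """Least-squares gain a and offset b such that ys is approximately a * xs + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def fit_diagonal_affine(recognition_means, training_means):
    """One (gain, offset) pair per channel from paired cluster means."""
    channels = zip(zip(*recognition_means), zip(*training_means))
    return [fit_channel(xs, ys) for xs, ys in channels]

def apply_affine(vector, params):
    """Map a recognition-segment vector into the training segment's space."""
    return tuple(a * x + b for x, (a, b) in zip(vector, params))
```

After fitting, `apply_affine` would be applied to every recognition-segment measurement vector so that signatures trained on the other segment can be used on it. A full affine fit would also estimate the off-diagonal terms of the matrix, which this sketch leaves out.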

Even though spectral crop-type observations are
not stable from segment to segment, it may be that
observations from a collection of segments maintain
their discriminating properties. This concept is
illustrated in figure 4. In that figure, the collection of
FIGURE 3.— One approach to signature extension based on a
recognition-segment-to-training-segment transformation. (a)
Original clusters from the training (open) and recognition
(hatched) segments. (b) Recognition segment clusters
transformed to conform to training segment clusters.


FIGURE 4.— Contrast between original and proposed signature
extension concept.

points labeled T0 is meant to represent the spectral
values from one segment. It is seen that the straight
line indeed separates wheat from nonwheat in that
segment but not in all others. However, if data are
taken from more than one segment, such as from the
segments labeled T, then it becomes apparent that
one curved line could separate the wheat from
nonwheat in all the segments.

When  dealing  with  LACIE  segments,  this  concept 
would  imply  that  one  should  be  able  to  find  a small 
number  of  segments  which,  when  pooled,  would 
serve  as  training  segments  for  classifying  the  other 
segments.  “Pooling"  means  that  several  training  seg- 
ments would  be  grouped  and  in  effect  this  group 
treated as a larger segment in a training process. If,
for example, a classification of an entire Landsat
frame is desired, then it may be possible to find a
very small portion of that frame that would serve as
the  training  area.  Basically,  the  approach  is  similar  to 
the  one  used  to  classify  a segment;  i.e.,  a training 


10An affine transformation is one that would rotate, contract
or expand, and translate the Landsat measurement vectors. Thus,
if X is a vector of Landsat measurements, AX + b (where A is a
matrix and b a vector) is the mapping resulting from the affine
transformation A(·) + b.

11MASC (Multiplicative and Additive Signature Correction)
and CROP-A (Cluster Regression Ordered on Principal-Axis)
were developed by ERIM; ROOSTER (Rank Order Optimal
Transformation Estimation Routine) and OSCAR (Optimal Signature
Correction Algorithmic Routine) were developed by LEC.
sample is taken from that segment and used to
classify  the  entire  segment.  The  difference  is  that  in 
the  signature  extension  approach,  much  more  varia- 
tion in  spectral  values  can  be  expected  when  the  area 
of  interest  begins  to  expand  from  the  size  of  a seg- 
ment. 

To  cope  with  this  variation  in  spectral  values,  at 
least  two  things  can  be  done.  One  is  first  to  partition 
the  area  to  be  estimated  into  strata  in  which  the 
variation  is  expected  to  be  small.  Thus,  one  approach 
for  constructing  such  strata  would  be  to  single  out  all 
environmentally  static  variables  thought  to  affect 
Landsat  reflectances,  estimate  their  effects  (using, 
for example, regression or analysis of variance approaches
with field spectral mean as the dependent
variable),  and  then  establish  areas  in  which  changes 
in  these  variables  are  small.  These  areas  are  then  the 
strata.  This  approach  was  in  fact  taken  and  is  dis- 
cussed in  detail  in  the  paper  by  Thomas  et  al.  entitled 
“Development  of  Partitioning  as  an  Aid  to  Spectral 
Signature  Extension.” 

The  so-called  dynamic  effects,  such  as  those  due 
to  changes  in  atmospheric  haze,  should  also  be 
minimized.  One  such  transformation,  which  uses 
only  Landsat  data,  was  developed  to  rescale  the  data 
to  minimize  haze  distortion.  The  resulting  algorithm, 
called XSTAR, applies an affine transformation,
A(·) + b, where A is a matrix and b a vector. The elements
of A and b are exponential functions of a
variable  that  depends  on  the  shift  in  yellowness  (see 
the  previous  section  on  feature  extraction).  It  was 
determined  empirically  that  movement  along  the 
yellowness  axis  in  a given  segment  is  correlated  with 
changes  in  atmospheric  haze,  and  it  is  this  observa- 
tion that  is  exploited  in  the  XSTAR  transformation. 

Given  a stratification,  the  problem  that  remains  is 
to  select  a small  number  of  samples  that  can  be  used 
to  obtain  an  estimate  of  the  wheat  area  in  a stratum. 
The  approach  (discussed  in  the  paper  by  Kauth  and 
Richardson  entitled  “Signature  Extension  Methods 
in  Crop  Area  Estimation”)  is  based  on  first  cluster- 
ing the  segments  into  spectral  groups  and  applying  a 
probability-proportional-to-size (PPS) sampling to this
grouping.  In  PPS  sampling,  a number  of  samples  are 
allocated  to  each  group  in  proportion  to  the  size  of 
the  group.  For  the  approach  to  be  efficient,  both 
spectral  homogeneity  and  the  random  mix  of  wheat 
area  within  the  agricultural  area  (wheat  area 
homogeneity)  should  be  present  within  a given 
stratum.  Thus,  the  spectral  stratifications  discussed 
previously  should  be  intersected  with  an  area 
stratification.  Such  area  stratifications  are  discussed 


in  the  paper  by  Hallum  and  Basu  entitled  “Natural 
Sampling  Strategy." 
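The allocation rule stated above, in which samples are assigned to spectral groups in proportion to group size, can be sketched as a simple largest-remainder allocation. This is an illustration of the proportional rule only, not the procedure of the cited papers; the function name and the rounding scheme are assumptions.

```python
def allocate_samples(group_sizes, n_samples):
    """Allocate n_samples across groups in proportion to their sizes."""
    total = sum(group_sizes)
    exact = [n_samples * size / total for size in group_sizes]
    alloc = [int(e) for e in exact]  # whole-number part of each share
    # Hand out any leftover samples to the largest fractional remainders.
    order = sorted(range(len(exact)), key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in order[: n_samples - sum(alloc)]:
        alloc[i] += 1
    return alloc
```

For example, ten samples spread over groups of 50, 30, and 20 segments would be allocated as 5, 3, and 2. A randomized PPS draw would instead select individual units with probability proportional to size; the deterministic version is shown because it matches the sentence in the text most directly.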

The  overall  approach  (i.e.,  area  stratification,  haze 
correction,  and  segment  selection)  has  yet  to  be  dem- 
onstrated. Each  of  the  three  elements  has  been  sepa- 
rately studied. 


CONCLUSIONS 

The  Large  Area  Crop  Inventory  Experiment 
began  with  concepts  that  could  be  traced  back  to 
early  mapping  approaches.  Here,  the  basic  idea  is  to 
classify  a given  area  as  accurately  as  possible.  The 
need  to  meet  LACIE  objectives  with  Landsat-type 
data  motivated  ideas  related  to  making  an  inventory 
of  an  area  without  the  use  of  classification  (or  deal- 
ing with  classification  error  to  remove  estimator 
bias)  while  minimizing  the  number  of  manual  in- 
terpretations required  by  a signature  extension  ap- 
proach. Although  many  of  these  were  not  demon- 
strated in  the  large-scale  experiment,  significant 
progress was made in the supporting research program,
which can be applied to future designs.

As  is  generally  the  case,  however,  research  leads 
to  yet  more  research,  and  LACIE  is  no  exception.  It 
would appear that Landsat-1 and Landsat-2 data do
not  contain  enough  information  to  discriminate  be- 
tween crop  types  perfectly  all  the  time  and  therefore 
a basic  problem  arises  when  no  ground-truth  data  on 
crop  types  in  an  area  are  available.  LACIE  attempted 
to  use  analyst  interpretation  to  supply  this  ground 
truth,  but  labeling  errors  resulted.  Depending  on  the 
area  and  the  time  of  year,  these  errors  could  be  large. 
It  would  appear  then  that  new  approaches  are  needed 
to  reduce  labeling  error.  Perhaps  better  use  of  multi- 
year Landsat  data,  a more  detailed  understanding  of 
the  cropping  practices  in  an  area,  better  crop  calendar 
prediction,  and  a better  understanding  of  the  limiting 
sources  of  error  in  Landsat  data  related  to  crop  dis- 
crimination may  provide  the  insight  required  to 
develop  improved  designs. 

If  analyst  labeling  errors  cannot  be  reduced  to 
sufficiently  low  levels,  then  methods  considerably 
different  from  those  used  in  Procedure  1 need  to  be 
considered.  One  such  method  might  be  centered 
around  the  use  of  a LIST12  concept,  in  which  the 


12LIST is an acronym for Label Identification by Statistical
Tabulation (see the paper by Pore and R. Abotteen entitled “A
Programed Labeling Approach to Image Interpretation”).
analyst is required only to answer questions for
which consistently accurate responses are possible.
These responses would be used integrally in machine
processes that also consider Landsat and
meteorological data. Another approach might be to
use some of the mixture distribution fitting methods
discussed previously. In such methods, the analyst
may only be required to select the appropriate
component distributions to be used from a “bank” of
possible distributions.


It  may  be  that  a satellite  system  that  provided  data 
with  higher  spatial  and  spectral  resolution  and  that 
greatly increased the number of looks at a crop during
its development cycle would significantly increase
accuracy when coupled with the same basic
processing concepts used at the beginning of LACIE.
However, throughout LACIE, the desire has been to
design the most accurate processing approach possible
for a given sensor system. It is expected that
future  research  will  be  motivated  by  the  same  desire. 
Estimating Crop Proportions From
Remotely  Sensed  Data 

A.  H.  Feiveson* 


The golden wheat doth flower forth throughout the LACIE segment
While eyes of Landsat scrutinize from infrared to green.

The imagery is analyzed and cluster maps are made
So that each pixel's classified by looking at its sheen.

The ones called wheat are counted up to tell how much is there,

But evil bias twists our aim— we know not how we fare!

What should we do? It would be nice to classify the pixels.

But lo! Our acreage estimate is subject to distortion;

Hence thus we must new methods try. For after all, in LACIE,

The goal we seek is nothing but to know the wheat proportion!


INTRODUCTION 

In LACIE, wheat acreage is estimated by the sample
survey approach; i.e., the proportion of wheat is
directly estimated for each of a number of 5- by 6-
nautical-mile sample segments and then used in an
aggregation to obtain a large area estimate. With a
reasonable sample, this method stands or falls on the
accuracy of the wheat proportion estimates for the
individual segments.

The  standard  approach  until  recently  has  been  to 
have  analysts  label  data  to  train  a maximum  likeli- 
hood classifier,  then  classify  every  pixel  in  the  seg- 
ment and  use  the  fraction  classified  as  wheat  for  the 
wheat  proportion  estimate.  Although  it  is  intuitively 
appealing  to  be  able  to  classify  the  pixels  as  wheat  or 
nonwheat,  such  an  enumeration  is  not  required  in 
LACIE— all that is needed is an estimate of the proportion
of wheat in each segment. Because the
classification/pixel-count  method  is  theoretically 
biased  even  if  all  distributional  assumptions  are  met, 
a search  for  alternative  ways  to  estimate  crop  propor- 
tions has  been  initiated.  In  this  paper,  some  of  these 
methods  are  described  and  some  initial  test  results 
on  intensive  site  data  are  presented. 
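The bias of the classification/pixel-count method can be seen in a small worked example. The sketch assumes a hypothetical one-channel case with two normal classes of equal variance and a classifier that calls a pixel wheat when its value falls below a fixed threshold; the function names and the numbers used afterward are illustrative, not LACIE results.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_pixel_count(alpha, mu_wheat, mu_other, sigma, threshold):
    """Expected fraction of pixels classified as wheat.

    Assumes mu_wheat < mu_other and that a pixel is called wheat when its
    value is below threshold. alpha is the true wheat proportion.
    """
    p_wheat_called_wheat = phi((threshold - mu_wheat) / sigma)
    p_other_called_wheat = phi((threshold - mu_other) / sigma)
    return alpha * p_wheat_called_wheat + (1.0 - alpha) * p_other_called_wheat
```

With wheat centered at 0, nonwheat at 2, unit standard deviation, and the threshold at the midpoint 1, a true proportion of 0.30 gives an expected pixel-count estimate of about 0.36. The omission and commission errors cancel only when the true proportion is 0.5, so the estimate is biased even though every distributional assumption holds.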
STATEMENT OF THE PROBLEM—
THE PIXEL-COUNTING TECHNIQUE

Let $\{x_i\}$ be a set of p-dimensional measurement
vectors corresponding to the N pixels in a LACIE
segment. In the rest of this paper, it will be assumed
that the vectors $x_i$ are independent observations
sampled from a mixture population with class
distribution function (CDF)

$$F(x) = \sum_{j=1}^{m} \alpha_j F_j(x), \qquad (1)$$

where $\alpha_j$ is the proportion of the segment in
ground-cover class j, with $\sum_j \alpha_j = 1$,
and $F_j(x)$ is the corresponding CDF for the jth class.
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The  most  general  problem  ie, “Given  1^)*..,  esti- 
mate |ay)jL,  without  any  knowledge  of  m or  //a).*’ 
Since  this  problem  cannot  be  solved  without  further 
assumptions,  one  of  the  following  cases  is  usually 
assumed. 

1. The number of classes $m$ is known (or at least
an upper bound is known) and $F_j(x)$ is from a known
identifiable family, such as a set of distinct multivariate
normal distributions.

2. The number of classes $m$ is known, a set of observations
from each distribution $F_j(x)$ is available
(i.e., training data), and the functions $F_j(x)$ are
assumed to be members of some identifiable family,
usually known up to a set of parameters.

With respect to both cases, a class of finite mixing
distributions is identifiable if and only if

$$\sum_{i=1}^{M} \alpha_i F_i(x) = \sum_{j=1}^{N} \alpha'_j G_j(x)$$

for all values of $x$ implies that $M = N$ and that, for
each $i$ $(1 \le i \le N)$, there is some $j(i)$ $(1 \le j(i) \le N)$
such that $\alpha_i = \alpha'_{j(i)}$ and $F_i(x) = G_{j(i)}(x)$. See
Teicher (ref. 1) for more details.

The  LACIE  investigators  are  forced  to  work 
under  the  framework  of  case  2.  The  Fj  values  are  not 
known  in  advance;  they  are  estimated  from  training 
samples  chosen  by  analysts.  Since  the  designation  of 
training  data  as  being  from  a particular  class  is  sub- 
ject to  error  and  since  the  training  data  set,  even  if 
designated  correctly,  represents  only  a relatively 
small sample from its parent population, the resulting
estimates of $F_j(x)$ are not always reliable.
However, until large signature banks are built up and
models for adjusting distributions for haze, Sun
angle, etc., are functional, there is no choice but to
estimate the $F_j$ values from training data. One of the
rules  of  LACIE  has  been  that  no  ground  data  can  be 
used.  If  this  restriction  were  relaxed,  one  could 
guarantee  that  training  data  were  from  the  correct 
class;  however,  since  LACIE  was  designed  to  operate 
in  foreign  areas  where  no  ground  truth  is  available,  it 
would  not  be  realistic  to  assume  errorless  labeling  of 
training  data. 

The determination of $m$, the number of classes, is
itself an interesting problem. Since the training data
in a segment are insufficient to estimate the class
distributions nonparametrically, it is assumed that $F_j$
represents either a parametric family of multivariate
normal distributions or mixtures of multivariate normal
distributions with known mixing weights.
Which model is used depends on whether $m$ is taken
as the total number of ground-cover classes (including
all subclasses of wheat and nonwheat) or is equal
to 2 (wheat versus nonwheat).

In the first case, where $m$ is taken to be the total
number of ground-cover classes (i.e., the multiclass
model), the distribution of the $j$th class is assumed to be
multivariate normal, with mean $\mu_j$ and covariance
matrix $\Sigma_j$. The density function for the $j$th class is
given by

$$f_j(x) = \frac{1}{|\Sigma_j|^{1/2} (2\pi)^{p/2}} \exp\left[-\tfrac{1}{2}(x - \mu_j)^T \Sigma_j^{-1} (x - \mu_j)\right]. \qquad (2)$$

In the second case ($m = 2$), there are only two
“classes,” wheat and nonwheat; however, since there
is bound to be a large variety in the signatures of nonwheat
(other crops, nonagriculture, etc.), it would be
unrealistic to assume a multivariate normal distribution
for all nonwheat. Instead, it is assumed that the
nonwheat is itself a mixture of subclasses, each of
which is multivariate normal.

A similar  model  is  used  for  wheat  to  take  into  ac- 
count the  different  varieties,  growth  stages,  etc.,  of 
wheat  found  in  a segment.  As  a consequence,  the 
class  density  functions  are  given  by 

$$f_1(x) = \sum_{k=1}^{m_1} \alpha_{1k} f_{1k}(x) \qquad (3a)$$

and

$$f_2(x) = \sum_{k=1}^{m_2} \alpha_{2k} f_{2k}(x), \qquad (3b)$$

where $f_1(x)$ and $f_2(x)$ are the density functions for
wheat and nonwheat, respectively; $f_{jk}(x)$ is the density
function for the $k$th subclass within the $j$th class;
and $\{\alpha_{1k}\}$ and $\{\alpha_{2k}\}$ are the respective mixing
weights.

Each approach has problems. In the first case
($m$ = number of subclasses), it is not obvious what $m$
should be. Present procedures employ a standard
clustering algorithm in an attempt to define $m$ and
break up the data into subclasses. In the second
method, $m$ is obviously equal to 2, but clustering is
still necessary to define subclasses needed for the
estimation of $f_{jk}$. Furthermore, the coefficients $\alpha_{jk}$
are not known and, for lack of better information,
are assumed equal within a main class; i.e.,
$\alpha_{jk} = 1/m_j$ $(j = 1, 2)$.

Once classes and/or subclasses are defined, estimates
$\hat{f}_j(x)$ of the density functions $f_j(x)$ are obtained
from training data, then all pixels in the segment are
classified by the maximum likelihood classification
rule; i.e., a pixel with measurement vector $x$ is
classified as class $j$ if

$$\hat{f}_j(x) = \max_{i} \hat{f}_i(x).$$

The proportion of the pixels classified as “wheat” or
as a subclass of “wheat” is then taken to be the wheat
proportion estimate.

The  accuracy  of  this  or  any  proportion  estimation 
method  will  depend  to  some  extent  on  how  well  the 
class  distributions  are  estimated.  Some  methods, 
however  (including  classification/pixel  counting), 
are theoretically biased even if the $f_j$ values are
known.  What  is  sought  here  are  procedures  that 
theoretically  are  relatively  unbiased  and  fairly  insen- 
sitive to  errors  in  estimates  of  class  distributions,  so 
that  reasonably  accurate  crop  proportion  estimates 
can  still  be  made  in  the  context  of  LACIE-type 
applications. 


Bias  of  Pixel  Counting 

The present LACIE procedure of counting pixels
classified as wheat will be called “pixel counting” or
“PC.” In this section, it will be shown that PC is
biased even if the density functions $f_j(x)$ are known.
To do this, consider the sample space $\mathcal{X}$ of all possible
measurement vectors $x$. In maximum likelihood
classification, with continuous density functions, $\mathcal{X}$
is broken into disjoint (except for sets of measure
zero) regions $R_k$ such that

$$R_k = \{x \mid f_k(x) = \max_j f_j(x)\}.$$

Define conditional probabilities by

$$p_{ij} = \int_{R_i} dF_j(x); \qquad (4)$$

i.e., $p_{ij}$ is the probability of an observation from class
$j$ being classified as class $i$ $(i, j = 1, \ldots, m)$.

If $e_i$ is the (unconditional) probability of classifying
a pixel into class $i$, it follows that

$$e_i = \sum_{j=1}^{m} p_{ij}\alpha_j$$

or

$$e = P\alpha, \qquad (5)$$

where $e$ is the $m \times 1$ vector $(e_1, \ldots, e_m)^T$ and $P$ is
the $m \times m$ matrix of the probabilities $p_{ij}$. It thus
follows that, if $\hat{e}_w$ is the proportion of pixels
classified as wheat, then its expectation is not $\alpha_w$,
the true proportion (where the subscript $w$ denotes
wheat), but is instead equal to

$$\sum_{j} p_{wj}\alpha_j,$$

even if the distribution for each class is known. In
general, the bias vector of PC for all classes with
known densities is equal to

$$e - \alpha = (P - I)\alpha. \qquad (6)$$
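As a small numerical illustration of equations (5) and (6), the following sketch evaluates $e = P\alpha$ and the bias $(P - I)\alpha$ for a two-class case. The matrix `P`, the proportions `alpha`, and the helper name `pc_expectation` are hypothetical and chosen only for illustration; they are not values from LACIE.

```python
def pc_expectation(P, alpha):
    """Expected pixel-count proportions e = P * alpha (equation (5)).

    P[i][j] is the probability that a pixel from class j is classified
    as class i, so every column of P sums to 1.
    """
    m = len(alpha)
    return [sum(P[i][j] * alpha[j] for j in range(m)) for i in range(len(P))]


# Hypothetical values: class 0 = wheat, class 1 = nonwheat.
P = [[0.9, 0.2],
     [0.1, 0.8]]
alpha = [0.3, 0.7]

e = pc_expectation(P, alpha)
bias = [e_i - a_i for e_i, a_i in zip(e, alpha)]  # equation (6)
print("expected PC proportions:", e)
print("bias of PC:", bias)
```

With these made-up numbers the expected wheat count is about 0.41 although the true wheat proportion is 0.30: the 20 percent of nonwheat pixels that fall in the wheat region outweigh the 10 percent of wheat pixels that are lost.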


When the class densities are not known (as in
LACIE) and estimates $\hat{f}_j(x)$ are used for $f_j(x)$, the
situation is analogous; i.e., $\mathcal{X}$ is split into regions $\hat{R}_j$
such that

$$x \in \hat{R}_j \iff \hat{f}_j(x) = \max_i \hat{f}_i(x),$$

and the expectation of $\hat{e}_i$ is given by

$$E(\hat{e}_i) = \sum_{j} q_{ij}\alpha_j.$$

Thus,

$$E(\hat{e}) = Q\alpha, \qquad (7)$$

where

$$Q = (q_{ij}), \qquad q_{ij} = \int_{\hat{R}_i} dF_j(x); \qquad (8)$$

i.e., $q_{ij}$ is the probability of an observation from class
$j$ falling in the $i$th (estimated) classification region
$\hat{R}_i$. It follows that the bias of PC is given by

$$E(\hat{e} - \alpha) = (Q - I)\alpha. \qquad (9)$$

Unbiasing  PC 

If $Q$ were known and nonsingular, one would unbiasedly
estimate $\alpha$ by $Q^{-1}\hat{e}$; i.e., from equation (7),

$$E(Q^{-1}\hat{e}) = Q^{-1}(Q\alpha) = \alpha.$$

It is interesting to note that knowledge of the
matrix $P$ is not necessary, or even sufficient, for unbiasing
PC based on estimated densities. The advantage
of knowing the densities and $P$ lies in the
variance of the corrected estimate. If $\hat{e}$ and $\hat{u}$ are the
PC proportions using the known and estimated densities,
respectively, and if $\hat\alpha_k = P^{-1}\hat{e}$ and
$\hat\alpha_u = Q^{-1}\hat{u}$, both of the estimates are unbiased with
respective covariance matrices $P^{-1}V_e(P^{-1})^T$ and
$Q^{-1}V_u(Q^{-1})^T$, where $V_e$ and $V_u$ are the covariance
matrices of $\hat{e}$ and $\hat{u}$, respectively.

To obtain $V_e$ and $V_u$, note that $N\hat{e}$ is distributed
multinomially $(e_1, \ldots, e_m)$, where $N$ is the number
of classified pixels in the segment; hence,
$V_e = (1/N)[D_e - ee^T]$, where $D_e = \mathrm{diag}(e_1, \ldots, e_m)$.
Similarly, $V_u = (1/N)[D_u - uu^T]$, where
$D_u = \mathrm{diag}(u_1, \ldots, u_m)$ and $u = E(\hat{u}) = Q\alpha$; i.e., $u_i$ is the
probability of a random observation from the segment
falling in the region $\hat{R}_i$. Then, since $P^{-1}e = Q^{-1}u = \alpha$,

$$V(\hat\alpha_k) = \frac{1}{N}P^{-1}[D_e - ee^T](P^{-1})^T = \frac{1}{N}\left[P^{-1}D_e(P^{-1})^T - \alpha\alpha^T\right] \qquad (10)$$

and

$$V(\hat\alpha_u) = \frac{1}{N}Q^{-1}[D_u - uu^T](Q^{-1})^T = \frac{1}{N}\left[Q^{-1}D_u(Q^{-1})^T - \alpha\alpha^T\right]. \qquad (11)$$

An examination of equations (10) and (11) shows
that only the terms $P^{-1}D_e(P^{-1})^T$ and $Q^{-1}D_u(Q^{-1})^T$
contribute to the difference between $V(\hat\alpha_k)$ and $V(\hat\alpha_u)$.
The elements of $D_u$ and $D_e$, although not equal to
each other, are clearly of the same average size since

$$\sum_{i=1}^{m} e_i = \sum_{i=1}^{m} u_i = 1.$$

As a consequence, $V(\hat\alpha_k)$ and $V(\hat\alpha_u)$ differ mainly
because of the relative size of the elements of $Q^{-1}$
compared to those of $P^{-1}$. By knowing the densities
and hence the classification regions $\{R_i\}$ as opposed
to estimated regions $\{\hat{R}_i\}$, one should obtain better
classification accuracy and hence $P$ should be
“closer” to the identity matrix than $Q$. As a result,
one would expect the elements of $P^{-1}$ to be generally
smaller than those of $Q^{-1}$.
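The comparison above can be checked numerically. The sketch below evaluates the covariance expression of equations (10) and (11) for two hypothetical 2 x 2 matrices, one closer to the identity (playing the role of $P$) and one farther from it (playing the role of $Q$). All numerical values and function names are illustrative assumptions.

```python
def inv2(A):
    """Inverse of a 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def corrected_variance(C, alpha, n_pixels):
    """Covariance of C^{-1} times the PC proportions, as in equations (10)-(11).

    C is the 2 x 2 matrix of classification probabilities (P or Q) and
    alpha the true proportions; the pixel counts are taken to be multinomial.
    """
    u = [sum(C[i][j] * alpha[j] for j in range(2)) for i in range(2)]
    Ci = inv2(C)
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for r in range(2):
        for s in range(2):
            # (C^{-1} D_u C^{-T})_{rs} - alpha_r * alpha_s, divided by N
            term = sum(Ci[r][k] * u[k] * Ci[s][k] for k in range(2))
            cov[r][s] = (term - alpha[r] * alpha[s]) / n_pixels
    return cov


alpha = [0.3, 0.7]
P = [[0.9, 0.2], [0.1, 0.8]]   # hypothetical: regions from known densities
Q = [[0.8, 0.3], [0.2, 0.7]]   # hypothetical: regions from estimated densities
N = 10000
var_known = corrected_variance(P, alpha, N)[0][0]
var_estimated = corrected_variance(Q, alpha, N)[0][0]
print("variance with P:", var_known)
print("variance with Q:", var_estimated)
```

For these made-up matrices the variance of the corrected wheat proportion roughly doubles when the less accurate regions are used, which is the behavior the argument above anticipates.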

Odell-Chhikara estimator. Chhikara and Odell¹
proposed that PC could be made unbiased by
estimating $Q$ by $\hat{Q}$, say, then letting the corrected
estimate of $\alpha$ be given by

$$\hat\alpha_{OC} = \hat{Q}^{-1}\hat{u}. \qquad (12)$$

¹R. S. Chhikara and P. L. Odell, “Acreage Estimates for Crops
Using Remote Sensing Data,” NASA Johnson Space Center Annual
Technical Report (JSC-08971), University of Texas at Dallas,
1974.
To investigate the properties of this estimator, a
number of questions must be addressed. How is $Q$
estimated? Is $\hat{Q}$ always nonsingular? What about
bias caused by the fact that $E(\hat{Q}^{-1})$ is not equal to
$Q^{-1}$ even if $E(\hat{Q}) = Q$? What if some elements of
$\hat\alpha_{OC}$ are negative or greater than 1 or the sum of all
the elements is not 1? What is the variance of $\hat\alpha_{OC}$, if
it exists?

Estimation of $Q$: Odell and Chhikara suggested
that $Q$ be estimated by classifying training data, so
that

$$\hat{q}_{ij} = \frac{n_{ij}}{n_j}, \qquad (13)$$

where $n_j$ is the number of training pixels labeled as
class $j$ by the analyst and $n_{ij}$ is the number of those
pixels which were classified as class $i$. A potential
problem with this estimate is that the same data used
for computing $\hat{Q}$ are also used to train the classifier.
As a result, $\hat{Q}$ will be biased toward an identity
matrix, which represents perfect agreement between
labeling and classification. Furthermore, errors are
caused by mislabeling of training data; i.e., $\hat{q}_{ij}$ estimates
the probability of a pixel's being classified as
class $i$ given that it was labeled (by the analyst) as
class $j$, not the probability of its being classified into
class $i$ given that it actually was from class $j$.

Another approach to the estimation of $Q$ is to
assume the data from class $j$ are actually distributed
$N(\hat\mu_j, \hat\Sigma_j)$, where $\hat\mu_j$ and $\hat\Sigma_j$ are means and covariances
estimated from the $j$th set of training data, and to
compute $\hat{q}_{ij}$ by the Monte Carlo method; i.e., classifying
randomly generated observations from $N(\hat\mu_j, \hat\Sigma_j)$
and using equation (13).² This procedure, of course,
depends heavily on the normality assumption as well
as the accuracy of $\hat\mu_j$ and $\hat\Sigma_j$, which is also subject to
labeling error.
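The Monte Carlo approach can be sketched as follows. To keep the example short it assumes a single spectral channel (univariate normal classes) rather than the multivariate case discussed in the text, and the fitted parameters are hypothetical. The function names are illustrative only.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a univariate normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def monte_carlo_q(params, n_draws, rng):
    """Estimate Q by classifying simulated draws from each fitted class.

    params[j] = (mu_j, sigma_j) are the estimated mean and standard
    deviation of class j.  Returns q[i][j], the fraction of draws from
    class j that the maximum likelihood rule assigns to class i.
    """
    m = len(params)
    counts = [[0] * m for _ in range(m)]
    for j, (mu, sigma) in enumerate(params):
        for _ in range(n_draws):
            x = rng.gauss(mu, sigma)
            i = max(range(m), key=lambda c: normal_pdf(x, *params[c]))
            counts[i][j] += 1
    return [[counts[i][j] / n_draws for j in range(m)] for i in range(m)]


# Hypothetical fitted parameters for two classes.
params = [(0.0, 1.0), (2.0, 1.0)]
q_hat = monte_carlo_q(params, n_draws=20000, rng=random.Random(0))
for row in q_hat:
    print(row)
```

Because the number of draws is under the user's control, the sampling error in each $\hat{q}_{ij}$ can be made small; the error that remains is the error in the fitted parameters themselves.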


Existence of $\hat{Q}^{-1}$; expectation and variance of
$\hat\alpha_{OC}$: If equation (12) is to have meaning, the nonsingularity
of $\hat{Q}$ must be established. Unfortunately,
this is not always true when $\hat{Q}$ is computed using
equation (13) even if $Q$ itself is nonsingular. The coefficients
$n_{ij}$ are distributed multinomially and thus
have a nonzero probability of obtaining values such
that $\hat{Q}$ is singular; thus, strictly speaking, $E(\hat\alpha_{OC})$
does not even exist in this case! In the second case
(Monte Carlo), $n_j$ can be taken to be arbitrarily large;
hence, if the estimated class distributions are
reasonably separated, $\hat{Q}$ will not be singular.
However, a theoretical problem in computing $E(\hat\alpha_{OC})$
still exists.

Even if equation (13) is taken as being conditional
on $\hat{Q}$ being nonsingular and even if $E(\hat{Q}) = Q$, $\hat\alpha_{OC}$ is
still biased because, in general, $E(\hat{Q}^{-1} \mid \hat{Q}$ nonsingular$)$
is not equal to $Q^{-1}$. Furthermore, the variance of $\hat\alpha_{OC}$
can be quite large, resulting in estimates having elements
which are negative or greater than unity.

Modified  O-C  estimator. — Because  of  the  pre- 
viously described  shortcomings,  it  was  decided  to 
consider a modified estimate $\hat\alpha_M$, which is defined as
the solution to the problem:

$$\text{Minimize } \|\hat{u} - \hat{Q}\hat\alpha_M\| \qquad (14)$$

subject to the elements of $\hat\alpha_M$ being nonnegative and
summing to 1.
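One way to solve a problem of the form of equation (14) numerically is projected gradient descent on the set of nonnegative vectors that sum to 1. The sketch below is only one possible solver and is not the procedure used in LACIE; the function names, the iteration count, and the example matrix are assumptions made for illustration.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {a : a_i >= 0, sum(a) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def constrained_least_squares(Q, u, n_iter=5000):
    """Minimize ||u - Q a||^2 over the simplex by projected gradient."""
    k, m = len(Q), len(Q[0])
    # Step 1/L, where L = 2 * ||Q||_F^2 bounds the gradient's Lipschitz constant.
    step = 1.0 / (2.0 * sum(Q[i][j] ** 2 for i in range(k) for j in range(m)))
    a = [1.0 / m] * m
    for _ in range(n_iter):
        resid = [sum(Q[i][j] * a[j] for j in range(m)) - u[i] for i in range(k)]
        grad = [2.0 * sum(Q[i][j] * resid[i] for i in range(k)) for j in range(m)]
        a = project_to_simplex([a[j] - step * grad[j] for j in range(m)])
    return a


# Hypothetical estimated Q and two hypothetical pixel-count vectors.
Q_hat = [[0.8, 0.3], [0.2, 0.7]]
inside = constrained_least_squares(Q_hat, [0.45, 0.55])
clipped = constrained_least_squares(Q_hat, [0.25, 0.75])
print("interior solution:", inside)
print("boundary solution:", clipped)
```

In the first call the unconstrained solution (0.3, 0.7) already satisfies the constraints, so the two estimators agree. In the second call the unconstrained solution would have a negative first element, and the constrained estimate moves to the boundary (0, 1).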

If $\hat{Q}$ is nonsingular and the elements of $\hat{Q}^{-1}\hat{u}$ are
nonnegative, then $\hat\alpha_M = \hat\alpha_{OC}$. To show this relationship,
it is necessary only to show that the elements of
$\hat{Q}^{-1}\hat{u}$ sum to 1, since $\hat\alpha_{OC} = \hat{Q}^{-1}\hat{u}$ makes equation
(14) equal to zero; i.e., it is certainly a minimum.
Since the columns of $\hat{Q}$ sum to 1, then $\mathbf{1}^T\hat{Q} = \mathbf{1}^T$,
where $\mathbf{1}^T = (1, 1, \ldots, 1)$. Since $\mathbf{1}^T\hat{Q} = \mathbf{1}^T$, then
$\mathbf{1}^T = \mathbf{1}^T\hat{Q}^{-1}$ and hence

$$\mathbf{1}^T(\hat{Q}^{-1}\hat{u}) = (\mathbf{1}^T\hat{Q}^{-1})\hat{u} = \mathbf{1}^T\hat{u} = 1, \qquad (15)$$

since it is already known that the elements of $\hat{u}$ sum
to 1.

²In the $m$-class case. In the two-class case, the data are
assumed to be distributed according to equation (3), where $f_{jk}$ is
taken as $N(\hat\mu_{jk}, \hat\Sigma_{jk})$, with $\hat\mu_{jk}$ and $\hat\Sigma_{jk}$ being estimated from the $k$th
subclass within the $j$th class of training data.

Note that $\hat\alpha_M$ can be computed for either of the
two methods of computing $\hat{Q}$. To distinguish between
the two estimators in further references, $\hat\alpha_T$
will be the one solving equation (14) when $\hat{Q}$ is computed
from training data, and $\hat\alpha_{MC}$ will be the solution
to equation (14) when $\hat{Q}$ is computed by the
Monte Carlo method.

Because of the convexity constraints in equation
(14) and because $E(\hat{Q}^{-1})$ is not equal to $Q^{-1}$, $\hat\alpha_T$ and
$\hat\alpha_{MC}$ are still biased. Whether they are less biased
than the “uncorrected” estimate $\hat{u}$ is at present an
unanswerable question theoretically. Results of
some numerical testing of these estimators will be
given later.

Two-class versus m-class models. Guseman and
Walton³ have suggested that it would probably be
more accurate to attempt unbiasing PC in the two-class
case than in the $m$-class case, mainly because of
the improved condition of the matrix $Q$; e.g., if wheat
and nonwheat were well separated but their respective
subclasses were not, then the $m$-class “$Q$” matrix
would be ill conditioned whereas the two-class “$Q$”
matrix would be a near identity.

One  problem  with  the  two-class  model  is  that,  in 
general,  Q cannot  be  unbiasedly  estimated  (even 
with  perfect  labeling  of  training  data)  unless  the 
training  data  consist  of  a random  sample  within  each 
major class. Until the advent of Procedure 1 (P-1)
(see the paper by Heydorn entitled “Classification
and Mensuration Approach of LACIE Segments”),
training samples were not chosen at random either
between major classes or within them, because it was
thought that the resulting labeling accuracy would be
excessively low; in the last 2 years of LACIE,
however, random training samples became operationally
available. The next section will include discussion
of the unbiasing technique used in Procedure
1, which takes advantage of this randomness. With
nonrandom sampling, however, $\hat{Q}$ and hence proportion
estimates of the form of equation (14) could
easily have more bias, albeit less variance, in the two-class
case than in the $m$-class case.

Guseman and Walton give a procedure for combining
the $m$-class and two-class estimators.³ It is
claimed that, if the densities $f_{jk}(x)$ are known and
the $m$-class estimator is unbiased, then a certain
linear combination of the $m$-class and two-class
estimators is also unbiased. For completeness, a brief
review of the Guseman and Walton technique is
given, and it is shown how the technique may be generalized.

Let $\beta_{jk}$ be the proportion of the pixels from class
$j$, subclass $k$, where $j = 1, 2$ and $k = 1, \ldots, m_j$. Suppose
one uses equation (14) to obtain estimates $\hat\beta_{jk}$ of
$\beta_{jk}$, where $m = m_1 + m_2$. If the estimates are unbiased,
one can then unbiasedly estimate $\alpha_j$ by

$$\hat\alpha_1 = \sum_{k} \hat\beta_{1k}, \qquad \hat\alpha_2 = \sum_{k} \hat\beta_{2k}.$$

(Presumably, in this situation, the estimates $\hat\alpha_j$,
although unbiased, have a large variance and are
hence undesirable as estimates by themselves.)

At the same time, let $\tilde{f}_j(x)$ be an arbitrary
weighted average of the given subclass densities
$f_{jk}(x)$. (Guseman and Walton take $\tilde{f}_j$ to be

$$\frac{1}{m_j}\sum_{k=1}^{m_j} f_{jk}(x),$$

but this is more a convenience than a necessity.) One
can then define a partition of the sample space into
regions $S_1$ and $S_2$ such that

$$S_j = \{x \mid \tilde{f}_j(x) = \max_i \tilde{f}_i(x)\}.$$

Corresponding to the $\{S_j\}$ is a 2 x 2 “$Q$” matrix
denoted by $\tilde{Q} = (\tilde{q}_{ij})$, where

$$\tilde{q}_{ij} = \int_{S_i} \tilde{f}_j(x)\,dx.$$

³L. F. Guseman and J. R. Walton, “Methods for Estimating
Proportions of Convex Combinations of Normals Using Linear
Feature Selection.” Submitted to Communications in Statistics,
Theory and Methods, Part A.

Suppose $\tilde{e} = (\tilde{e}_1, \tilde{e}_2)^T$ is the vector of proportions
of the observations lying in $S_1$ and $S_2$, respectively.
Then, one can let $\tilde\alpha = \tilde{Q}^{-1}\tilde{e}$ be another estimate of
$\alpha$, if $\tilde{Q}^{-1}$ exists. Note that $\tilde\alpha$ is biased because the
arbitrary mixing weights are not the true ones; i.e.,
$E(\tilde{e})$ is not equal to $\tilde{Q}\alpha$, but is in fact equal to $Q^*\alpha$,
where

$$Q^* = (q^*_{ij}), \qquad q^*_{ij} = \int_{S_i} \alpha_j^{-1}\sum_{k} \beta_{jk} f_{jk}(x)\,dx.$$

Since $\tilde{Q}$ is only 2 x 2, however, one would expect $\tilde\alpha$
to be a much more stable estimate than $\hat\alpha$.

Guseman and Walton suggest that $\tilde\alpha$ and $\hat\alpha$ could
be combined to obtain an estimate (“Estimator 3”)
of the form

$$\alpha^* = \hat\alpha \pm (\tilde\alpha - \tilde{Q}^{-1}\hat{Q}^*\hat\alpha) = \hat\alpha \pm \tilde{Q}^{-1}(\tilde{e} - \hat{Q}^*\hat\alpha), \qquad (16)$$

where

$$\hat{Q}^* = (\hat{q}^*_{ij}), \qquad \hat{q}^*_{ij} = \int_{S_i} \hat\alpha_j^{-1}\sum_{k} \hat\beta_{jk} f_{jk}(x)\,dx. \qquad (17)$$

From equation (17), it is easily seen that
$E(\hat{Q}^*\hat\alpha) = Q^*\alpha = E(\tilde{e})$ and hence

$$E(\tilde{e} - \hat{Q}^*\hat\alpha) = 0 \implies E(\alpha^*) = \alpha.$$

Note that any 2 x 2 matrix, $H$, could be substituted
in equation (16) for $\pm\tilde{Q}^{-1}$ and still leave $\alpha^*$ unbiased.
An interesting question still unresolved is,
“What should $H$ be such that solutions of
$\alpha^*(H) = \hat\alpha + H(\tilde{e} - \hat{Q}^*\hat\alpha)$ have minimum mean-squared
error?”

“Inverse” probabilities; the P-1 approach. Much of
the difficulty in unbiasing PC would be relieved if it
were possible to estimate $Q^{-1}$ (if it exists) unbiasedly,
instead of inverting an estimate of $Q$.
Although no method for estimating $Q^{-1}$ directly has
been developed, it is possible to define a set of “inverse”
probabilities which are estimable if a random
subsample of correctly labeled training data is available.
These probabilities can then be used to linearly
unbias $\hat{u}$ without going through an inversion process,
thus resulting in an improved estimate of $\alpha$, guaranteed
to be convex.

Suppose one had a set of estimated density functions
$\{\hat{f}_i(x)\}$ and corresponding classification regions
$\{\hat{R}_i\}$ $(i = 1, \ldots, m)$. If a pixel were chosen at random
from the segment, one could ask what the probability
is of the pixel's having come from class $i$ given
that it was classified as $j$. These “inverse” probabilities,
$h_{ij}$, are, of course, related to the coefficients
$q_{ij}$ in equation (8) through the relation

$$h_{ij} = \frac{q_{ji}\alpha_i}{u_j}, \qquad (18)$$

where $q_{ji}$ and $u_j$ are as defined previously.

If one had a random sample of $T$ observations and
could correctly “label” them (i.e., tell what class each
one came from), then an almost unbiased estimate of
$h_{ij}$ could be made by

$$\hat{h}_{ij} = \frac{t_{ij}}{T_j}, \qquad (19)$$

where $T_j$ is the number of observations classified as
class $j$ and $t_{ij}$ is the number of observations classified
as class $j$ and labeled as class $i$.

Because the sample is random, $T_j$ is a random
variable (as opposed to $n_j$ in eq. (13)); however, it is
true that $E(\hat{h}_{ij} \mid T_j > 0) = h_{ij}$; hence, $\hat{h}_{ij}$ is unbiased
as long as $T_j$ is not equal to zero.

If labeling were sufficiently accurate such that
$\hat{H} = (\hat{h}_{ij})$ were a reasonable estimate of $H = (h_{ij})$,
then PC could be corrected to give an estimate
$\hat\alpha_I$, where

$$\hat\alpha_I = \hat{H}\hat{u}, \qquad (20)$$

which is essentially what is done in LACIE, as described
in the paper by Heydorn.

To show that $\hat\alpha_I$ is approximately unbiased under
perfect labeling, consider $\hat\alpha_i^{(I)}$, the $i$th element of
$\hat\alpha_I$, which is equal to

$$\hat\alpha_i^{(I)} = \sum_{j} \hat{h}_{ij}\hat{u}_j = \sum_{j} \frac{t_{ij}}{T_j}\hat{u}_j.$$

Its expectation is then equal to

$$E(\hat\alpha_i^{(I)}) = \sum_{j} E\left(\frac{t_{ij}}{T_j}\hat{u}_j\right). \qquad (21)$$

Remembering that $\hat{u}_j$ is the proportion of the $N$
pixels in the segment that were classified as class $j$
and that $T_j$ is the number of pixels classified as class $j$
from the (random) training sample of size $T < N$,
one can write $\hat{u}_j = (T_j + G_j)/N$, where $G_j$ is the number
of pixels not in the training sample that were
classified as class $j$. Since $T_j$ and $G_j$ are from independent
samples of respective sizes $T$ and $(N - T)$, they
are independent; hence,

$$E(\hat{u}_j \mid T_j) = \frac{T_j + (N - T)u_j}{N}.$$

By a similar argument, given $T_j$, $t_{ij}$ is independent
of $G_j$. Also, if $T_j$ is not equal to zero, $E(t_{ij} \mid T_j) =
h_{ij}T_j$; hence, if $T_j$ is not equal to zero,

$$E\left(\frac{t_{ij}}{T_j}\hat{u}_j \,\Big|\, T_j\right) = h_{ij}\left[\frac{T_j + (N - T)u_j}{N}\right]. \qquad (22)$$

Since $T_j$ is distributed binomially $(T, u_j)$,

$$E(T_j \mid T_j \ne 0) = \frac{Tu_j}{1 - (1 - u_j)^T} \approx Tu_j. \qquad (23)$$

Substituting into equation (22) yields the unconditional
expectation

$$E(\hat\alpha_i^{(I)}) \approx \sum_{j} h_{ij}u_j = \sum_{j} q_{ji}\alpha_i \ \text{(by eq. (18))} = \alpha_i. \qquad (24)$$

Thus $\hat\alpha_I$ is unbiased under perfect labeling when all
the $T_j$ values are nonzero.

Note that, under the preceding assumptions, the
raw estimates

$$\hat\alpha_i^{(R)} = \frac{1}{T}\sum_{j} t_{ij}$$

are also unbiased estimates of $\alpha_i$. It is shown in
Heydorn’s paper, however, that $\hat\alpha_i^{(I)}$ has lower
variance than $\hat\alpha_i^{(R)}$; in fact, $\hat\alpha_i^{(I)}$ can be shown to be
a poststratified estimate of $\alpha_i$, where the strata are
precisely the partition of the sample induced by the
classifier.
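The “inverse” probability correction of equations (19) and (20) can be sketched on synthetic data as follows. The true proportions, the classifier error rates, the sample sizes, and the function name are all hypothetical, and perfect labeling of the random sample is assumed, as in the derivation above. The labeled sample is drawn without replacement, which is a simplification of the independent-sample assumption used in the text.

```python
import random


def inverse_probability_estimate(assigned, sample_idx, sample_labels, m):
    """Poststratified proportion estimate in the spirit of equations (19)-(20).

    assigned[n]   : class assigned by the classifier to pixel n (all N pixels)
    sample_idx    : indices of the randomly chosen labeled pixels
    sample_labels : true class of each sampled pixel (perfect labeling assumed)
    """
    N = len(assigned)
    u_hat = [sum(1 for c in assigned if c == j) / N for j in range(m)]
    T = [0] * m                      # T[j]: sampled pixels classified as j
    t = [[0] * m for _ in range(m)]  # t[i][j]: labeled i, classified as j
    for n, label in zip(sample_idx, sample_labels):
        T[assigned[n]] += 1
        t[label][assigned[n]] += 1
    if min(T) == 0:
        raise ValueError("some class has no sampled pixels (T_j = 0)")
    return [sum(t[i][j] / T[j] * u_hat[j] for j in range(m)) for i in range(m)]


# Synthetic segment: true proportions and classifier error rates are made up.
rng = random.Random(1)
alpha_true = [0.3, 0.7]
q = [[0.8, 0.3], [0.2, 0.7]]        # q[i][j] = P(classified i | class j)
truth = [0 if rng.random() < alpha_true[0] else 1 for _ in range(20000)]
assigned = [0 if rng.random() < q[0][c] else 1 for c in truth]
sample_idx = rng.sample(range(len(truth)), 2000)
sample_labels = [truth[n] for n in sample_idx]

alpha_hat = inverse_probability_estimate(assigned, sample_idx, sample_labels, 2)
raw = sum(1 for c in assigned if c == 0) / len(assigned)
print("raw pixel count for class 0:", raw)
print("corrected estimate:", alpha_hat)
```

No matrix inversion is involved, and because each column of the estimated inverse probabilities sums to 1, the corrected proportions are nonnegative and sum to 1 whenever every $T_j$ is nonzero.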
Generalization of PC-Based Techniques

It may be noted that there is no requirement that
the sample space $\mathcal{X}$ be partitioned into regions
through a classification rule. The discussion of unbiasing
PC applies for any partition of $\mathcal{X}$ into disjoint
regions $\{R_j\}_{j=1}^{k}$, where the phrase “is classified as
class $j$” is replaced by “$x \in R_j$,” where $x$ is an observation.
For every partition $R = (R_1, \ldots, R_k)$ of $\mathcal{X}$,
there corresponds a $k \times m$ “$Q$” matrix, say

$$Q(R) = (q_{ij}(R)),$$

and the analogous relationship

$$\underset{k \times 1}{u} = \underset{k \times m}{Q}\ \underset{m \times 1}{\alpha}, \qquad (25)$$

where $u_i = P\{x \in R_i\}$, $x$ is a random observation
from the mixture distribution in equation (1), and
$u = (u_1, \ldots, u_k)^T$. Note also that the number of partitions
$k$ does not have to equal the number of classes
$m$ although, for all the elements of $\alpha$ to be estimable,
one needs $k \ge m$. If $k > m$, one could use least-squares
estimates of $\alpha$ to replace “$Q^{-1}$” in earlier
discussions.

An example of a partition not based on classification
would be one in which $u_i = 1/k$ for all values of
$i$; i.e., “statistically equivalent blocks.” One desirable
property of such a partition is that estimates of the
type of equation (19) would be as stable as possible
since, with a reasonable sample size, the expectation
of all the $T_i$ values would be safely removed from
zero.


OTHER PROPORTION ESTIMATION
METHODS


General Linear Functional Estimates

The second section of this paper included discussion
of some proportion estimation methods which
were all similar in that (1) the sample space $\mathcal{X}$ was
split into disjoint regions and (2) the primary
statistic used for estimation was the vector, $\hat{u}$, of proportions
of the pixels the measurements of which
fell into each of the regions. The expectation of this
vector was then shown to be a linear function of the
target proportion vector $\alpha$ which could be solved in
turn for an estimate of $\alpha$ as a function of $\hat{u}$.

Suppose one defines indicator functions

$$g_i(x) = \begin{cases} 1 & x \in R_i \\ 0 & \text{otherwise,} \end{cases} \qquad (26)$$

where $R_i$ is the $i$th disjoint region with union $\mathcal{X}$, as in
the preceding section. Then, using equation (1),
equation (25) as given in the preceding section can be
written

$$\begin{aligned} u_i &= \sum_{j=1}^{m} q_{ij}\alpha_j = \sum_{j=1}^{m} \alpha_j \int_{R_i} dF_j(x) \\ &= \int g_i(x) \sum_{j=1}^{m} \alpha_j\,dF_j(x) = E(g_i(x)) \quad (i = 1, \ldots, k), \end{aligned}$$

so that

$$E(g_i(x)) = \sum_{j=1}^{m} q_{ij}\alpha_j. \qquad (27)$$

By writing equation (25) in the form of equation
(27), it can easily be seen that such an equation holds
for any set of functions, $\{g_i\}$, the expectations of
which exist under the distribution of equation (1).
The coefficients $q_{ij}$ are simply the conditional expectations
of $g_i(x)$ given that $x$ is a random observation
from population $j$. Given enough linearly independent
functions $g_i$, one can then use equation (27) to
estimate $\alpha$ by replacing $E(g_i)$ with the sample mean $\bar{g}_i$. This class of
estimators will be called linear functional estimators
(LFE’s).


Special Cases of LFE's

Method of moments. To establish the method of
moments estimator, let $g_i(x) = x^{(i)}$ $(i = 1, \ldots, p)$
and let the remaining functions be the products
$x^{(i)}x^{(j)}$ $(i = 1, \ldots, p;\ j \ge i)$,
where $x^{(v)}$ is the $v$th component of the
$p$-dimensional observation vector $x$. In this case, $k$ is
equal to $p + p(p + 1)/2$ and the expressions in equation
(27) are safely overdetermined for any reasonable
number of classes.

The left side of equation (27) is obtained by computing
the first and second moments from the entire
mixture distribution (i.e., the whole segment),
whereas the coefficients $q_{ij}$ on the right side of equation
(27) are the conditional first and second moments,
which can be computed from training data
from each of the $m$ classes. Then, as in equation (14),
the system $\bar{g} = Q\hat\alpha$ is solved for $\hat\alpha$ by constrained
least squares, where $\bar{g} = (\bar{g}_1, \ldots, \bar{g}_k)^T$.

Marginal CDF. Let $\hat{F}_v(x)$ be the sample univariate
CDF from the mixture distribution (eq. (1)) of
$x^{(v)}$ taken over all the pixels in the segment. Then
$\hat{F}_v(x)$ is simply the fraction of the observations the
$v$th component of which is less than or equal to $x$. Its
expectation is

$$F_v(x) = \sum_{j} \alpha_j F_{jv}(x),$$

where $F_{jv}(x)$ is the marginal CDF of the $v$th component
of the $j$th class distribution. For any set of arbitrary
real numbers $\{x_{iv}\}$ $(i = 1, \ldots, M;\ v = 1, \ldots, p)$, the set of equations

$$F_v(x_{iv}) = \sum_{j} \alpha_j F_{jv}(x_{iv}) \quad (i = 1, \ldots, M;\ v = 1, \ldots, p) \qquad (28)$$

can be approximated by replacing $F_v$ by $\hat{F}_v$ and $F_{jv}$ by
$\hat{F}_{jv}$ (the latter being computed from training data
from the $j$th class) and solved for $\alpha$ by constrained
least squares. To avoid degeneracies, it is a good idea
to spread $x_{iv}$ over the distribution of $x^{(v)}$.

Specifically, $x_{iv}$ can be taken as the $(i/M)$-th quantile
of $\hat{F}_v$. Equation (28) is then replaced by

$$\frac{i}{M} = \sum_{j} \alpha_j \frac{n_{ijv}}{n_j}, \qquad (29)$$

where $n_{ijv}$ is the number of observations from training
sample $j$ which had a $v$th component less than or
equal to $x_{iv}$ and $n_j$ is the total number of observations
from training class $j$. (Note that there are $pM$
equations and $m$ unknowns.)

Marginal “BIN” estimator. The CDF estimator
can be slightly modified to produce what is termed
the “BIN” estimator by replacing $F_v(x_{iv})$ with
$F_v(x_{iv}) - F_v(x_{i-1,v})$, where $x_{0v}$ is taken as $-\infty$.
Similarly, $F_{jv}(x_{iv})$ is replaced by
$F_{jv}(x_{iv}) - F_{jv}(x_{i-1,v})$. If the sets $\{x_{iv}\}$ are taken as quantiles
again, then the left side of equation (29) becomes
$1/M$ for all values of $i$ and $v$, whereas the coefficients
$n_{ijv}$ are taken as the number of observations in training
sample $j$ the $v$th components of which lie in the
interval $(x_{i-1,v}, x_{iv}]$.
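A minimal sketch of the “BIN” estimator for the simplest setting, two classes and a single channel, is given below. With two classes the constrained least-squares problem has a closed form (solve for one proportion and clip it to the interval from 0 to 1). The synthetic data, the number of bins, and the function name are assumptions made for illustration.

```python
import math
import random


def bin_estimate_two_class(segment, train_1, train_2, n_bins):
    """Two-class, one-channel version of the "BIN" estimator.

    Bin edges are the i/M quantiles of the segment data, so each bin holds
    about 1/M of the segment.  Returns the estimated proportion of class 1.
    """
    ordered = sorted(segment)
    n = len(ordered)
    edges = [ordered[math.ceil(i * n / n_bins) - 1] for i in range(1, n_bins + 1)]

    def bin_fractions(values):
        counts = [0] * n_bins
        for x in values:
            for i, edge in enumerate(edges):
                if x <= edge:          # first edge at or above x: bin (x_{i-1}, x_i]
                    counts[i] += 1
                    break
        return [c / len(values) for c in counts]

    y = bin_fractions(segment)         # left side, about 1/M per bin
    b1 = bin_fractions(train_1)
    b2 = bin_fractions(train_2)
    # Least squares for y = a*b1 + (1 - a)*b2, then clip a to [0, 1].
    num = sum((y[i] - b2[i]) * (b1[i] - b2[i]) for i in range(n_bins))
    den = sum((b1[i] - b2[i]) ** 2 for i in range(n_bins))
    return min(1.0, max(0.0, num / den))


# Synthetic data: 30 percent of the segment comes from class 1.
rng = random.Random(2)
segment = [rng.gauss(0.0, 1.0) for _ in range(1500)] + \
          [rng.gauss(3.0, 1.0) for _ in range(3500)]
train_1 = [rng.gauss(0.0, 1.0) for _ in range(2000)]
train_2 = [rng.gauss(3.0, 1.0) for _ in range(2000)]
a_hat = bin_estimate_two_class(segment, train_1, train_2, n_bins=10)
print("estimated class 1 proportion:", a_hat)
```

Nothing in this sketch assumes normality; the normal draws only generate test data. The estimator itself uses bin counts alone, which is why it is classed as nonparametric.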

Density functions. Another example of an LFE is
that which uses estimated densities for $g_i$. Let $\hat{f}_j(x)$
be the estimated density function for the $j$th class.
Then, one can let $g_i(x) = \hat{f}_i(x)$ $(i = 1, \ldots, m)$. In
this case, the system of equation (27) is square; i.e.,
$k = m$.


Maximum Likelihood Estimators

An altogether different approach to proportion
estimation is that of maximum likelihood. Suppose
class density functions $f_j(x)$ are known. If
$\{x_1, \ldots, x_N\}$ is a random sample of $N$ observations from
equation (1), then the likelihood $L$ of the sample is
given by

$$L = \prod_{i=1}^{N} \sum_{j=1}^{m} \alpha_j f_j(x_i). \qquad (30)$$

Maximum likelihood estimates of $\alpha_j$ are obtained by
maximizing $L$ (or $\log L$), subject to $\alpha_j \ge 0$ and
$\sum_j \alpha_j = 1$. Let

$$Q = \log L - \lambda\left(\sum_{j} \alpha_j - 1\right).$$

Then

$$\frac{\partial Q}{\partial\alpha_j} = \sum_{i=1}^{N} \frac{f_j(x_i)}{f(x_i)} - \lambda, \qquad (31)$$

where

$$f(x_i) = \sum_{j} \alpha_j f_j(x_i)$$

and $\lambda$ is a Lagrange multiplier. Setting $\partial Q/\partial\alpha_j = 0$
and multiplying by $\alpha_j$ yields

$$\alpha_j \sum_{i=1}^{N} \frac{f_j(x_i)}{f(x_i)} = \lambda\alpha_j. \qquad (32)$$

Summing equation (32) over $j$ yields

$$\lambda = \lambda\sum_{j} \alpha_j = \sum_{i=1}^{N} \frac{\sum_j \alpha_j f_j(x_i)}{f(x_i)} = N.$$

As a consequence, equation (31), set to zero and multiplied
by $\alpha_j/N$, can be rewritten as

$$\alpha_j = \frac{1}{N}\sum_{i=1}^{N} \frac{\alpha_j f_j(x_i)}{f(x_i)}, \qquad (33)$$

which suggests a fixed-point iteration procedure for
solving for $\alpha_j$ by successive substitution of trial
values. Since any positive values of $\alpha_j$ substituted
into the right side of equation (33) will result in positive
values on the left side, such a procedure, if it
converges, is guaranteed to produce a nonnegative
estimate. To show that the solution to equation (33),
if it exists, is a maximum, consider the second
derivatives of $Q$:

$$h_{jk} = \frac{\partial^2 Q}{\partial\alpha_j\,\partial\alpha_k} = -\sum_{i=1}^{N} \frac{f_j(x_i)f_k(x_i)}{f^2(x_i)}. \qquad (34)$$

Since the matrix $H = (h_{jk})$ is clearly seen to be a sum
of negative semidefinite rank 1 matrices, it is itself
negative semidefinite. Furthermore, if $N > m$, $H$ is
negative definite with probability 1. As a consequence,
any solution to equation (33) is a maximum
likelihood estimate of $\alpha$.
If  the  functions  fj are  not  known,  a similar  scheme 
can  be  implemented  using  parametric  estimates 
(usually  Gaussian)  for  fj.  Details  of  this  type  of 
estimation,  including  some  convergence  theorems, 
may  be  found  in  reference  2. 
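
The successive-substitution idea behind equation (33) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `gaussian_pdf` and `estimate_proportions`, the equal starting proportions, and the stopping tolerance are hypothetical choices, not part of the original procedure, and the class densities are assumed known.

```python
import math


def gaussian_pdf(mean, sigma):
    """Return a univariate normal density (an illustrative stand-in for a known f_j)."""
    def pdf(x):
        z = (x - mean) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return pdf


def estimate_proportions(densities, samples, max_iter=500, tol=1e-12):
    """Iterate alpha_j <- (1/N) * sum_i alpha_j f_j(x_i) / f(x_i) until it settles."""
    m, n = len(densities), len(samples)
    # Evaluate every class density at every observation once.
    table = [[f(x) for f in densities] for x in samples]
    alpha = [1.0 / m] * m  # assumed starting point: equal proportions
    for _ in range(max_iter):
        new_alpha = [0.0] * m
        for row in table:
            # Mixture density f(x_i) under the current trial proportions.
            mix = sum(a * v for a, v in zip(alpha, row))
            for j in range(m):
                new_alpha[j] += alpha[j] * row[j] / (mix * n)
        change = max(abs(a - b) for a, b in zip(alpha, new_alpha))
        alpha = new_alpha
        if change < tol:
            break
    return alpha
```

Each pass keeps the trial proportions positive and summing to 1, which is the property noted in the text; convergence is not guaranteed by this sketch.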


PERFORMANCE OF PROPORTION
ESTIMATORS


All  the  proportion  estimators  described  in  the  two 
preceding  sections  are  dependent  on  certain  assump- 
tions in  order  to  be  unbiased  and  to  have  a reasona- 
bly small  variance.  Every  estimator  requires  a ran- 
dom sample  from  each  of  the  m classes;  therefore, 
nonrandomness  or  outright  mislabeling  could 
seriously  affect  the  performance  of  the  proportion 
estimator. In addition, some of the estimators have
distributional assumptions, usually normality;
others,  however,  are  nonparametric.  In  some  cases, 
failure  of  an  assumption  to  hold  may  only  increase 
the  variance  of  the  estimate;  in  most,  however,  the 
estimator  will  become  biased.  As  a consequence,  no 
attempt  is  made  to  compare  the  merits  of  various 
estimators  under  ideal  conditions  (i.e.,  when  all 
assumptions  are  met).  What  really  matters  in 
LACIE-type  applications  is  the  degree  of  insen- 
sitivity of  the  estimator  to  violations  of  its  assump- 
tions. 

Because  of  the  myriad  of  possible  proportion 
estimators  which  can  be  constructed,  it  would  be  an 
impossible  task  to  test  them  all,  especially  on  real 
data.  There  has  been  very  little  comparative  testing 
to  date;  however,  Ziegler  (ref.  3)  did  compare  the 
performance of eight estimators on some LACIE intensive
test site data. It appeared that the best results
were  obtained  with  the  CDF  or  “BIN”  estimators. 
The  theoretical  bias  of  raw  pixel  counting  proved  no 
worse  than  that  caused  by  the  failure  of  various 
assumptions  in  other  estimators.  The  reader  is  re- 
ferred to  reference  3 for  details. 

In  reference  4,  Guseman  did  some  limited  testing 
comparing  the  m-class  and  two-class  versions  of 
equation  (16)  for  the  special  case  H — (H.  He  found 
that  the  two-class  case  gave  better  results. 

The  LACIE  Accuracy  Assessment  personnel  did 
some  comparison  studies  of  pixel  counting  and  Pro- 
cedure 1 on  eight  LACIE  segments.  No  meaningful 
differences  in  performance  were  noted.  Again, 
despite  being  theoretically  biased,  PC  did  no  worse 
than  P-1,  probably  because  of  labeling  errors  in  P-1. 

Earlier  studies  by  the  Environmental  Research 
Institute of Michigan (ERIM) (refs. 5 and 6) on
moment-type estimators (first moments only) were also
inconclusive. The only general conclusion possible at
present  is  that  poor  to  inadequate  “training”  data 
cause  most  of  the  proportion  estimation  methods  to 
be  indistinguishable.  When  future  surveys  are  made 
with  some  ground  truth  available  for  training,  the 
selection  of  an  efficient  proportion  estimation 
method  will  become  a much  more  important  task 
than  it  is  at  present. 


REFERENCES 


1. Teicher, H.: Identifiability of Mixtures. Ann. Math. Stat.,
1961, pp. 244-248.

2. Peters, B. C.; and Walker, H. F.: An Iterative Procedure . . .
(and other articles). Final Report (Univ. of Houston), NASA
Contract NAS 9-12777, May 1, 1975 to May 30, 1976.

3. Ziegler, L. R.: An Evaluation of Algorithms for Estimating
Wheat Proportions From Landsat Data. Technical Memorandum
642-2031, Lockheed Electronics Company, Houston,
Tex., May 1977.

4. Guseman, L. F.: Applications of Feature Selection. Final
Report, NASA Contract NAS 9-14689-4S, June 1, 1975 to May
31, 1976.

5. Nalepka, R. F.; and Hyde, P. D.: Estimating Crop Acreage
From Space-Simulated Multispectral Scanner Data. NASA
CR-ERIM 31650-148-T, Aug. 1973.

6. Horowitz, H. M.; Hyde, P. D.; and Richardson, W.: Improvements
in Estimating Proportions of Objects From
Multispectral Data. NASA CR-ERIM 190100-25-T, Apr.
1974.


ACKNOWLEDGMENTS 

The author wishes to acknowledge the work of
many contributors whose research has been included
in this paper. In particular, H. O. Hartley, P. L. Odell,
W. S. Coberly, J. P. Basu, and R. P. Heydorn have
made many valuable inputs to the material presented
herein. Of special note is the series of articles by
University of Texas at Dallas personnel in their
Annual Report for 1974-1975.⁴


⁴P. L. Odell, “Statistical Theory and Methodology for Remote
Sensing Analysis With Special Emphasis on LACIE.” NASA
Johnson Space Center Annual Report (JSC-09703), University of
Texas at Dallas, 1975.


Appendix 

Categorization  and  Error  Analysis 
of  Proportion  Estimators 


An inspection of figure 1 will show how the
proportion estimators discussed in this paper may be
grouped. They are labeled “E1” to “E16” for easy
reference. Under each estimator, symbols are shown
which indicate the members of the given list of
assumptions that must hold for the procedure to be
feasible as a proportion estimator. In addition to this
list in figure 1, all estimators assume, of course, that
there is enough information in the data to actually
separate the classes. (Such has not always been the
case in LACIE!)

Note that there are three main categories of
estimators shown. The first type, maximum likelihood,
is heavily dependent on distributional assumption,
usually that of normality. If the data were
really normal for each of m known classes, maximum
likelihood would probably give the best
performance.

Unfortunately, the use of clustering to define
classes is essentially in contradiction with the
normality assumption, thus making E1 and E2
questionable unless clearly identifiable classes are known
a priori. Since true means and covariances are
generally unknown even if the distribution of the
data is normal, E1 tends to be biased; however, it is
not subject to wild variation, as is E2, when adequate
training samples are not available.

The second major category of proportion estimators
is that based on classification of individual pixels.
Within this category are the “raw” (uncorrected)
pixel-counting estimators E3 and E4; the “Q” matrix
corrected estimators E5, E6, and E7; and “P-1” (E8),
which is really in a category by itself.

All the estimators in this class depend to some
extent on normality because they use estimated
multivariate normal densities to define classification
regions. Failure of the data to be normal, however,
does not cause as much damage as it does under
maximum likelihood; in fact, for E5 and E8, only the
variance of the estimator, not its bias, is affected.


FIGURE 1.-Categorization of proportion estimation procedures.
Assumptions listed in the figure legend: labeled samples are
correct and random over the whole scene; labeled samples are
correct and random within each major class (m-class case);
labeled samples are correct and random within each subclass
(two-class case); data are normally distributed within a class
(m-class case) or subclass (two-class case); subclass weights
known or all equal (two-class case); number of classes known
(m-class case); class or subclass distributions known. An
asterisk marks an assumption that is not necessary for
unbiasedness but helps reduce variance.

In order to estimate the “Q” matrix unbiasedly, a
good random training sample is required for each
class. In E6 and E7, where the Q matrix is estimated
by Monte Carlo or numerical integration (as opposed
to E5, where it is estimated nonparametrically),
distributional assumptions are more important. The
trade-off of E7 against E5 is that fewer training
observations are needed to estimate Q in E7 if the data
are known to be normal, although serious errors may
result if, in fact, they are not.

The Guseman estimator, E6, is a “hybrid” because
it combines a two-class biased estimator with an
m-class “Q” matrix procedure. Since Guseman's Q
matrix is estimated by integration of normal densities,
it, too, is dependent on the normality assumption.

Procedure 1, or E8, is unique in that the only real
assumption for its unbiasedness is that samples
selected at random throughout the whole segment can
be unbiasedly labeled and that all classes are
represented. In E8, classification is only used as a
stratification device; hence, the assumption of
normality helps only through producing effective strata,
not by reducing bias. Although E8 does not require
as many assumptions as other procedures, the one it
does require is the most stringent; i.e., the ability to
label randomly sampled pixels from the mixture
distribution in equation (1). Other procedures only
require random observations within a class, not
between classes.

The  third  major  group  of  proportion  estimators 
(LFE's)  are  essentially  nonparametric.  They  require 
only  knowledge  of  the  number  of  classes  and  ran- 
dom samples  from  each  class.  They  tend  to  have 
higher  variances  than  some  other  methods,  but  they 
have  the  advantage  of  not  being  biased  by  the  failure 
of distributional assumptions to hold.


On the Clustering of Multidimensional Pictorial Data

ABSTRACT

A new approach to problems of clustering and
classification of multidimensional pictorial data is
presented. Proceeding logically from simple models
and assumptions, the author describes the development
of a clustering technique and program. Some
tests of the program have been performed, and this
work is reported. The techniques make use of
information from the spatial domain.


INTRODUCTION 

One  application  of  remote  sensing  is  the  use  of 
satellite-  or  aircraft-acquired  multispectral  scanner 
(MSS)  data  to  conduct  land  usage  inventories  over 
large  geographical  areas.  An  essential  part  of  a 
realistic  program  to  conduct  such  an  inventory  is  the 
application  of  cluster  analysis  to  help  the  human 
analyst label remotely sensed data. Cluster analysis
(clustering) lets the analyst label clear-cut members
of a cluster, such as a large field, easily and accurately,
while assuming that other members of the cluster are
in the same class. Other members of the cluster then
receive the same label; since this category includes
difficult-to-label cases, much tedious analysis is
saved.

In  this  paper,  several  related  ideas  on  using  spatial 
relationships  to  aid  clustering  and  classification  are 
presented. Techniques for selecting “pure” picture
elements (pixels) and “starting” cluster centers (for
an iterative or k-means clustering program) and for
assigning pixels to clusters (classification) are
discussed. A new clustering technique that is both
elegant and economical is proposed. Each technique
is described in detail in an appendix; each has been
implemented as a computer program. First tests of
the clustering program on four-pass Landsat data are
described.

The  origin  of  the  innovations  suggested  here  lies 
in  the  philosophy  of  the  approach.1  In  dealing  with 
such  an  ill-defined  concept  as  clustering,  it  is  essen- 
tial that  one  identify  the  assumptions  being  made 
about  the  objects  being  clustered  (the  model)  and 
study  whether  the  desired  results  of  clustering  are 
justified  on  the  basis  of  the  model  and  the 
methodology  being  employed.  It  is  essential  that  the 
reality  of  the  model  be  verified  independently  of  the 
methodology.  It  is  not  advocated  that  these 
philosophical  issues  be  resolved  here,  but  they  are 
considered. 

The  methodology  used  here  is  derived  in  a new 
analytical  framework  for  studying  pictorial  informa- 
tion. One  job  of  the  theorist  is  to  proffer  analytical 
frameworks  (and  models  within).  Another  is  to 
create  sample  theories  consistent  with  the  frame- 
work. The  success  of  the  ideas  presented  here  has 
one  inescapable  consequence:  current  theory,  based 
on  mathematical  statistics,  needs  to  be  critically  ex- 
amined. These  early  results  definitely  suggest  that 
real  data  are  inconsistent  with  the  assumptions  of 
current  theory.  What  is  lacking  is  a replacement 
theory.  It  would  surely  be  unwise  to  wantonly  aban- 
don current  theoretical  work  because  a new 
methodology,  which  at  first  seems  to  contradict  ex- 
isting theory,  has  appeared.  The  theoretical  problem 
is  open. 

A thread which starts in the preceding two
paragraphs runs through all the ideas discussed here.
It is the concept of reality. The reader must be
cautioned that the word “real” is used in the naive sense,
principally to distinguish between what happens in
the model and what has happened in experiments
performed in the setting of the model. For example,
in the model, the concept of a field is defined,
whereas actual (real) fields are naively believed to
exist. Of course, a thoughtful reader will reflect
seriously on this situation and realize that naive
realism cannot be justified.


¹To some extent, this work is a contribution to the “Philosophy
of Clustering.” It certainly seems clear that, when the
approach taken here is carefully studied and combined with other
work in understanding multidimensional pictorial data, a
significant advance in the methodology of using spatial associations will
result. (Meanwhile, it is realized there may be no such subject as
the Philosophy of Clustering.)


CURRENT  CLUSTERING  TECHNIQUES 

Of the many clustering techniques suggested in
the literature, several seem to be effective on
remotely sensed data. Iterative algorithms such as
ISODATA (refs. 1 and 2) and CLASS (ref. 3) have
been used successfully on four channels of aircraft
scanner (C-1) data and on Landsat data (refs. 4 to 6).
A similar  technique  is  described  by  Wacker  and 
Landgrebe  (ref.  7).  (For  a discussion  of  the  applica- 
tion of  some  of  these  techniques  to  remote  sensing, 
see  Duran  and  Odell  (ref.  8,  pp.  100  to  102).  See  also 
Anderberg  (ref.  9,  pp.  156  to  175)  for  a comprehen- 
sive comparison  of  variants  of  ISODATA  in  use  at 
the  time  (1973).)  Recent  developments  (in  addition 
to  CLASS),  such  as  the  gravitational  clustering  ideas 
of  Ball  (ref.  10)  and  later  Wright  (ref.  11),  show  some 
promise  for  clustering  arbitrary  metric  data;  their  po- 
tential in  remote  sensing  has  not  been  fully  evalu- 
ated. 

All  the  techniques  discussed  previously  are  non- 
hierarchical  clustering  methods.  The  dominant  idea 
in  each  of  these  techniques  is  to  take  some  initial 
clustering  of  the  data  and  rearrange  the  assignments 
(of  data  to  clusters)  to  improve  the  partition.  Even 
the  simplest  of  these  methods,  the  basic  k-means 
program  of  MacQueen  (ref.  12),  requires  two  passes 
through  the  data  with  each  pixel  being  classified  on 
each  pass.  When  programed,  most  of  the  techniques 
will  be  structured  with  a starting  procedure  module 
(ref.  13):  “INITIAL  NUMBER  OF  CLUSTERS, 
CLUSTER  CENTERS.”  As  a module,  it  can  be 
studied  separately. 

Although  these  methods  include  the  most  effi- 
cient clustering  techniques  known,  the  cost  (in  com- 
puter resources)  of  applying  them  to  problems  of 
remote  sensing  is  still  high.  This  point  is  discussed  by 
Dubes  and  Jain  (ref.  13)  and  by  Wright  (ref.  11).  The 
cost  is  high  because  of  the  amount  of  data  to  be 
clustered (typically 23 000 pixels in one Landsat
segment), the dimensionality of the data (four channels
for each temporal acquisition), and the nature of the
clustering program. There are four obvious
approaches to reducing the cost.

1.  Reduce  the  dimensionality  of  the  problem. 

2.  Select  a small  but  representative  subset  of  the 
data  and  cluster  it. 

3.  Get  a better  starting  partition. 

4.  Get  a better  clustering  program. 

Implementation of the first of these methods is
variously called feature selection, factor analysis, or
multidimensional scaling. An account of the
mathematical-statistical trickery of feature selection
can  be  found  in  Andrews  (ref.  14).  There  is  convinc- 
ing evidence  (refs.  15  and  16),  for  example,  that  each 
acquisition  of  Landsat  agricultural  data  is  at  most 
two  rather  than  four  dimensional.  However,  even  if 
the  dimensionality  is  reduced  to  two  for  each  acquisi- 
tion (using  a transformation  such  as  the  one 
developed  by  Kauth  and  Thomas  (ref.  15)),  high 
dimensionality  remains  because  at  least  three  ac- 
quisitions are  required  to  separate  the  real  classes 
present  (in  this  case,  crops).  The  second  point,  which 
might  be  called  data  selection,  and  the  third  are  dis- 
cussed next;  the  fourth  point  is  discussed  after  a brief 
discussion  of  models  and  assumptions. 


TECHNIQUES  FOR  SELECTING  DATA 

The easiest way to reduce the number of data
units to k is to choose the first k units encountered.
The obvious disadvantage for pictorial data is that
some of the real classes present in the data may not
be represented in the first k, even if k is large.
Therefore, pixels are selected at random or in some
fixed spatial pattern which spreads the selected
points over the data. Consider, however, the
following problems.

1.  If  the  sample  is  sparse,  a real  class  may  not  be 
represented.  In  particular,  prior  knowledge  about  the 
structure  of  the  data  (e.g.,  the  number  of  classes) 
may  be  lost. 

2.  Some  samples  may  come  from  classes  which 
are  of  no  interest  so  that  processing  resources  are 
wasted. 

3.  Some  (in  the  case  of  Landsat  data,  many)  sam- 
ples may  be  mixtures  of  real  classes  (being  on  a 
spatial  boundary);  iterative  clustering  programs  such 
as  ISODATA  are  likely  to  produce  clusterings  in 
which  a mixture  class  captures  a real  class. 

Implicit in this discussion are two ideas. The first
is very simple: real classes exist. The second idea is
about the pictorial nature of the data: real classes
are present in spatial associations (i.e., think fields).
In the first part of this paper, a technique for
sampling multidimensional pictorial data based on these
ideas is presented; in appendix A, a computer
program for implementing the technique is described.

Others  have  considered  the  problem  of  selecting 
representative  pixels.  For  example,  Hall  et  al.  (ref. 
17)  delete  pixels  from  a scene  with  fewer  than  four 
occurrences.  A similar  scheme  has  been  developed 
by  W.  Coberly  (private  communication).  High-fre- 
quency pixels  are  selected  for  clustering  and  factor 
analysis;  however,  small  classes  are  still  likely  to  be 
lost.  Also,  as  the  dimensionality  of  the  data  in- 
creases, the  histograms  become  more  scattered  and 
the  hashing  program  (by  which  one  accumulates 
multidimensional  histograms)  becomes  more  com- 
plex and  time  consuming.  The  technique  proposed 
here  has  the  same  purpose  as  histogram-based  selec- 
tion of  high-frequency  pixels — to  select  for  analysis 
prototypes  which  are  pure  (i.e.,  not  mixture)  pixels 
and  which  represent  each  real  class. 

The  third  method  mentioned  for  reducing  the  cost 
of  an  iterative  clustering  program  is  to  start  near  a 
“solution” (so that fewer iterations are required). If
this procedure can be managed with negligible added
computational burden, computer time will be saved.
The  program  described  in  appendix  A for  finding  ini- 
tial cluster  centers  is  fast  and  automatic;  the  idea  of 
the  technique  is  introduced  next. 


FINDING  FIELDS  IN  REMOTELY  SENSED 
DATA 

Let the multidimensional data vector in row i,
column j, be denoted by $d_{ij}$. Let $|v|$ denote the
euclidean length of a vector v (so that $|v - w|$ is the
distance between v and w). Suppose there are r rows and
c columns; pixels with row index 1 or r, or column
index 1 or c, will be called border pixels. Others are said
to be inside the scene. Each pixel $d_{ij}$ inside has four
nearest neighbors; i.e., left, right, above, and below.
In Landsat data, these are probably the only
neighbors that matter. In data with better resolution (such
as most aircraft-acquired data), many more
neighbors can be considered in forming spatial judgments.

An irresistibly interesting problem in pictorial
pattern recognition is the boundary detection problem.
The approach taken here to finding fields actually
defines a set which is almost certain to contain the
boundary; spatially connected (ref. 18) sets
remaining are called fields. Only the four nearest neighbors
are considered in deciding connectedness. Since thin
boundaries are not required, a simple one-dimensional
gradient thresholding technique (with
thresholds set automatically) is used to mark
probable boundary points. The thresholds are set so that
about one-third of the scene is boundary.

This  technique,  when  tested  on  real  Landsat 
agricultural  data,  has  been  observed  to  select  fields 
which,  on  comparison  with  ground-truth  maps,  are 
found  to  contain  representatives  from  each  real  class 
(crop  type)  and  never  to  include  two  or  more  distinct 
real  classes.  This  experimental  evidence  supports  the 
following  three  assumptions. 

Assumption 1: Real classes exist. Mark each pixel
inside the scene which differs spectrally by more
than a threshold from its left or above neighbor; set
the threshold so that between one-fourth and
one-half of the pixels are marked. The complement of the
set of marked points is called the set of pure pixels,
and the spatially connected components of this set
are called fields.

Assumption 2: Each field contains exactly one real
class.

Assumption 3: Each real class is represented in at
least one field.
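
The marking and field-forming steps can be sketched in Python as follows. This is a hedged illustration, not the appendix A program: the function name `find_fields`, the list-of-rows image layout, the quantile rule for setting the threshold automatically, and the treatment of border pixels like any other pixel are all assumptions made for the example.

```python
import math
from collections import deque


def find_fields(image, boundary_fraction=1.0 / 3.0):
    """Mark probable boundary pixels, then group the rest into 4-connected fields.

    image is a list of rows; each pixel is a tuple of channel values.
    Returns (labels, fields): labels[i][j] is a field index, or -1 for a
    marked (boundary) pixel; fields[k] lists the (row, col) members of field k.
    """
    rows, cols = len(image), len(image[0])

    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    # Spectral difference from the left or above neighbor, whichever is larger.
    grad = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if j > 0:
                grad[i][j] = dist(image[i][j], image[i][j - 1])
            if i > 0:
                grad[i][j] = max(grad[i][j], dist(image[i][j], image[i - 1][j]))

    # Assumed automatic rule: pick the cut so roughly boundary_fraction is marked.
    ordered = sorted(g for row in grad for g in row)
    cut = ordered[int((1.0 - boundary_fraction) * (len(ordered) - 1))]

    labels = [[-1] * cols for _ in range(rows)]
    fields = []
    for i in range(rows):
        for j in range(cols):
            if grad[i][j] > cut or labels[i][j] != -1:
                continue
            label = len(fields)
            labels[i][j] = label
            members, queue = [], deque([(i, j)])
            while queue:  # breadth-first flood fill over pure pixels
                a, b = queue.popleft()
                members.append((a, b))
                for na, nb in ((a - 1, b), (a + 1, b), (a, b - 1), (a, b + 1)):
                    if (0 <= na < rows and 0 <= nb < cols
                            and labels[na][nb] == -1 and grad[na][nb] <= cut):
                        labels[na][nb] = label
                        queue.append((na, nb))
            fields.append(members)
    return labels, fields
```

Because only left and above differences are used, the marked set is thick on one side of an edge, which is acceptable here since thin boundaries are not required.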

The  program  described  in  appendix  A is  based  on 
these  assumptions.  Once  the  fields  are  formed,  pixels 
are  selected  (roundrobin  fashion)  until  an  adequate 
supply  for  clustering  is  obtained.  On  the  assump- 
tions, it  is  known  that 

1.  Each  real  class  is  represented. 

2.  Each  pixel  comes  from  a real  class. 

Also,  real  classes  present  only  in  small  fields  are 
not  as  likely  to  be  captured  by  classes  present  in  large 
fields  in  the  clustering  program,  as  is  the  case  with 
random  selection;  this  is  true,  since  the  roundrobin 
selection  technique  will  select  the  same  number  from 
each  field  regardless  of  the  field  size. 
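
A minimal sketch of round-robin selection, assuming each field is simply a list of pixels and that `round_robin_select` (a hypothetical name) stops when the requested supply is reached or every field is exhausted:

```python
def round_robin_select(fields, supply):
    """Take one pixel from each field in turn until `supply` pixels are chosen."""
    selected = []
    depth = 0  # how many pixels have already been taken from each field
    while len(selected) < supply:
        took_any = False
        for field in fields:
            if depth < len(field):
                selected.append(field[depth])
                took_any = True
                if len(selected) == supply:
                    return selected
        if not took_any:  # all fields exhausted before the supply was met
            break
        depth += 1
    return selected
```

For example, `round_robin_select([[1, 2, 3, 4, 5, 6], [7], [8, 9]], 5)` draws from the large field no faster than from the small ones, so small fields are not swamped.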

The process of obtaining starting cluster centers
(rather than pixels to cluster) is now outlined.
Suppose s starting cluster centers are desired. Form
fields as described previously and select 5 pixels
from each field; call these pixels test pixels. Form
the mean vector of each field; these means serve as
the initial cluster centers. (In one Landsat segment,
there are typically 300 fields.) Classify test pixels
by nearest spectral neighbor to the cluster centers
and count the number of times a center attracts a test
pixel. Eliminate all cluster centers to which no test
pixels are assigned. (This step will automatically
guarantee distinct cluster centers.) Now eliminate all
centers with 1, 2, ... assignments until s are obtained,
reclassifying test pixels which were assigned to an
eliminated center.
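
One way to read this elimination procedure is sketched below in Python. It is an interpretation, not the appendix A program: the name `starting_centers` is hypothetical, and removing the single least-attractive center per pass (then reclassifying every test pixel) is an assumed simplification of eliminating centers with 1, 2, ... assignments.

```python
def starting_centers(field_means, test_pixels, s):
    """Prune candidate centers (field means) until at most s remain."""
    def sqdist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    centers = list(field_means)
    while True:
        # Classify each test pixel by nearest center and count the attractions.
        counts = [0] * len(centers)
        for p in test_pixels:
            k = min(range(len(centers)), key=lambda i: sqdist(p, centers[i]))
            counts[k] += 1
        # First drop every center that attracted no test pixel.
        kept = [c for c, n in zip(centers, counts) if n > 0]
        if len(kept) < len(centers):
            centers = kept
            continue
        if len(centers) <= s:
            return centers
        # Then drop the weakest remaining center; its pixels are
        # reclassified on the next pass through the loop.
        weakest = min(range(len(centers)), key=counts.__getitem__)
        del centers[weakest]
```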


THE  CLASSIFICATION  PROBLEM 

Following  this  starting  procedure,  a clustering  pro- 
gram (designed  to  cluster  arbitrary  metric  data)  will 
produce  a clustering  of  the  sampled  data;  on  exit,  the 
cluster  centers  will  be  known.  The  problem  now 
becomes  one  of  classification  (of  all  pixels).  How  can 
one  label  each  pixel  in  the  scene  with  the  correct 
cluster  center?  This  question  has  been  studied  in 
some  depth  recently  by  Bauer  et  al.  (ref.  19).  They 
utilize  five  different  classification  algorithms. 

1.  Maximum  likelihood  per-point  classifier 

2.  ECHO  (refs.  20  to  22)  classifier 

3.  Layered  (ref.  23)  classifier 

4.  Minimum  distance  to  the  means,  per-point 
classifier 

5.  A parallelepiped  per-point  or  “levels”  classifier 

In terms of accuracy, the minimum distance to the
means (classifier 4) ranks better (but perhaps not
significantly better) than maximum likelihood, and the
cost  is  about  one-third  as  much.  All  the  other 
classifiers  tested  fall  behind  maximum  likelihood 
and  cost  more  than  nearest  neighbor  classification. 
In  the  report,  the  authors  express  surprise  that 
nearest  neighbor  classification  is  more  accurate  but 
point  out  that  the  training  statistics  were  developed 
using  an  unnamed  clustering  algorithm  which  used 
minimum  distance  assignments;  the  means,  vari- 
ances, and  correlation  matrices  were  then  formed 
assuming  multivariate  normal  distribution  for  the 
training  data.  This  explanation  is  not  very  convinc- 
ing. In  the  opinion  of  this  author,  the  reason  that 
nearest  neighbor  assignments  are  superior  to  max- 
imum likelihood  lies  in  the  basic  failure  of  the 
assumptions,  especially  in  sample  sizes  encountered 
here (i.e., the training classes are not Gaussian).

Another  report  by  Richardson  and  Pentland  (ref. 
24)  contains  a comparison  of  maximum  likelihood 
against  13  basic  methods,  6 of  which  are  actually 
spatial  cleanup  operations  which  follow  a maximum 
likelihood  (or,  in  principle,  any)  classification.  None 
of  these  methods  (except  the  maximum  likelihood 
classifier)  are  mentioned  previously;  thus,  it  is 
difficult  to  compare  the  results.  However,  the  “9- 
point  rule”  spatial  operations  classifier  always 
slightly  improved  the  classification  accuracy.  This 


work  is  difficult  to  evaluate  since  the  authors  reduce 
the  problem  to  an  unnatural  two-class  problem  from 
the  beginning,  presumably  to  give  the  9-point-rule 
classifiers  a better  chance. 

In  all  this  work,  the  ECHO  classifier  is  unique  in 
its  use  of  spatial  information.  An  earlier  technique 
related  to  ECHO  was  developed  by  Gupta  and  Wintz 
(ref.  25);  they  use  hypothesis  testing  to  grow  “blobs” 
with  similar  gray  levels  and  textures  and  develop  a 
classification  algorithm  which  performs  well  on 
aircraft-acquired  MSS  data.  The  main  problem  (other 
than  computer  time)  with  the  application  of  these 
ideas  to  Landsat  data  really  lies  in  the  fact  that  tex- 
tural information  in  a single  pixel  is  meaningless  and 
that  2-  by  2-pixel  areas  are  already  too  large.  In  addi- 
tion, it  is  not  credible  that  an  estimate  on  textural  in- 
formation obtained  from  a 2-  by  2-pixel  area  is  suffi- 
ciently significant  to  warrant  the  action  the  program 
takes  (growing  a field).  Setting  the  thresholds  to  pre- 
vent propagation  of  random  fields  leaves  more 
points  which  must  be  handled  on  a per-point  basis. 
Apparently, the thresholds (which must be set by the
user) in the tests of ECHO reported were set to
achieve about two-thirds the cost of maximum
likelihood classification with about a 1-percent loss
in classification accuracy. This result would be quite
impressive if it were not for the nearest neighbor
classification, which achieves better accuracy in half
the time.

Still, the use of spatial information in classification
is a good idea. Van Rooy and Lynn (ref. 26) discuss
the use of spatial information for improving the
accuracy of a classification. They assume (without
defining “field”) that

1.  The  majority  of  fields  contain  many  data 
points. 

2.  The  initial  classification  was  accurate. 

These assumptions held for C-1 flight data. For
Landsat data, the first assumption usually is not
valid. What replaces it here results from a simple
analysis of boundary pixels.


THE  BOUNDARY  PIXEL  MODEL 

Consider two adjacent real fields containing distinct real classes
$c$ and $d$. A pixel $b$ on the real boundary between these two
fields will be averaged by the remote-sensing hardware and appear as
$b = \alpha c + (1 - \alpha)d$, $0 < \alpha < 1$. However, it is
possible (and likely, if $c$ and $d$ are well separated) that $b$
will actually be spectrally nearer some other class (say class $e$)
so that $b$ is misclassified in class $e$ by any per-point
classification. Two surprisingly useful observations can be made
from this simple model. The first is clear: apparent boundary
classifications should be suspected. (Suspect a pixel which fails to
be classified like at least two of its four nearest neighbors, and
totally reject one which is unlike all four.) The second is based on
looking at the spectral distance from $b$ to $c$. Since
$b - c = (1 - \alpha)(d - c)$, the euclidean distance from $b$ to $c$
is simply $(1 - \alpha)|d - c|$, and therefore the distance from $b$
to the nearer of $c$ or $d$ is not greater than $\frac{1}{2}|d - c|$.
This model of a boundary pixel thus leads to a cluster-dependent
threshold. If, for each cluster center $a$, $z$ is the cluster center
with $|a - z|$ largest, then $r(a) = \frac{1}{2}|a - z|$ is a
rejection threshold (for cluster $a$), and no pixel $p$ should be
assigned to $a$ if $|a - p| > r(a)$.
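The rejection rule just stated can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `rejection_thresholds` and `assign_or_reject` are hypothetical, cluster centers and pixels are plain tuples of floats, and at least two cluster centers are assumed to exist. It is not the Fortran code of the program described in this paper.

```python
import math


def rejection_thresholds(centers):
    """Return r(a) = |a - z| / 2 for each center a, where z is the
    center farthest from a (assumes at least two centers)."""
    thresholds = []
    for i, a in enumerate(centers):
        farthest = max(
            math.dist(a, z) for k, z in enumerate(centers) if k != i
        )
        thresholds.append(farthest / 2.0)
    return thresholds


def assign_or_reject(pixel, centers, thresholds):
    """Return the index of the nearest center, or None when the pixel
    lies farther from that center than its rejection threshold."""
    dists = [math.dist(pixel, c) for c in centers]
    nearest = min(range(len(centers)), key=dists.__getitem__)
    if dists[nearest] > thresholds[nearest]:
        return None
    return nearest
```

A rejected pixel (result `None`) would then be handled by the spatial logic rather than by the per-point classifier.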

The  spatial-spectral  classification  technique  can 
now  be  outlined;  an  associated  program  is  described 
in  appendix  B.  First  determine  the  rejection 
threshold  for  each  cluster.  Recall  that  fields  were 
found  (in  the  spatial  starting  procedure/pixel  selec- 
tion step).  Spectrally  classify  each  field  by  nearest 
cluster.  If  the  distance  to  the  nearest  cluster  exceeds 
the  rejection  threshold  for  that  cluster,  increase  the 
number  of  clusters  by  1 (adding  the  field  mean 
which  was  rejected  as  a new  cluster  center)  and 
recompute  the  rejection  threshold.  When  all  field 
means  have  been  classified,  begin  a spatial  map  of 
labels. Classify all unlabeled (and so non-field-point)
pixels. Declassify any pixel which has fewer than two
neighbors in the same class (considering only the
four  nearest  neighbors).  Examine  the  four  neighbors 
of  each  unclassified  pixel  and  find  the  class  (of  these 
four)  to  which  the  pixel  is  nearest  but  which  it  has 
not  rejected.  If  one  is  found,  label  the  pixel  with  this 
label.  If  none  is  found,  restore  the  pixel's  old  label. 
Now  perform  a spatial  cleanup  operation.  First, 
declassify  a pixel  with  no  neighbor  (of  the  four 
nearest)  in  the  same  class;  then,  when  three  of  the 
four  neighbors  of  a declassified  pixel  are  in  the  same 
class,  transfer  this  class  label  to  the  pixel.  The  second 
operation  is  similar  to  what  Van  Rooy  and  Lynn 
(ref. 26) propose, with the following revised assumption.

Many fields have few members, but each field has at least two.

It may be that the fuzzy set theory (ref. 27) can be applied here to
make more sense of these assumptions. (This work has not been done.)
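The neighbor test used in the outline above (suspect a pixel with fewer than two of its four nearest neighbors in the same class) can be illustrated with a short Python sketch. The names `same_class_neighbors` and `suspect_pixels` are hypothetical, the label map is assumed to be a list of lists of integer class labels, and pixels on the edge of the scene are assumed simply to have fewer neighbors. Unlike the order-dependent pass described in appendix B, this sketch tests every pixel against the unmodified map.

```python
def same_class_neighbors(labels, i, j):
    """Count the four nearest neighbors of (i, j) that carry the same
    label; neighbors that fall off the scene are ignored."""
    rows, cols = len(labels), len(labels[0])
    count = 0
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            if labels[ni][nj] == labels[i][j]:
                count += 1
    return count


def suspect_pixels(labels, min_agree=2):
    """List the (row, column) positions whose label is shared by fewer
    than min_agree of the four nearest neighbors."""
    return [
        (i, j)
        for i in range(len(labels))
        for j in range(len(labels[0]))
        if same_class_neighbors(labels, i, j) < min_agree
    ]
```

Calling `suspect_pixels(labels, min_agree=1)` gives the stricter "unlike all four" test used in the final cleanup.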


THE PHILOSOPHY OF CLUSTERING
PICTORIAL DATA

In  the  book  “Patterns  of  Discovery"  (ref.  28), 
Hanson  examines  the  problem  of  how  an  analyst  can 
analyze  and  construct  hypotheses  about  the  data. 
Although  no  explicit  mention  of  cluster  analysis  is 
made,  the  book  contains  many  references  to  the 
problems  of  spatial  perception.  In  fact,  the  book 
opens  with  two  microbiologists  viewing  the  same 
image. 

Imagine  these  two  observing  a Protozoon- 
Amoeba.  One  sees  a non-celled  animal.  The 
first  sees  Amoeba  in  all  its  analogies  with 
different  types  of  single  cells  ....  Within  this 
class Amoeba is distinguished only by its independence.
The other, however, sees Amoeba's
homology  not  with  single  cells  but  with  whole 
animals  .... 

Although the two view the same image, what they perceive as
significant or relevant is not the same. Similar points are stressed
by Anderberg (ref. 9, pp. 22 to 24). Polya (ref. 29, p. 110) makes
the point more simply and with more generality: “Let us not neglect
the obvious and let us note: two people presented with the same
evidence may honestly disagree” and (p. 111) “. . . two persons
presented with the same evidence and applying the same patterns of
plausible inference may honestly disagree.” These philosophical
observations are important to one interested in clustering, for they
bring into question the reality of the information in the data. A
specific example in remote sensing (patterned after Hanson's)
follows.
Three  observers  view  film  of  a four-pass  Land- 
sat  segment  taken  from  a region  of  agricultural 
interest.  One  observer  is  interested  in  labelling 
“wheat  vs.  other";  that  is,  the  temporal-spectral 
behavior  of  wheat  and  the  conditions  which 
prevailed  in  the  place  and  time  this  data  was  ac- 
quired (as  known  to  the  observer)  are  used 
along  with  spatial  associations  to  separate 
wheat  from  “other."  The  second  is  interested  in 
yield  estimation:  although  each  individual  pixel 
comprises  over  an  acre  of  real  area,  this  ob- 
server is  attempting  to  understand  the  fine 
structure  of  the  data  so  as  to  predict  each  pixel's 




yield.  It  may  be  that  differing  amounts  of  soil 
moisture  and  variations  in  soil  type  or 
agricultural practices will affect yield, and this
observer  is  attuned  to  perceive  such 
differences. 

A third observer,² interested in but ignorant of
agriculture,  perceives  commonplace  spatial 
features  (fields,  roads,  clouds  and  cloud 
shadows  and  so  on),  and  distinguishes  between 
the  passes  as  being  “noisy”  or  “clean,”  the 
fields  as  being  rectangular  or  round,  wide  or 
narrow.  This  observer  sees  the  clustering  prob- 
lem more  as  one  of  filtering:  it  is  desired  to 
clean  up  the  noisy  fields,  to  enhance  the  fuzzy 
boundaries  and  somehow  transform  four 
passes  to  one  pleasing  and  plausible  image 
without  losing  much  information.  (Of  course, 
this  observer  knows  a precise  mathematical 
definition  of  “information"  which  is  probably 
unrelated  to  the  needs  of  the  first  two.) 

Let  us  adopt  the  third  observer's  orientation 
(despite  his  obvious  unsuitability  for  solving  the  real 
problem),  believing  that  a product  (a  clustering  of 
the  data)  which  pleases  him  will  be  useful  to  the 
others.  His  background  (not  merely  temperament,  as 
is  pointed  out  by  Polya)  leads  him  to  raise  questions 
such as “Given a clustering of the data, what is the
probability that a pixel is misclustered?” If this question
is meaningful, then it seems that an objective
function could be defined and that various clusterings
of the data could be compared, with the best of
those  compared  being  selected.  Unfortunately,  the 
question  is  actually  meaningless.  Without  external 
labels,  the  correctness  of  a classification  of  a single 
pixel  has  no  meaning.  (Although  schemes  exist  to 
transform  labels  to  clusters,  a deeper  question  of  the 
reality  of  the  labels  remains.) 

Although  the  absolute  real  class  into  which  a pixel 
should  be  clustered  is  unknown,  there  are  samples 
from the same real class; recall the assumption that
each field contains exactly one real
class,  so  that  all  pixels  from  the  same  field  are  in  the 
same  real  class.  Consider,  therefore,  a pair  of  pixels 
and  a clustering  of  the  data.  There  are  four  mutually 
exclusive  possibilities. 


²It is unfortunate that, generally, remote-sensing software is
developed by people with mathematical-statistical-engineering
backgrounds. The present author (a typical third observer) is no
exception.


1.  The  pair  can  be  in  the  same  real  class  and  be 
clustered  in  the  same  cluster. 

2.  The  pair  can  come  from  two  different  real 
classes  and  be  clustered  differently. 

3.  The  pair  can  come  from  the  same  real  class  and 
be  clustered  differently. 

4.  The  pair  can  come  from  different  real  classes 
and  be  clustered  alike. 

The  last  two  cases  represent  errors.  The  prob- 
ability of  case  3 or  4 will  be  called  the  pair  probability 
of  misclassification  (PPMC).  Error  case  3 can  be 
estimated  using  samples  from  the  same  real  class;  a 
strategy  for  estimating  case  4 will  be  proposed  pres- 
ently. Clearly,  the  estimate  of  the  PPMC  can  be  used 
as  an  objective  function  to  choose  one  clustering 
among  many. 
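As an illustration of how such an estimate might be computed, the following Python sketch counts the two error cases over sampled pairs. The function name `estimate_ppmc` and the data layout (pairs of hashable pixel identifiers plus a dictionary mapping each identifier to its cluster) are assumptions made for the example. Pooling both kinds of pairs into a single error fraction is also an assumption; the two error rates could equally well be estimated separately and added.

```python
def estimate_ppmc(same_pairs, diff_pairs, cluster_of):
    """Estimate the pair probability of misclassification.

    same_pairs: pairs believed to come from the same real class
    diff_pairs: pairs believed to come from different real classes
    cluster_of: mapping from pixel identifier to cluster label
    """
    # Case 3: same real class, but clustered differently.
    split_same = sum(
        1 for p, q in same_pairs if cluster_of[p] != cluster_of[q]
    )
    # Case 4: different real classes, but clustered alike.
    merged_diff = sum(
        1 for p, q in diff_pairs if cluster_of[p] == cluster_of[q]
    )
    total = len(same_pairs) + len(diff_pairs)
    return (split_same + merged_diff) / total
```

A clustering that puts every pixel in one cluster makes no case 3 errors but misclusters every pair in `diff_pairs`, so its estimate is not small.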


AN  INTERNALLY  SUPERVISED 
CLUSTERING  TECHNIQUE 

The  pair  idea  leads  to  a technique  which  has  the 
internal  structure  of  a pattern  recognition  algorithm 
in  the  sense  of  Kaminuma  et  al.  (ref.  30).  However, 
since  paradigms  are  extracted  from  the  data  without 
external  supervision,  a user  sees  it  as  a clustering 
technique.  Others  have  used  internal  parameter-free 
similarity  to  determine  the  number  of  clusters  (ref. 
31,  for  example).  Furthermore,  the  use  of  pairs  in 
measuring  similarity  between  two  clusterings  was 
proposed  by  Rand  (ref.  32),  and  the  technique  was 
used  to  compare  clustering  programs  by  Dubes  and 
Jain  (ref.  13,  p.  260).  The  novelty  of  our  approach  to 
finding  the  clusters  is  that  the  pairs  for  supervision 
are  selected  from  a “perfect”  real  clustering,  and  the 
technique tries to make the clustering equal the real
clustering.

The actual technique proceeds as follows. Once the fields are formed
and labeled by the starting procedure or by some other means
(ref. 33), sets of 5 pixels are drawn from each field which has at
least 5 elements. The sets of 5 are called test sets. Each test set
contains 10 unordered pairs from the same real class. To obtain
samples from different real classes, take the family of test sets and
rearrange the family (keeping test sets together) so that the values
in channel 1 of the first element in each test set are in
nondecreasing order. Let there be $s$ test sets and let $n = s/4$. To
obtain samples presumed to be from different real classes, select the
five from test set $k$ and one each from test sets $k - n$ and
$k + n$. (Details, such as what to do when one of $k \pm n$ is not a
valid test set index, are provided in appendix C.) This procedure
furnishes 10 pairs, which probably³ are from different but not too
different real classes.
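The pair construction can be sketched as follows in Python. The function name `build_pairs` is hypothetical, each test set is assumed to be a list of five pixels stored as tuples, and the partner pixel taken from test sets $k \pm n$ is assumed to be the first element of each. Out-of-range indices are wrapped around here purely for brevity; that is an assumption of this sketch and differs from the rule given in appendix C. At least two test sets are assumed.

```python
from itertools import combinations


def build_pairs(test_sets, channel=0):
    """Return (same_pairs, diff_pairs) built from a family of test sets."""
    # Sort the test sets (kept together) on the chosen channel of
    # their first element.
    ordered = sorted(test_sets, key=lambda ts: ts[0][channel])
    s = len(ordered)
    n = max(1, s // 4)
    same_pairs, diff_pairs = [], []
    for k, ts in enumerate(ordered):
        # All unordered pairs within one test set: 10 pairs for 5 pixels.
        same_pairs.extend(combinations(ts, 2))
        # Pair every pixel of test set k with one pixel from each of
        # test sets k - n and k + n (indices wrapped, an assumption).
        for other in ((k - n) % s, (k + n) % s):
            partner = ordered[other][0]
            diff_pairs.extend((pixel, partner) for pixel in ts)
    return same_pairs, diff_pairs
```

With five pixels per test set, each test set contributes 10 same-class pairs and 10 presumed different-class pairs, matching the counts in the text.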

For  any  given  clustering,  it  is  now  easy  to  estimate 
the  PPMC  objective  function.  The  selection  of 
clusterings  to  evaluate  is  discussed  next.  Start  by 
using  the  spatial-spectral  starting  module  to  obtain, 
at  most,  200  starting  cluster  centers.  Classify  all  test 
pixels  and  count,  for  any  cluster  center,  the  number 
of  test  set  pixel  pairs  which  were  split  plus  the  num- 
ber of  different  field  pixel  pairs  which  were  not  split 
(these are errors). At the same time, evaluate the
estimate of the PPMC. Eliminate the cluster which
has  the  largest  number  of  errors,  reclassify  test  pixels 
which  were  assigned  to  that  cluster,  and  continue  un- 
til one  cluster  remains.  The  clustering  with  the 
smallest  PPMC  wins.  (It  is  interesting  that  a cluster- 
ing with  but  one  cluster  has  a high  PPMC,  since  sam- 
ples from  different  real  classes  are  all  classed  alike.) 
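The elimination loop can be sketched in Python as below. This is a simplified illustration, not the AMOEBA code: the names `nearest` and `eliminate_clusters` are hypothetical, pixels and centers are tuples of floats, every test pixel is reclassified on every round (rather than only those assigned to the eliminated cluster), and each cluster's error count is a plain tally rather than the weighted score used in appendix C.

```python
import math


def nearest(pixel, centers, active):
    """Index of the nearest cluster center among the active ones."""
    return min(sorted(active), key=lambda c: math.dist(pixel, centers[c]))


def eliminate_clusters(centers, same_pairs, diff_pairs):
    """Drop the worst cluster repeatedly; return the set of cluster
    indices with the smallest estimated PPMC, and that estimate."""
    active = set(range(len(centers)))
    total = len(same_pairs) + len(diff_pairs)
    best_score, best_active = None, None
    while active:
        errors = {c: 0 for c in active}
        bad = 0
        for p, q in same_pairs:
            cp = nearest(p, centers, active)
            cq = nearest(q, centers, active)
            if cp != cq:  # same real class, split
                bad += 1
                errors[cp] += 1
                errors[cq] += 1
        for p, q in diff_pairs:
            cp = nearest(p, centers, active)
            cq = nearest(q, centers, active)
            if cp == cq:  # different real classes, not split
                bad += 1
                errors[cp] += 1
        score = bad / total
        if best_score is None or score < best_score:
            best_score, best_active = score, sorted(active)
        if len(active) == 1:
            break
        worst = max(sorted(active), key=lambda c: errors[c])
        active.remove(worst)
    return best_active, best_score
```

In the small example used to check this sketch, two nearly coincident centers split a real class; removing one of them drives the estimate to zero, and the final single-cluster round scores worse again, as the text observes.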

A program  (named  AMOEBA)  to  implement  this 
technique  is  described  in  appendix  C. 


Table I. Description of Four Test Data Sets

Identification   Acquisition dates (a)          Comments

1857             76073, 76109, 76154, 76191     Data exceptionally free of noise
1865             76073, 76109, 76164, 76190     Typical data
1854             76055, 76154, 76164, 76199     Passes 1, 2, and 3 noisy; typical scan line noise in pass 2
1861             76056, 76128, 76164, 76182     All passes noisy; much of the segment fallow or pasture

(a) The first two digits represent the year; the last three represent the Julian date.


³A more reliable way of selecting pairs from different real classes
is needed. This “probably” situation is undoubtedly the weakest and
least understood feature of the clustering technique.


RESULTS 

The new clustering program AMOEBA was tested on four Landsat segments
from the U.S. Great Plains. The data, which are described in table I,
were furnished by the NASA Johnson Space Center (JSC). Before the
data enter the program, a transformation somewhat like the
Kauth-Thomas transformation (ref. 15) is used to halve the
dimensionality of the problem: the linear combinations
$(c_1 + c_2 + c_3)/4$ and $(-c_2 + c_3 + c_4)/3$ are roughly the
brightness and greenness; since channel 1 is used to sort test sets
to obtain pairs from different real classes, the “best” acquisition
(here taken to be pass 3) is transformed to channels 1 and 2.

In table II, the execution characteristics of AMOEBA are described.
This version of the program includes performing all preprocessing and
making one universal formatted image tape for display on NASA JSC
hardware. The program is written in Fortran and compiled using the
Fortran H extended optimizing compiler (with OPT = 2). (Of course,
the compilation needs to be done only once.)


Table II. Execution Characteristics of AMOEBA Version 6

Characteristic                                           Value

Computer used                                            AMDAHL 470 V/6
Execution time per 117-line by 196-pixel segment, sec    147 + 2 4 x no. of passes
Memory used, kbytes                                      200 + 100 x no. of passes
Compilation time, sec                                    8


Table III. AMOEBA Version 6, Two-Pass Data

Characteristic                 Segment
                               1857     1865     1854     1861

Starting no. of clusters       360      337      341      338
No. of test pixels             1500     1430     1270     1285
Final no. of clusters          11       22       12       16
Cluster size
  Largest                      5286     4844     7040     7093
  Smallest                     74       36       75       104
No. of unclassified pixels     11       5        22       11

In tables III and IV, typical results are shown. The four-pass tests
used all the data from table I. Two-pass tests used passes 2 and 3.
The results show one rather surprising characteristic: in three of
the four tests, fewer clusters were found using four acquisitions
than when using two. (Of course, the difference may not be
significant.) In segment 1854, however, 37 clusters were found in
four-pass data and only 12 in two-pass data.

The comparison of line-printer cluster maps with 1976 ground truth
was generally very encouraging. As might be expected, the two-pass
tests yielded much less accurate approximations (particularly in
segment 1854, where the two acquisitions are only 10 days apart). In
figure 1, typical film products of the output of the clustering are
displayed.

Table IV. AMOEBA Version 6, Four-Pass Data


SUMMARY 

Two  related  aspects  of  using  information  from  the 
spatial domain have been presented. The first concerns
finding sets of pixels which are spatially and
spectrally  associated;  these  sets  may  be  called 
“fields.”  Three  assumptions  on  the  relation  between 
(unknown)  real  classes  and  fields  are  formalized.  On 
these assumptions, one is able to automatically extract
samples of the data known to represent each
real  class;  in  addition,  popular  field  mean  vectors  are 
proposed  as  starting  cluster  centers  for  an  iterative 
clustering  program  to  process. 

The  second  use  of  spatial  information  concerns 
the  problem  of  classifying  mixture  pixels.  A simple 
model  for  the  boundary  between  two  real  fields  leads 
to  logic  for  detection  of  probable  erroneous 
classifications. The fields found by the starting procedure
summarized previously are classified as
monoliths;  and  all  other  pixels  are  classified,  yielding 
a map  of  labels.  Any  pixel  not  having  at  least  two 
neighbors  in  the  same  class  is  suspected  of  being  in- 
correctly classified;  the  boundary  model  furnishes  a 
method  for  reclassifying  such  pixels. 

Incidental  to  this  work,  it  was  noticed  that  sam- 
ples of  the  data  which  probably  come  from  different 
real  classes  could  be  obtained.  First,  the  samples 
could  be  ordered  from  the  fields  on  some  (essentially 
arbitrary)  one-dimensional  attribute.  Selected  pairs 
spread  out  in  this  order  can  be  believed  to  come  from 
different  real  classes.  Using  these  pairs,  together  with 
the  samples  from  the  same  field  (and  thus  real 
classes),  one  can  evaluate  an  arbitrary  clustering  of 
the  data  by  estimating  the  probability  that  a pair 
from  the  same  real  class  is  clustered  differently  plus 
the  probability  that  a pair  from  different  real  classes 
is  clustered  alike.  At  the  same  time,  clusters  which 
are  most  involved  in  splitting  pairs  from  the  same 
real  class  or  which  have  the  least  discriminatory 
ability  for  pairs  from  different  real  classes  are  iden- 
tified and  eliminated  (thereby  reducing  the  number 
of  clusters).  The  clustering  with  the  lowest  pair- 
misclustering  probability  is  selected. 

Based  on  these  ideas,  the  clustering  program 
AMOEBA  is  described.  Internally,  the  program  is  a 
pattern  recognition  program;  but,  from  without,  it 
appears  to  be  an  unsupervised  clustering  program. 


The  program  is  fast  and  automatic;  thus,  no  choices 
(such  as  arbitrary  thresholds  to  set  or  split/combine 
sequences)  need  be  made.  The  difficult  problem  of 
finding  the  number  of  clusters  is  solved  automat- 
ically. At  the  conclusion  of  the  program,  all  points  in 
the  scene  are  classified;  however,  a provision  is  in- 
cluded for  a “reject"  classification  of  some  points 
which, within the theoretical framework, cannot rationally
be assigned to any cluster.


Appendix A

A Spatial Starting Procedure


Notation: Suppose the data to be analyzed are in an array $d_{ij}$ of
vectors, $i = 1, \ldots, r$ and $j = 1, \ldots, c$. Also available is
an array $l_{ij}$ for saving labels.

Step 1. Cover the boundary.

a. Let $u = rc/2$, $m = rc/5$, and $l = rc/4$. Sum the squared
distance $|d_{ij} - d_{i,j-1}|^2$ over every fifth row and every
fifth column and divide by the number summed, obtaining an estimate
of the intrinsic variability n of the data. Set $h = 2n$; $h$ is the
horizontal tolerance.

b. Set $v = 13h/10$; $v$ is the vertical tolerance. Initialize the
array $l_{ij}$ to all zeros. For column $j = 2, \ldots, c$ and row
$i = 2, \ldots, r$, compare $|d_{ij} - d_{i,j-1}|^2 > h$: if so, set
$l_{ij} = 1$ and $l_{i,j-1} = 1$. Then test
$|d_{ij} - d_{i-1,j}|^2 > v$; if so, set $l_{ij} = 1$ and
$l_{i-1,j} = 1$. When all pixels have been processed in this way,
pass through the inside of the array $l_{ij}$ and fill in the holes.
If $l_{ij} = 1$, skip; otherwise, test the neighbors above and below,
seeing if both are marked (i.e., $= 1$) and, if not, test the left
and right neighbors. If either pair is marked, set $l_{ij} = 1$.
Count the number $n$ of points with $l_{ij} = 1$. If $u > n > l$,
exit to Step 2; otherwise, replace n by (6 + A*mAi)/2 and repeat
Step b. Call pixels at $(i,j)$ with $l_{ij} = 0$ pure pixels.
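The marking and hole-filling passes of Step 1 can be illustrated with a short Python sketch. The names `sq_dist`, `boundary_mask`, and `fill_holes` are hypothetical; the data are assumed to be a list of rows, each row a list of equal-length tuples; and the tolerances `h` and `v` are taken as given, so the adjustment loop that repeats Step b is not shown. Indices are zero-based, and a pixel in the first row or column is simply compared with whichever neighbor exists.

```python
def sq_dist(a, b):
    """Squared euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def boundary_mask(data, h, v):
    """Mark (with 1) both pixels of any horizontally adjacent pair
    differing by more than h and of any vertically adjacent pair
    differing by more than v; unmarked pixels stay 0 (pure)."""
    rows, cols = len(data), len(data[0])
    mask = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            if j > 0 and sq_dist(data[i][j], data[i][j - 1]) > h:
                mask[i][j] = mask[i][j - 1] = 1
            if i > 0 and sq_dist(data[i][j], data[i - 1][j]) > v:
                mask[i][j] = mask[i - 1][j] = 1
    return mask


def fill_holes(mask):
    """Mark an interior pixel when both its upper and lower neighbors,
    or both its left and right neighbors, are marked (in place)."""
    rows, cols = len(mask), len(mask[0])
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            if mask[i][j] == 0:
                vertical = mask[i - 1][j] and mask[i + 1][j]
                horizontal = mask[i][j - 1] and mask[i][j + 1]
                if vertical or horizontal:
                    mask[i][j] = 1
    return mask
```

The count of marked pixels needed for the exit test is then `sum(map(sum, mask))`.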

Step  2.  Label  the  fields  and  extract  test  sets. 

Comment: In this step, connected sets of pure pixels called fields
are marked in the array $l_{ij}$. However, the maximum field size is
limited to 50 pixels. At the same time as fields are being formed,
5 test pixels are extracted from each field which contains at least
5 and are stored in an array of test sets. The slightly obscure
program to perform the field labeling is actually an efficient
maze-solving algorithm.

a. Initialize a stack $S$ of fifty 6-vectors by setting
$S(D,p) = 2$ for $D = 1, 2, 3, 4$ and $p = 1, \ldots, 50$. Initialize
a number-of-test-sets counter to $t = 0$ and a number-of-fields
counter to $l = 0$. Set the row counter to $i = 2$.

b. Let the column counter be $j = 2$.

c. If $l_{ij} \neq 0$, the pixel at $(i,j)$ is not a pure pixel, so
proceed to Step 2m. Otherwise, initialize a stack pointer $p$ and a
stack index $x$ both to zero and set $i_0 = i$, $j_0 = j$, and
$d = 0$.

d. Set $l_{i_0 j_0} = 1$. If $p = 50$, go to Step 2i. Otherwise, set
$p = p + 1$ and save the coordinates $(i_0, j_0)$ in $S(5,p)$ and
$S(6,p)$.

e. Search directions $D = 1$ (left), $D = 2$ (right), $D = 3$ (up),
and $D = 4$ (down) except for direction $d$; if the location searched
is off the scene, go to Step 2f. If it is not, examine the label at
the point. If the label is negative, go to Step 2k. If it is
positive, go to Step 2f. Otherwise, change the label at the location
to 1, set $S(D,p) = 0$, and go to Step 2g.

f. Set $S(D,p) = 1$.

g. Search the next direction (repeating Step 2e). When all directions
have been searched, locate the pixel pointed to by the stack index
$x$: the pointer is to location $(u + 1, v)$, where
$u \equiv x \pmod 4$ and $v = (x/4) + 1$. The search is performed by
incrementing $x$ until $x > 200$ (Step 2i); $S(u + 1, v) = 0$
(Step 2h); $S(u + 1, v) = 1$ (next $x$) or $S(u + 1, v) = 2$
(Step 2i).

h. Set $x = x + 1$; set $d$ = the reflection of direction $u + 1$,
and construct the address $(i_0, j_0)$, which is the stacked pure
pixel location pointed to by $u + 1$ modifying $(S(5,v), S(6,v))$.
Go to Step 2d.

i. All pure pixels (up to 50) in the current component have been
marked; they must now be labeled. If $l = -400$, exit to Step 2n.
Otherwise, set $l = l - 1$ and set $L = l$.

j. Go through the $s = 1$ to $p$ points stacked and set
$l_{S(5,s),S(6,s)} = L$; at the same time, restore $S(D,s) = 2$ for
$D = 1, 2, 3, 4$. Proceed to Step 2l.

k. Set $L$ = the negative label found (in Step 2e). Go to Step 2j.

l. If $t > 1495$ or if $p < 5$, go to Step 2m. Otherwise, set
$q = (p - 1)/4$, $k = 1 + nq$ $(n = 0, \ldots, 4)$, and save the set
of 5 test pixels (one test set) by moving data in location
$(S(5,k), S(6,k))$ to the test set array. Set $t = t + 5$.

m. Set $j = j + 2$; if $j < c$, repeat Step 2c. Otherwise, set
$j = 2$, $i = i + 2$. If $i < r$, repeat Step 2b; otherwise, exit to
the next step.

n. The $t/5$ test sets of 5 pixels (from the same field) and
$m = -L$ labeled fields have been formed. Note that $t \le 1500$ and
$m \le 400$.

o. In another pass through the data and labels, form the sum, the
count, and then the means of each field. Let the mean vectors be
$M_i$, $i = 1, \ldots, m$.
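The idea behind Step 2 (connected sets of pure pixels, limited in size, receive negative integer labels) can be illustrated with a much simpler Python sketch. This is not the maze-solving procedure above: the name `label_fields` is hypothetical, a Python list serves as the stack, every pixel is visited rather than every second one, a field that reaches the size limit is simply closed so that leftover pure pixels start a new field, and no test sets are extracted.

```python
def label_fields(mask, max_size=50):
    """Label four-connected sets of pure pixels (mask value 0) with
    negative integers -1, -2, ...; boundary pixels keep label 0.
    Returns the label map and the number of fields found."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    next_label = 0
    for i in range(rows):
        for j in range(cols):
            if mask[i][j] != 0 or labels[i][j] != 0:
                continue
            next_label -= 1
            labels[i][j] = next_label
            stack = [(i, j)]
            size = 1
            while stack and size < max_size:
                ci, cj = stack.pop()
                # Search left, right, up, and down.
                for ni, nj in ((ci, cj - 1), (ci, cj + 1),
                               (ci - 1, cj), (ci + 1, cj)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and mask[ni][nj] == 0
                            and labels[ni][nj] == 0
                            and size < max_size):
                        labels[ni][nj] = next_label
                        stack.append((ni, nj))
                        size += 1
    return labels, -next_label
```

Field means (Step 2o) can then be accumulated in one more pass over the data and the label map.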

Step 3. Select the cluster centers. Copy the $m$ field means to
cluster center vectors $C_i$, $i = 1, \ldots, m$, and initialize
$k_i = 1$, $i = 1, \ldots, m$. Set $nf = m$. Suppose that at most $n$
starting cluster centers are desired and that no duplicates are
wanted.

a. Classify all $t$ test pixels by nearest cluster center (euclidean
distance); count the number $t_i$ of times cluster center $C_i$ is
the object of a classification. Eliminate any cluster with $t_i = 0$
(by setting $k_i = 0$), decrementing $nf$ each time this happens.
(Incidentally, this will eliminate duplicates in the family of
cluster centers.) Set $e = 0$.

b. Replace $e$ by $e + 1$ and set $i = 1$.

c. If $t_i \neq e$, skip to Step 3d; otherwise, set $t_i = 0$ and
reassign test pixels previously classed in cluster $C_i$, updating
the counter for centers which receive new classifications. Set
$nf = nf - 1$ and exit to Step 3e if $nf \le n$.

d. Set $i = i + 1$; if $i > m$, set $e = e + 1$ and $i = 1$. Repeat
Step 3c.

e. The $nf \le n$ starting clusters have been found. Rearrange them
to be $C_1, \ldots, C_{nf}$ and set $n = nf$.

Usage:  The  following  options  apply. 

Option 1. The user specifies the number $N$ of representative (pure)
pixels desired; Steps 1 and 2, parts a through n, are executed. Then,
pixels are picked round-robin from the test sets until either $N$
have been picked or all $t$ have been picked. (Emit a warning if
$N < t/5$.)

Option  2.  The  number  n of  starting  cluster  centers 
is  specified;  Steps  1 through  3 are  performed. 

Clearly,  Options  1 and  2 can  be  combined.  A third 
use  of  the  starting  procedure  is  mentioned  in  appen- 
dix C. 


Appendix  B 

A Spatial  Classification  Program 


In this program, it is assumed that the program in appendix A has
been executed, giving fields (in a map of labels $l_{ij}$) and field
means (in an array $M_i$, $i = 1, \ldots, m$). Also, cluster centers
$C_i$, $i = 1, \ldots, n$, are given (by an intervening clustering
program). Should no fields be available, the performance of the
program will be degraded but it will still work.

Step 1. Determine a rejection threshold and classify and label
fields.

a. For $i = 1, \ldots, n$, define $r_i = |C_i - C_j|^2/4$ where
$|C_i - C_j| \ge |C_i - C_k|$ for $k = 1, \ldots, n$.

b. Classify each field $k$ by nearest neighbor to a cluster center
$C_i$; if $|M_k - C_i|^2 > r_i$, increase the number $n$ of clusters
by 1, set $C_n = M_k$, and repeat Step 1a.

c. Go through the map of labels $l_{ij}$ and replace a field label
(negative integer) by the cluster of the field $k$, a boundary label
(1) by 0.

Step 2. Label the remainder of fields and check boundaries.

a. Classify all unlabeled pixels by nearest unrejected neighbor;
transfer the classification to the label map.

b. Declassify any pixel which has fewer than two of its four nearest
neighbors in the same class. (This process is order-dependent; it is
recommended that the testing go down rows.) Declassify the pixel by
replacing the label by its negative.

c.  Proceed  down  columns  and,  for  each 
declassified  pixel,  reclassify  it  by  finding  the  nearest 
unrejected  class  of  the  four  spatially  nearest  neigh- 
bors. If  this  attempt  to  reclassify  the  pixel  fails, 
restore  the  original  classification. 

d.  Again,  go  through  the  map  of  labels  by  col- 
umns and  label  with  zero  any  pixel  whose  four 
nearest  neighbors  do  not  have  the  same  label. 

e.  The  preceding  steps  have  been  rather  conserva- 
tive. It  may  be  desirable  at  this  point  to  restore  some 
classifications.  One  technique  is  to  label  an  unlabeled 
pixel  with  the  class  of  a majority  of  its  neighbors.  For 
most  accurate  maps,  the  three-of-four  neighbors  cri- 
terion is  recommended. 
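The restoration described in Step 2e can be sketched as follows in Python. The function name `restore_by_neighbors` is hypothetical; the label map is assumed to be a list of lists in which 0 means unlabeled; and, unlike an in-place pass, the votes are taken from the unmodified map so that the result does not depend on scan order.

```python
from collections import Counter


def restore_by_neighbors(labels, min_votes=3, unlabeled=0):
    """Give each unlabeled pixel the class held by at least min_votes
    of its four nearest neighbors, if there is such a class."""
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for i in range(rows):
        for j in range(cols):
            if labels[i][j] != unlabeled:
                continue
            votes = Counter(
                labels[ni][nj]
                for ni, nj in ((i - 1, j), (i + 1, j),
                               (i, j - 1), (i, j + 1))
                if 0 <= ni < rows and 0 <= nj < cols
                and labels[ni][nj] != unlabeled
            )
            if votes:
                label, count = votes.most_common(1)[0]
                if count >= min_votes:
                    out[i][j] = label
    return out
```

Setting `min_votes=3` corresponds to the three-of-four criterion recommended above for the most accurate maps.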




Appendix  C 

AMOEBA:  A Spatial  Clustering  Program 


Step 0. Preprocess the data, perhaps reducing the dimensionality.
Make sure that information most likely to separate clusters of
interest is contained in channel 1 of the data vector.

Step 1. Set $M = 200$ and execute the program of appendix A. On exit,
a map $l_{ij}$ of labels is passed to Step 3. The $n$ cluster centers
$C_i$ and $t$ test pixels (in test sets of 5 each) are used here.
Sort the test sets so that the value of the first test pixel in each
test set in channel 1 is nondecreasing. (Keep the test sets together,
of course.)

Step  2.  Determine  the  number  of  clusters  and 
what  they  are. 

a.  There  are  n starting  cluster  centers  Ct C„. 

Let  kt  — t — 1 n.  Classify  all  test  pixels  t,  by 

nearest  cluster  center:  let  Lt  denote  the  label  of  the 
cluster  center  t,  that  is  nearest.  Let  min  “ 100000. 
/ “ n. 

b.  Set  n,  ■»  0 ,na»  0 .j  - l,  and  sh  — 0.  h - / 

«. 

c.  Examine  the  classification  of  test  set  ( tr  , 
r/+4}.  Let  /,  -J  + r/4;  if  7,  > r,  let  7,  - j - t/2.  Also, 
let  >2  - j - r/4;  if  j2<  I . let  ./}  - j + 1/2.  Compare  Lr 

L.+4  with  LJX  and  Ln-  for  m " /. . . . J + 4.  if  v — 

Lh  . set  «,“«,+  1 and  s,,  - s,.+  10;  repeat  for 
Lj,.  Then,  compare  all  10  distinct  test  set  classifica- 
tion pairs:  each  time  a pair  is  split  (an  event  which 
involves  two  cluster  centers),  set  na  m na  + 1 and 
(for  each  of  the  two  clusters  m.v)  set  sM  - sM  - 10,  jv 
• j„  - 10  Finally,  if  a test  set  pair  is  (cor.ectly) 
clustered  in  the  same  cluster  p , set  sM  « + 1 . Set  j 

■ j + 5.  If  j < t.  repeat  Step  2c;  otherwise,  go  to  the 


next  step. 

d.  Find  the  cluster  CM  with  kM  *0  having  sM 

minimum  and  set  k ■ ” 0;  save  the  index  m Just 
eliminated.  Compare  nt  - »,  with  min:  If  «„  — n,  < 
min,  set  min  “ ~ »,and  set  /,  * /.  Set  / • / - I . If 

/ > 2,  repeat  Step  2b;  otherwise,  continue  with  Step 
2e. 

e.  The  number  of  clusters  is  I,  (except  for  clusters 
added  by  Step  3);  go  through  the  cluster  centers  C,, 

/ I n,  and  eliminate  those  with  saved  indices 

n encountered  before  t,  were  obtained.  Rearrange 
the  cluster  centers  for  more  efficient  search. 

Step 3. Set $n$ to the number of clusters found in Step 2; there are
$n$ cluster centers $C_1, \ldots, C_n$. Recall the fields were
labeled in $l_{ij}$. Execute the classification program in
appendix B.

Step  4.  A user-dependent  step:  various  display  pro- 
ducts and  statistical  summaries  can  be  performed 
now.  The  version  described  in  this  paper  prints  (on 
the  line  printer)  a cluster  map  and  a summary  of 
counts  of  the  number  of  elements  in  each  cluster. 
Also,  a universal  image  formatted  magnetic  tape  is 
written. 

Usage:  Three  options  are  available. 

Option 1. In the preferred mode, the program finds the number of
clusters.

Option 2. In a possibly suboptimal mode, the program seeks
$\le n$ clusters.

Option 3. In a probably suboptimal mode, the program seeks exactly
$n$ clusters.

These options are easily implemented and are incorporated
in Version 6 of AMOEBA.


On Evaluating Clustering Procedures for Use in Classification*

ABSTRACT

Evaluation  of  clustering  as  a preprocessing  step  in 
classification  is  discussed.  Special  emphasis  is  given 
to  the  case  in  which  a limited  number  of  labeled  sam- 
ples are  available  for  the  evaluation.  An  estimated 
probability  of  correct  classification  and  a variance  of 
proportion  estimate  (a  measure  of  cluster  purity)  are 
proposed.  Three  cluster-labeling  techniques  are  de- 
scribed; two  are  presented  in  an  application  and  one 
is theoretically developed to measure labeling errors
on a per-cluster basis.


1.  INTRODUCTION 

In  highly  complex  classification  problems,  a 
method  of  partitioning  the  samples  into  subpopula- 
tions or  clusters  and  labeling  each  subpopulation  is 
sometimes  used.  Each  member  of  a subpopulation 
then  assumes  the  label  of  that  subpopulation.  Errors 
may  occur  from  two  sources:  (1)  the  labeling  of  the 
subpopulations may be inaccurate and (2) the partitioning
into subpopulations may not be pure.¹ In the
latter  case,  the  question  of  the  appropriateness  of  a 
subpopulation  label  must  be  considered.  In  this 
paper,  the  problem  of  evaluating  clustering 
algorithms  and  their  respective  computer  programs 
for  use  in  this  type  of  classification  procedure  is  ad- 


*This  material  was  developed  under  NASA  Coniracl  NAS 
9-1 5200  and  prepared  for  the  Earth  Observations  Division,  NASA 
Johnson  Space  Center,  Houston.  Texas. 

“Lockheed  Electronics  Company.  Inc.,  Houslon,  Texas. 

'’Lockheed  Missiles  and  Space  Company,  Palo  Alto,  Califor- 
nia. 

'The  word  "pure"  is  used  to  imply  that  all  elements  of  the 
duster  arc  samples  from  the  same  generic  class  of  objects. 


dressed.  The  major  problem  in  cluster  evaluation  is 
the  determination  of  a measure  of  excellence. 
However,  in  clustering  for  classification,  the  prob- 
ability of  correct  classification  (PCC)  immediately  is 
suggested  as  the  ultimate  measure  of  accuracy  on 
training  data.  A means  of  implementing  this  cri- 
terion and  a measure  of  cluster  purity  are  discussed 
in  section  2,  and  examples  are  presented  in  section  4. 
A procedure  for  cluster  labeling  that  is  based  on 
cluster  purity  and  sample  size  is  presented  in  section 
3. 

Throughout the paper, a two-class classification problem is
assumed; however, much of this development is readily applicable to
the general classification problem.


2.  CLUSTER  EVALUATION  CRITERIA 

Clustering  algorithms  and  their  respective  com- 
puter programs  group  data  points  according  to 
characteristics  of  the  respective  points.  For  example, 
some  algorithms  group  points  that  are  numerically 
similar,  whereas  others  weight  numerical  proximity 
with  the  spatial  proximity  of  where  the  data  point 
was  observed.  Regardless  of  the  grouping  philosophy 
employed  by  the  algorithm,  the  question  of  cluster 
effectiveness  in  classification  is  valid  and  has  been 
inadequately  developed;  in  other  words,  criteria  are 
needed  for  the  comparison  of  clustering  algorithms. 
Two such criteria are presented in the following paragraphs with a
discussion of theoretical considerations concerning the merit of
each. Although computer programs do not always represent an optimal
implementation of an algorithm (i.e., defining an optimal
implementation criterion is another problem), no distinction will
be made between an algorithm and a program for implementing the
algorithm. Hence, effectively, clustering programs will be
compared. An evaluation of programs is recommended for eliminating
errors and inefficiencies caused by poor programing.

The  PCC  criterion  for  cluster  evaluation  in 
classification  is  theoretically  optimal  in  that  it  is  ex- 
actly the  criterion  which  determines  the  accuracy  of 
a classification.  However,  an  environment  will  be 
assumed  where  there  is  an  abundance  of  data  for 
which  the  true  classification  is  unknown,  whereas 
the  true  classification  is  known  for  a relatively  small 
subset  of  data.  This  creates  a two-sided  evaluation 
problem.  If  only  the  labeled  data  (the  subset  for 
which  the  true  classification  is  known)  are  used  for 
the  evaluation  process,  then  the  clustering  algorithm 
will  not  be  operating  in  a typical  environment  (too 
few  data  points);  on  the  other  hand,  if  all  the  data 
points  are  used  in  clustering,  the  true  cluster  struc- 
ture and,  therefore,  the  true  cluster  label  will  be 
unknown.  Either  evaluation  design  introduces  a 
source  of  procedural  error  into  the  evaluation.  The 
problem  with  the  first  evaluation  technique  is  that  its 
error  source  is  more  fundamental;  by  not  clustering 
in  a typical  environment,  the  procedure  is 
systematically  biased  toward  the  peculiarities  of  the 
clustering  algorithm  operating  on  a small  data  set. 
The  second  evaluation  technique  will  be  addressed 
here,  and  the  problem  of  error  in  cluster  labeling  will 
be  dealt  with  as  a problem  in  statistical  estimation. 
As  a general  rule,  experimental  conditions  must  be 
identical  to  performance  conditions  for  effective 
evaluations. 

The problem of measuring classification accuracy is dependent on
the errors in labeling clusters and the applicability of labels to
mixed clusters.^2 To accomplish this labeling in an unbiased manner,
three techniques have been developed. The first is to select the
labeled sample nearest the cluster mean. (Either $l_1$ or $l_2$
metrics are usually used.) This technique is referred to as the
nearest neighbor labeling (NNL) procedure and is favored because of
its ease of automation. The second technique is to observe all
labeled samples that fall in a cluster and follow a majority rule
(MR) procedure. This technique requires two default procedures
characterized by the following examples.

a. In the event of a tie, the sample farthest from the sample mean
will be omitted from consideration.

b. If no labeled samples fall in a cluster, the cluster is labeled
by the NNL procedure.


^2"Mixed" clusters are clusters that are not pure; that is, not all
elements of the cluster are samples from the same generic class.


This technique will presumably minimize the probability of
mislabeling a cluster. This probability of mislabeling a cluster is
a function of the number of labeled samples; hence, there is a
trade-off in accuracy of cluster labeling and expense of labeling
samples. This technique is more difficult to implement in an
automated computer-oriented procedure and requires many more
labeled samples to effect a measurable difference in labeling
accuracy. The third technique consists of labeling samples
sequentially within each cluster until a labeling confidence of
predesignated accuracy is achieved. This technique is developed in
section 3.
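
The first two labeling procedures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: samples are `(vector, label)` pairs, the $l_2$ metric is used, and the "sample mean" in the tie rule is taken to be the cluster mean. The function names `nnl_label` and `mr_label` are hypothetical and do not come from any LACIE program.

```python
import math
from collections import Counter


def nnl_label(cluster_mean, labeled_samples):
    """NNL: return the label of the labeled sample nearest (l2) to the cluster mean."""
    _, label = min(labeled_samples, key=lambda s: math.dist(s[0], cluster_mean))
    return label


def mr_label(cluster_mean, in_cluster, all_labeled):
    """MR: majority vote of the labeled samples that fall in the cluster.

    Default a: on a tie, drop the sample farthest from the mean and vote again.
    Default b: with no labeled samples in the cluster, fall back to NNL.
    """
    if not in_cluster:
        return nnl_label(cluster_mean, all_labeled)
    # Sort by distance so the farthest sample is always the last element.
    remaining = sorted(in_cluster, key=lambda s: math.dist(s[0], cluster_mean))
    while True:
        ranked = Counter(label for _, label in remaining).most_common()
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            return ranked[0][0]
        remaining.pop()  # tie: omit the farthest sample from consideration


mean = (0.0, 0.0)
samples = [((0.1, 0.0), "small grains"), ((0.2, 0.1), "other"),
           ((0.3, 0.3), "small grains"), ((1.0, 1.0), "other")]
print(nnl_label(mean, samples))          # label of the nearest labeled sample
print(mr_label(mean, samples, samples))  # 2-2 tie resolved by dropping (1.0, 1.0)
```

The sketch makes the cost difference visible: NNL needs only one distance search per cluster, whereas MR needs the cluster membership of every labeled sample plus both default rules.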

A word  of  warning  is  required  about  labeling  sam- 
ples. If  the  labeled  samples  are  not  proportionally 
representative  of  the  population  of  samples  (with 
respect  to  classification  categories),  then  labeling 
biases  will  influence  cluster  labels  and  give  rise  to  er- 
rors other  than  those  due  to  sampling  variance. 
Therefore,  randomized  sampling  schemes  should  be 
employed  to  select  samples  for  labeling. 

Although PCC is the most direct measure of accuracy in the
clustering-classification procedure, it does not measure the
cluster purity or the adaptability of the technique to relaxation
of usual procedures, to bias sampling, to incrementing of cluster
parameters, etc. As a pathological example of how PCC confounds
cluster purity with cluster labeling accuracy, consider a
two-category case where every cluster is labeled (by whatever
method) as belonging to the first category. Then, given an equal
number of samples from each category, the PCC is 0.5, regardless of
whether the clusters are extremely mixed (i.e., each cluster has
exactly 50 percent samples from the first category) or relatively
pure (i.e., one-half the clusters have exactly 25 percent, say,
from the first category, and the other one-half have exactly 75
percent from the first category).
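
The pathological example can be checked numerically. The sketch below assumes four equal-size clusters of 100 samples each (an arbitrary choice made only for illustration) and computes the PCC when every cluster is labeled category 1, alongside the PCC that would result if each cluster received its majority label.

```python
def pcc_all_first(sizes, first_fraction):
    """PCC when every cluster is labeled category 1: only category 1 samples are correct."""
    correct = sum(n * f for n, f in zip(sizes, first_fraction))
    return correct / sum(sizes)


def pcc_majority(sizes, first_fraction):
    """PCC when each cluster is given the label of its majority category."""
    correct = sum(n * max(f, 1.0 - f) for n, f in zip(sizes, first_fraction))
    return correct / sum(sizes)


sizes = [100, 100, 100, 100]        # assumed cluster sizes
mixed = [0.50, 0.50, 0.50, 0.50]    # every cluster is half category 1
purer = [0.25, 0.25, 0.75, 0.75]    # half the clusters at 25 percent, half at 75 percent

print(pcc_all_first(sizes, mixed), pcc_all_first(sizes, purer))
print(pcc_majority(sizes, mixed), pcc_majority(sizes, purer))
```

Under the all-category-1 labeling both cluster structures give the same PCC, so the statistic cannot distinguish them; only when the labels follow the cluster majorities does the purer structure score higher.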

Relatively pure clusters are thought to lend to the procedure a
stability or low variance of error that cannot be achieved by
proportional labeling of mixed clusters and also to lend
credibility to the concept of clustering as an effective
partitioning procedure before classification. The variance (VAR) of
cluster proportion is proposed as a measure of cluster purity.
Precise definitions of PCC and VAR follow. Let

a. $N_i$ denote the number of samples in cluster i

b. $M_i$ denote the number of labeled samples in cluster i

c. $p_i$ denote the proportion of labeled samples in cluster i
which were labeled correctly

Then, by definition,


The VAR is simply the variance of a proportion estimator stratified
across clusters, where the proportion is that percentage of each
cluster that is labeled correctly. This statistic is independent of
the cluster-labeling procedure or the accuracy of the label and
reflects cluster purity weighted by the number of labeled samples
in the clusters. The weighting comes from the $M_i - 1$ term and
magnifies the weight given to clusters with very few labeled
samples, but this magnification has not been precisely analyzed.
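
A rough computational reading of the two criteria is sketched below. The formulas in the code are an assumption: they are the textbook stratified-sampling forms, PCC = sum over clusters of (N_i/N) p_i and VAR = sum over clusters of (N_i/N)^2 p_i (1 - p_i) / (M_i - 1), chosen because they match the verbal description (a stratified proportion estimator with an M_i - 1 term). They should not be read as the paper's exact definitions, and the function name `pcc_and_var` is hypothetical.

```python
def pcc_and_var(clusters):
    """Assumed stratified forms of PCC and VAR.

    clusters: list of (N_i, M_i, p_i) = (samples in cluster, labeled samples
    in cluster, proportion of labeled samples labeled correctly).
    """
    total = sum(n for n, _, _ in clusters)
    pcc = sum((n / total) * p for n, _, p in clusters)
    var = 0.0
    for n, m, p in clusters:
        if m < 2:
            # The M_i - 1 divisor is undefined for a single labeled sample.
            raise ValueError("each cluster needs at least two labeled samples")
        var += (n / total) ** 2 * p * (1.0 - p) / (m - 1)
    return pcc, var


pcc, var = pcc_and_var([(100, 10, 0.9), (300, 20, 0.8)])
print(pcc, var)
```

Under these assumed forms a cluster with few labeled samples (small M_i) contributes a large term to VAR, which is the magnification effect noted in the text, and perfectly pure clusters (p_i equal to 0 or 1) contribute nothing.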


3.  ESTIMATING  ERRORS  FOR  CLUSTER 
LABELING 


3.1  Introduction 

The following development is a fairly general Bayesian model for
calculating the probability of correctly labeling a cluster by
randomly selecting and labeling a subset of size n of that cluster.
This model can be relaxed to samples "near" the cluster if it is
assumed that those samples near the cluster make up a subpopulation
of proportions identical to those within the cluster.

The  purpose  of  a Bayesian  development  to  cluster 
labeling  is  to  apply  prior  experience  (on  similar  data) 
as  to  frequencies  of  various  cluster  purities  to  current 
labeling.  This  prior  information  is  necessary,  as  will 
be  seen,  to  provide  a probability  confidence  that  the 
cluster  is  labeled  correctly;  or,  conversely,  it  may 
provide  thresholds  on  the  proportions  of  observed 
categories  to  determine  a necessary  sample  size  for 
“confident"  labeling  in  a sequential  labeling  pro- 
cedure. 


The model is given in sections 3.2 and 3.3. Section 3.4 gives the
general example of a symmetric, quadratic prior density. Section
3.4, step a, contains four cases that demonstrate the generality of
this example; and section 3.5 consists of specific solutions for
n = 1 and n = 2. The reader may wish to substitute values of c from
section 3.4, step a, into section 3.5 to see the effect of
different a priori densities.

An  itemized  format  is  used  in  this  section  to  facili- 
tate later  referencing. 


3.2  Notation 

Let $\theta_i$ denote the true proportion of category 1 in cluster
i; $0 \le \theta_i \le 1$. Since clusters are dealt with
individually and identically, the subscript will be dropped;
$\theta = \theta_i$. Further notation follows.

a. $n = M_i$ is the number of labeled samples in cluster i.

b. $x = x_i$ is the number of category 1 labeled samples in
cluster i.

The cluster purity $\theta$ is treated as a random variable to
reflect the fact that clusters assume particular purities with
ascertainable frequencies. Also let $g(\theta)$ denote the
generalized (possibly discrete) probability density function
(p.d.f.) of $\theta$. The p.d.f. g represents a priori information
about the capability of the algorithm to generate "pure" clusters
and will probably have to be estimated from empirical studies. The
fact that different clustering algorithms produce clusters of
different purities is reflected in the differing values of g for
each algorithm.


3.3  Decision  Rule  Development 

This section will establish a decision rule in its most general
form for labeling clusters.

a. Assume each sample from cluster i is independent and is in
category 1 with probability $\theta$ (a Bernoulli process).

b. It follows from the assumption in step a that x is binomially
distributed with parameters $\theta$ and n.

$$x \sim f(x \mid \theta) = \binom{n}{x} \theta^{x} (1 - \theta)^{n - x} I_{\{0, 1, \ldots, n\}}(x)$$

c. Interest is in the posterior p.d.f. of $\theta$, the proportion
or probability of category 1.

$$h(\theta \mid x) = \frac{g(\theta) f(x \mid \theta)}{\int_0^1 g(\theta) f(x \mid \theta)\, d\theta}$$

d. The posterior probability that the cluster i is in category 2 is

$$p = P(0 \le \theta < 1/2 \mid x) = \int_0^{1/2} h(\theta \mid x)\, d\theta$$

and 1 - p is the probability that cluster i is in category 1.

e. Decision rules

1. If x = n/2, regardless of cluster label, p = 1/2.

2. If x > n/2, the cluster is labeled category 1 and p is the
probability of commission (classifying category 2 into category 1)
in cluster labeling and 1 - p is the probability of correctly
labeling a category 1 cluster.

3. If x < n/2, the cluster is labeled category 2 and p is the
probability of correctly labeling a category 2 cluster and 1 - p is
the probability of omission (classifying category 1 into category
2) in cluster labeling.


3.4 General Example

Assume g is a symmetric, quadratic a priori density for $\theta$
(i.e., the mathematical calculation is easier if this assumption is
made); then, $g(0) = g(1)$ and $g(\theta) = a\theta^2 + b\theta + c$.

a. The fact that g is a p.d.f. implies
$g(\theta) = (6c - 6)\theta^2 + (6 - 6c)\theta + c$ and
$0 \le c \le 3$.

Now, four special cases are examined by varying the parameter c.

1. (c = 0): g is concave downward; cluster algorithm gives mixed
clusters only (i.e., k-means implied).

2. (c = 1): $g(\theta) = I_{[0,1]}(\theta)$; equi-ignorance
principle.

3. (c = 3/2): $g(0) = 2g(1/2)$ and
$g(\theta) = 3\theta^2 - 3\theta + 3/2$.

4. (c = 3): $g(1/2) = 0$ and
$g(\theta) = 12\theta^2 - 12\theta + 3$.

These equations are illustrated as follows.

[Figure: g(θ) plotted for c = 0, 1, 3/2, and 3.]

b. The unconditional density of x can be shown to be

$$f(x) = \frac{6(c - 1)(x + 1)(x + 2) + 6(1 - c)(x + 1)(n + 3) + c(n + 2)(n + 3)}{(n + 1)(n + 2)(n + 3)}$$

c. The probability that the majority of the cluster is in category
2 is

$$p = \frac{1}{f(x)} \int_0^{1/2} g(\theta)\, f(x \mid \theta)\, d\theta$$


3.5  Specific  Example 

Here, n is the number of labeling pixels and x is the number of
category 1 labeled samples.

a. Specific case n = 1

1. If x = 0, $p = \dfrac{11 + c}{16}$.

2. If x = 1, $p = \dfrac{5 - c}{16}$.

b. Specific case n = 2

1. If x = 0, $p = \dfrac{23c + 117}{16(c + 9)}$.

2. If x = 1, $p = 1/2$.

3. If x = 2, $p = \dfrac{27 - 7c}{16(c + 9)}$.
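
The posterior probability p of section 3.3, step d, can be evaluated exactly for the quadratic prior of section 3.4 by integrating polynomials with rational arithmetic. The sketch below is one way to do this and is not taken from the original study; the helper names (`prior_coeffs`, `poly_mul`, `integrate`, `posterior_p`) are hypothetical. The binomial coefficient cancels between numerator and denominator, so it is omitted.

```python
from fractions import Fraction


def prior_coeffs(c):
    """Coefficients [constant, linear, quadratic] of g(theta) = (6c-6)theta^2 + (6-6c)theta + c."""
    c = Fraction(c)
    return [c, 6 - 6 * c, 6 * c - 6]


def poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def integrate(poly, upper):
    """Integrate a polynomial from 0 to `upper`."""
    upper = Fraction(upper)
    return sum(coef * upper ** (k + 1) / (k + 1) for k, coef in enumerate(poly))


def posterior_p(x, n, c):
    """P(theta < 1/2 | x): posterior probability that the cluster is in category 2."""
    integrand = prior_coeffs(c)
    for _ in range(x):
        integrand = poly_mul(integrand, [0, 1])    # factor theta
    for _ in range(n - x):
        integrand = poly_mul(integrand, [1, -1])   # factor (1 - theta)
    return integrate(integrand, Fraction(1, 2)) / integrate(integrand, 1)


# Uniform prior (c = 1), one labeled sample, none in category 1.
print(posterior_p(0, 1, 1))
# Effect of the prior on the same observation.
for c in (0, 1, Fraction(3, 2), 3):
    print(c, posterior_p(0, 1, c))
```

In a sequential labeling procedure, a function of this kind could be called after each newly labeled sample, stopping once max(p, 1 - p) exceeds a predesignated confidence.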


3.6  An  Additional  Development 

The posterior Bayes estimate of $\theta$, the proportion of
category 1 in the cluster, is

$$\hat{\theta} = E(\theta \mid x) = \frac{\int_0^1 \theta\, g(\theta) f(x \mid \theta)\, d\theta}{\int_0^1 g(\theta) f(x \mid \theta)\, d\theta}$$

a. For the model given in steps a through d in section 3.3 and the
general a priori assumption of step a in section 3.4, it follows
that

$$\hat{\theta} = \left(\frac{x + 1}{n + 4}\right) \frac{6(c - 1)(x + 2)(x + 3) + 6(1 - c)(x + 2)(n + 4) + c(n + 3)(n + 4)}{6(c - 1)(x + 1)(x + 2) + 6(1 - c)(x + 1)(n + 3) + c(n + 2)(n + 3)}$$

b. Specific case n = 1

1. If x = 0, $\hat{\theta} = \dfrac{6 - c}{15}$.

2. If x = 1, $\hat{\theta} = \dfrac{9 + c}{15}$.

c. Specific case n = 2

1. If x = 0, $\hat{\theta} = \dfrac{6 - c}{18 + 2c}$.

2. If x = 1, $\hat{\theta} = 1/2$.

3. If x = 2, $\hat{\theta} = \dfrac{12 + 3c}{18 + 2c}$.

In summary, a method of determining cluster labeling accuracy is
developed that may be used after each labeling of sets of samples
to sequentially determine whether sufficient confidence of labeling
(by observing p) has been achieved to terminate and label a cluster
(and establish n).


4. A CLUSTER PARAMETER EVALUATION EXAMPLE

The following example was part of a study performed for the Earth
Observations Division of the NASA Johnson Space Center (JSC) and
was used in the LACIE. The purpose of the study was to compare two
sets of parameters in the clustering algorithm/program, Iterative
Self-Organizing Clustering System (ISOCLS), used to partition
22 932 samples of picture elements (pixels) from Landsat
multispectral scanner photographic images. The ISOCLS algorithm has
been defined by Kan and Holley (ref. 1) and by Kan (ref. 2), and
program documentation is presented by Minter (ref. 3).

The two sets of parameters that were compared in ISOCLS actually
constitute two clustering algorithms. One set of parameters
constitutes "nearest neighbor clustering"; that is, 40 pixels
(samples) were selected at random and labeled "seed pixels," and
then each pixel was assigned to the seed nearest to it. This
procedure generates 40 clusters, each with the label of its seed.
This is a peculiar algorithm where the NNL procedure and the MR
labeling procedure will always yield the same labels; it is denoted
the NN cluster parameter set. The other parameter set involves
similar seeding but then goes into a complex (k-means-like) set of
splittings and combinings of clusters. These parameters were
recommended by Wylie and Bean^3 and are referred to herein as the
MPAD cluster parameter set. The numerical values used are given in
table I. The data are 4 by 1 vectors of spectral values acquired on
a particular date. Data were acquired on four different dates and
concatenated into supervectors of data. The number of channels or
elements in a vector is thus a function of the number of dates
used. The parameters listed in table I are briefly described as
follows.


Table I. MPAD and NN Cluster Parameter Sets

               Cluster parameter set for no. of channels
               MPAD                                NN
Parameter      4        8        12       16       any
CLUSTERS       60       60       60       60       60
THRESHOLD      8191     8191     8191     8191     8191
SEP            1.0      1.0      1.0      1.0      1.0
PERCENT        100      100      90       90       --
STDMAX         3.2      3.6      3.6      3.6      150
DLMIN          3.2      3.9      4.1      4.5      0
NMIN           50       50       50       50       20
ISTOP          8        8        8        8        0
SEQUEN         SC       SC       SC       SC       --
DOTFIL         (a)      (a)      (a)      (a)      (a)

(a) Randomly selected starting dots.


a. DOTFIL: self-generating or randomly selected starting vectors

b. STDMAX: maximum standard deviation in a cluster before splitting
occurs

c. DLMIN: minimum distance between the means of two clusters needed
to combine them

d. ISTOP: maximum number of iterations in the initial splitting
sequence

e. SEQUEN: the final split/combine (SC) sequence

f. NMIN: minimum number of pixels needed to form a cluster

g. SEP: amount of separation between two new clusters after
splitting occurs

h. PERCENT: required percentage of stabilized clusters needed to
stop the splitting sequence

i. CLUSTERS: maximum number of clusters allowed per class

j. THRESHOLD: the percentage of outlier observations to be deleted
from consideration (Zero thresholding was used in this study.)


^3Alan D. Wiley and William C. Bean, "MPAD LACIE Clustering
Parameter Study," NASA JSC Internal Note 76-FM-116, 1977.
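
The NN cluster parameter set described earlier in this section reduces ISOCLS to a single assignment pass: seeds are drawn at random and every pixel joins its nearest seed. A minimal sketch of that idea follows. It is not ISOCLS code; the function names, the use of the $l_2$ metric, and the fixed random seed are assumptions made for illustration.

```python
import math
import random


def assign_to_seeds(pixels, seeds):
    """Return, for each pixel, the index of the nearest seed (l2 distance)."""
    return [min(range(len(seeds)), key=lambda k: math.dist(p, seeds[k]))
            for p in pixels]


def nn_clusters(pixels, n_seeds=40, rng=None):
    """Draw n_seeds pixels at random as seeds and assign every pixel to one of them."""
    rng = rng or random.Random(0)  # fixed seed only so the sketch is repeatable
    seeds = rng.sample(pixels, n_seeds)
    return seeds, assign_to_seeds(pixels, seeds)


# Four-channel toy "pixels" forming two obvious groups.
pixels = [(10, 12, 30, 8), (11, 12, 29, 9), (40, 41, 60, 22), (39, 43, 61, 20)]
print(assign_to_seeds(pixels, [pixels[0], pixels[2]]))
```

Because each cluster is built around one labeled seed, the seed is by construction the labeled sample nearest the cluster's origin, which is why the text notes that NNL and MR labeling coincide for this parameter set when no other labeled samples are involved.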

Each  set  of  22  932  samples  (pixels)  in  a given  area 
is  referred  to  as  a segment  and  covers  a rectangular 
area  of  approximately  8.0  by  9.6  kilometers. 

The four segments used and the four acquisition dates of each using
the Julian calendar system are given in table II. The first two
digits represent the year and the last three digits are the number
of days into the year. For example, 76218 represents August 5,
1976, since that is the 218th day of 1976.
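
The date convention is easy to decode mechanically. The helper below is only an illustration (the name `decode_acquisition_date` is hypothetical); it relies on Python's standard `%y` (two-digit year) and `%j` (day of year) format codes, and `%y` maps 69 through 99 to the 1900s.

```python
from datetime import datetime


def decode_acquisition_date(code):
    """Decode a five-digit YYDDD code: two-digit year, then day of the year."""
    return datetime.strptime(code, "%y%j").date()


print(decode_acquisition_date("76218"))
```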

Table II. Acquisition Dates for Four LACIE Segments for Crop Year
1975-76

                                    Acquisition date
Location                Segment     A        B        C        D
Morton County, Kans.    1961        75277    76164    76236    76254
Finney County, Kans.    ?           7529?    76127    76164    76272
Stevens County, Kans.   186?        75349    76136    76172    76190
Randall County, Tex.    1978        75313    76074    76164    76218

A question arose during the course of this study as to the merit of
using pixels (seed samples) for which the label was unsure. In
particular, the categories used were those of small grains
(agriculture crops) and non-small-grains. Pixels selected for
labeling were sometimes found to be on agricultural field borders,
making a "pure" label difficult. However, a "majority of the pixel"
strategy was employed in labeling, and each clustering was
performed with one of three types of labeling techniques.

a. Mixed: 40 pixels were selected at random.

b. Pure border: border pixels that spanned fields of differing
categories (small grains/non-small-grains) were replaced with pure
pixels.

c. Pure: all border pixels were removed (even small grains/small
grains borders) and replaced with pure pixels.

A ground-truth label was used in labeling to avoid confounding
clustering purity with analyst-interpreter errors. After the PCC
was calculated for each test, the PCC's and the VAR's were averaged
over the four sites. These results are given in tables III and IV.
Several conclusions can be drawn from table III.

a. For MR labeling, pure-border-type labeling pixels are uniformly
(across number of channels) better; i.e., produce higher PCC's.

b. For NNL and MPAD parameters, mixed labeling pixels are uniformly
better.

c. For NNL and NN parameters, pure-border labeling pixels are
virtually uniformly better; the one exception is virtually a tie.

Table III. Mean PCC Values Averaged Over Sites

                      Cluster parameter set for pixel type
                      MPAD                        NN
         Labeling             Pure                        Pure
Channel  technique    Mixed   border   Pure       Mixed   border   Pure
4        MR           85.32   86.57    86.52      87.17   88.04    87.96
         NNL          77.02   76.89    76.84      83.29   84.21    84.29
8        MR           89.38   91.13    91.02      89.78   91.69    91.26
         NNL          85.01   84.53    85.71      88.61   90.41    89.99
12       MR           90.31   90.40    90.31      91.64   92.24    91.82
         NNL          82.19   80.78    81.19      89.26   89.74    89.32
16       MR           94.24   95.17    95.17      91.82   91.76    91.36
         NNL          91.69   86.18    86.29      90.23   90.01    89.61

Table IV. Mean Variance Values Averaged Over Sites

           Cluster parameter set for pixel type
           MPAD                        NN
                   Pure                        Pure
Channel    Mixed   border   Pure       Mixed   border   Pure
4          7.61    7.13     7.20       6.89    6.40     6.50
8          4.32    3.6?     3.6?       5.84    4.48     4.53
12         4.39    4.39     4.47       4.80    4.56     4.60
16         1.95    1.56     1.56       5.48    5.48     5.52

The comparisons of VAR from table IV for these two clustering
parameter sets do not yield such definite results. The NN parameter
set yields lower VAR's for 4 channels uniformly (across pixel
type), whereas the MPAD parameter set yields lower VAR's for 8, 12,
and 16 channels uniformly. The results for PCC (table III) and VAR
(table IV) do not give identical conclusions; that is, one
parameter set is not clearly superior to the other. For example, if
clusters are labeled using the NNL procedure, NN parameters with
pure-border-pixel labels are best (highest PCC), but the clusters
generated are not as pure as with the MPAD parameters (and mixed
pixel labels). An evaluation of clustering purposes and uses is
called for in the trade-off of high PCC against low VAR.

Another  example  of  the  use  of  PCC  is  in  the  selec- 
tion of  other  parameters,  such  as  the  number  of  seed 
samples.  The  preceding  results  were  all  based  on  the 
use  of  40  starting  dots.  The  effect  on  PCC  of  increas- 
ing the  number  of  starting  dots  to  60  was  tested,  and 
the  results  are  given  in  table  V.  Only  pure-border  pix- 
els and  the  NNL  of  clusters  were  used  to  compare 
the  MPAD  and  NN  parameter  sets. 


Table V. Comparison of PCC's Using 40 and 60 Starting Vectors(a)

         Starting    Channels
Set      vectors     4        8        12       16
MPAD     40          76.9     84.5     80.8     86.2
         60          79.4     89.0     85.3     89.7
NN       40          84.2     90.4     89.7     90.0
         60          83.3     89.6     89.6     90.0

(a) Pure-border pixels were used with NNL of clusters averaged over
four sites.


The  values  in  table  V indicate  that  the  NN 
parameter  set  with  40  starting  vectors  produced  the 
highest  average  PCC.  It  can  be  observed  also  that 
there  is  either  little  or  no  gain  in  PCC  values  when 
the  number  of  channels  is  increased  from  8 to  12  or 
16,  regardless  of  the  parameter  set  used  or  the  num- 
ber of  starting  vectors. 


5.  SUMMARY 

Two criteria, PCC and VAR, were presented as measures in cluster
algorithm/program evaluation; and an example from the LACIE project
at NASA JSC illustrates their use. The theoretical foundation for a
system of cluster labeling as a function of cluster purity and size
of labeled samples is developed, and an example for rather general
assumptions is generated.
CLASSY - An Adaptive Maximum Likelihood Clustering Algorithm*

R. K. Lennington^a and M. E. Rassbach^b


ABSTRACT 

A new clustering method called CLASSY, which alternates maximum
likelihood iterative techniques for estimating the parameters of a
mixture distribution with an adaptive procedure for splitting,
combining, and eliminating the resultant components of the mixture,
has been developed. The adaptive procedure is based on maximizing
the fit of a mixture of multivariate normal distributions to the
observed data using its first through fourth central moments. The
method generates estimates of the number of multivariate normal
components in the mixture and the proportion, mean vector, and
covariance matrix for each component.

This paper describes the mathematical model which is the basis for
CLASSY and outlines the actual operation of the algorithm as
currently implemented. Results of applying CLASSY to real and
simulated Landsat data are presented and compared with results
generated by the Iterative Self-Organizing Clustering System
(ISOCLS) algorithm, a derivative of the ISODATA algorithm, on the
same data sets.


INTRODUCTION 

The Large Area Crop Inventory Experiment (LACIE) is dependent on
clustering for the determination of spectral classes within a
Landsat image of a sample segment (ref. 1). Currently, the
Iterative Self-Organizing Clustering System (ISOCLS) is used for
this purpose (refs. 2 and 3). ISOCLS is basically a variation of
the k-means or ISODATA algorithm of Ball and Hall (refs. 4 and 5).
Although this algorithm may be interpreted as a simplified maximum
likelihood procedure, it is fundamentally a heuristic algorithm for
breaking a data set into fairly homogeneous compact clusters.


*The current material for this paper was developed under NASA
contract NAS 9-15200 and prepared for the Earth Observations
Division, NASA Johnson Space Center, Houston, Texas. CLASSY was
developed by M. E. Rassbach while he was a National Research
Council postdoctoral fellow working at the Johnson Space Center.

^aLockheed Electronics Company, Houston, Texas.

^bElogic, Inc., Houston, Texas.


A new clustering algorithm called CLASSY, which approximates the
mixture distribution of a given data set such as Landsat data with
a linear combination of normal distributions, has been developed.
CLASSY operates by interleaving maximum likelihood iterative
estimation with an adaptive procedure for splitting, combining, and
eliminating the resultant components of the mixture density (or
clusters). The adaptive procedure is based on maximizing the fit of
a mixture of multivariate normal distributions to the observed data
using its first through fourth central moments. This procedure
allows new components (or clusters) to be created if any existing
one appears to be multimodal or otherwise nonnormal. CLASSY
produces an estimate of the proportion, mean vector, and covariance
matrix for each component in the multivariate normal mixture. It
differs from standard maximum likelihood procedures in that it also
generates an estimate of the number of components in the mixture.
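
To make the notion of fitting a mixture of normal distributions concrete, the sketch below evaluates the log-likelihood of a data set under a candidate mixture. It is a deliberately simplified, univariate illustration and not CLASSY itself: the real algorithm works with multivariate normal components and also adapts the number of components. All function names and the toy data are assumptions.

```python
import math


def normal_pdf(x, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, components):
    """Mixture density; components is a list of (proportion, mean, variance)."""
    return sum(a * normal_pdf(x, mu, var) for a, mu, var in components)


def log_likelihood(data, components):
    """Log-likelihood of independent samples under the mixture."""
    return sum(math.log(mixture_pdf(x, components)) for x in data)


data = [0.0, 0.2, -0.1, 5.0, 5.1, 4.9]                 # two well-separated groups
one_component = [(1.0, 2.5, 6.5)]
two_components = [(0.5, 0.0, 0.02), (0.5, 5.0, 0.01)]

print(log_likelihood(data, one_component))
print(log_likelihood(data, two_components))
```

On this toy data the two-component mixture attains the higher log-likelihood. The same mechanism means that adding ever more, ever narrower components keeps raising the likelihood, which is why an unconstrained maximum likelihood fit cannot by itself choose the number of components.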

The  CLASSY  algorithm  is  currently  implemented 
on  an  IBM  370-148  computer.  It  is  written  in  Fortran 
IV  language  and  currently  accepts  as  input  Landsat 
imagery  on  magnetic  tape.  Both  line  printer  and  mag- 
netic tape  output  are  generated  by  the  program. 

The following section of this paper describes the mathematical
model that is the basis for CLASSY and provides a brief description
of the actual operation of the algorithm. The section entitled
"Results" contains comparisons of the performances of CLASSY and
ISOCLS on simulated data and on actual Landsat data used in LACIE.
Finally, these results are evaluated and conclusions are developed.
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MATHEMATICAL  DESCRIPTION  parameter  vector  *m  so  as  to  maximize  the  following 

function: 


Assumptions  and  Proto  lam  Definition 

The  fundamental  mathematical  assumption  un- 
derlying CLASSY  is  that  the  data  may  be  usefully  ap- 
proximated by  a mixture  of  multivariate  normal  den- 
sities. That  is,  if  x is  an  observation  vector  and  p is  its 
probability  density  function,  then 


m 

P(*\m*m)  “ E Vi  (“Mr)  0) 

/*i  1 


Evi  ("/h--.  )j 
(3) 


The  values  of  m and  wm  which  maximize  equation 
(3)  specify  n set  of  distributions  that  will  be  called 
clusters.  Of  course,  A(m, wm)  must  be  chosen  so  that 
it  satisfies  the  normalization  constraint 


where  at  is  the  a priori  probability  of  occurrence  of 
class  I:  p/x  >>  the  multivariate  normal  prob- 
ability density  function  for  class  / with  mean  vector 
M/and  covariance  matrix  If  mis  the  total  number  of 

classes;  wm  is  the  full  set  of  parameters  (i.e.,  (a, 

■ • ■ ‘Mm*  ^ml)- 

Given  a set  of  statistically  independent,  unlabeled 
sample  vectors  (xy).  the  likelihood  function  may  be 
formed  in  the  following  manner: 


E fA  (m*rn)dl,n  * » <«> 

m*l  J 


The  upper  limit  on  m is  infinity  since  the  possibility 
of  generating  an  infinite  number  of  clusters  must  be 
considered  (in  theory). 

Typically,  in  the  absence  of  other  information,  the 
a priori  probabilities  may  be  chosen  as 


1 (bHw-*m)-/7 
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aPi  ( w 2* 


where  N is  the  total  number  of  samples. 

So  far,  the  assumptions  and  equations  parallel  the 
usual  maximum  likelihood  development.  CLASSY 
makes  the  additional  assumption  that  each  value  of 
the  parameters  m and  wm  occurs  with  an  a priori 
probability  distribution  A (rn,wm).  This  Bayesian  for- 
mulation of  the  problem  is  taken  to  avoid  the 
degenerate  situation  of  increasing  the  likelihood  by 
generating  more  and  more  clusters  with  smaller  and 
smaller  values  of  The  practical  limit  of  this  proc- 
ess is  that  each  class  will  be  associated  with  only  one 
data  point. 

The  objective  of  CLASSY,  then,  is  to  determine 
the  discrete  parameter  in  and  the  continuous 


m 

* IlCrnm(Rm 
/•I 


0.  otherwise 


(5) 


where $C_i = C$ is a constant containing normalization factors over
$\omega_m$ space, $\beta$ is an overall normalization constant, and
$R_m$ is a finite region of $\omega_m$ space corresponding to allowable
values for the parameters. Using this simple form for
$\Lambda(m, \omega_m)$ in equation (4), the following is obtained.

Now if


where $\gamma < 1$, then the sum in equation (6) will converge and
$\beta = 1 - \gamma$ provides the proper normalization. Thus, larger
values of $\gamma$ provide a priori bias in favor of more clusters,
whereas smaller values provide bias in favor of fewer clusters.

In the current version of CLASSY, the authors have been using a fixed
value of $\gamma$ and approximating the $R_1$ integral of $d\omega_1$ by
$e^{2d}$. This represents a crude approach to the problem of determining
the form of $\Lambda(m, \omega_m)$. However, in practice, the overall
technique to be described in the next section has proven not to be
sensitive to reasonable changes in the value of $C$. With the form for
$\Lambda(m, \omega_m)$ assumed in equation (5), the function to be
maximized becomes


where $d$ is the dimensionality of the observations $x_j$.
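The mixture density of equation (1) and the likelihood of equation (2) can be illustrated with a minimal sketch. It assumes one-dimensional observations for brevity (the paper works with multivariate vectors), and the function names and sample values are hypothetical. It also shows the degenerate behavior that motivates the prior: narrow clusters placed on the data raise the likelihood.

```python
import math

def normal_pdf(x, mean, var):
    """Univariate normal density, a 1-D stand-in for p_i(x | mu_i, Sigma_i)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(x, alphas, means, variances):
    """Equation (1): weighted sum of the component densities."""
    return sum(a * normal_pdf(x, mu, v) for a, mu, v in zip(alphas, means, variances))

def log_likelihood(samples, alphas, means, variances):
    """Logarithm of equation (2) for statistically independent samples."""
    return sum(math.log(mixture_pdf(x, alphas, means, variances)) for x in samples)

# Illustrative data only: two tight groups of points.
samples = [-2.1, -1.9, 2.0, 2.2]
one_cluster = log_likelihood(samples, [1.0], [0.05], [4.2])
two_clusters = log_likelihood(samples, [0.5, 0.5], [-2.0, 2.1], [0.01, 0.01])
print(one_cluster, two_clusters)
```

Without a prior on the number of clusters, nothing stops this trend from continuing until each cluster holds a single point.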


Solution Procedure

Many approaches may be taken to maximize equation (3). The approach
chosen in CLASSY is to interleave maximum likelihood iteration (designed
to maximize $L(\{x_j\}, m, \omega_m)$ with respect to the continuous
parameter vector $\omega_m$) with a discrete split, join, and combine
process (designed to maximize $L(\{x_j\}, m, \omega_m)$ with respect to
the discrete parameter $m$). Although the theoretical convergence
properties of this procedure have not been examined, it is expected
that, by alternating these two techniques, values of $m$ and $\omega_m$
corresponding to at least a local maximum of $L(\{x_j\}, m, \omega_m)$
will be determined. Because the splitting and combining techniques
operate around each existing cluster and the statistics for hypotheses
concerning different numbers of clusters are maintained separately, it
has been observed that the final local maximum will often be global.

Necessary conditions for a maximum of $L(\{x_j\}, m, \omega_m)$ with
respect to $\omega_m$, assuming a fixed number of classes $m$, are well
known (see Duda and Hart (ref. 6) and Wolfe (ref. 7)) and are given by
the following equations:


where $p(i \mid x_k, \omega_m)$ is the posterior probability of class
$i$, given the $k$th sample vector and the values of the parameters, and
$\alpha_i$, $\mu_i$, and $\Sigma_i$, $i = 1, \ldots, m$, are the
elements of $\omega_m$.
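As an illustration of direct functional iteration on these necessary conditions, the sketch below uses the standard textbook update for a normal mixture, in which posterior-weighted averages give the proportions, means, and variances. It is a one-dimensional simplification with hypothetical names and data, and it applies simple functional iteration to all three parameter sets, which is an assumption rather than CLASSY's own scheme for the proportions.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def functional_iteration(samples, alphas, means, variances):
    """One pass: old parameters on the right side give updated ones on the left."""
    m, n = len(alphas), len(samples)
    # Posterior probability p(i | x_k, omega_m) of each class for each sample.
    post = []
    for x in samples:
        joint = [alphas[i] * normal_pdf(x, means[i], variances[i]) for i in range(m)]
        total = sum(joint)
        post.append([j / total for j in joint])
    new_alphas, new_means, new_vars = [], [], []
    for i in range(m):
        w = sum(p[i] for p in post)  # fractional number of points in class i
        mu = sum(p[i] * x for p, x in zip(post, samples)) / w
        var = sum(p[i] * (x - mu) ** 2 for p, x in zip(post, samples)) / w
        new_alphas.append(w / n)
        new_means.append(mu)
        new_vars.append(var)
    return new_alphas, new_means, new_vars

# Illustrative data: two well-separated groups and a rough starting guess.
samples = [-2.2, -2.0, -1.8, 1.8, 2.0, 2.2]
params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
for _ in range(20):
    params = functional_iteration(samples, *params)
alphas, means, variances = params
print(alphas, means, variances)
```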

Numerous techniques have been proposed for obtaining a solution to this
set of coupled, simultaneous nonlinear equations. Specific methods have
been suggested by Quirein and Trichel (ref. 8), Day (ref. 9), Hasselblad
(ref. 10), and Wolfe (ref. 7), among others. CLASSY uses direct
functional iteration for equations (10) and (11); that is, use of
estimates for $\mu_i$ and $\Sigma_i$ on the right side to produce
improved estimates on the left side.

Estimates for the a priori class probabilities $\alpha_i$ are computed
using an iteration scheme which has proved to converge more rapidly than
simple functional iteration using equation (9). The scheme used is
specified by the following equation, which is derived in the appendix.

$\alpha_i =$

(12)

where

$$p_i = p_i(x_k \mid \mu_i, \Sigma_i)$$

$$q_i = \sum_{r \neq i} \left( \frac{\alpha_r}{1 - \alpha_i} \right) p_r(x_k \mid \mu_r, \Sigma_r)$$

$N$ = the total number of observations

This equation is used by substituting old values of $\alpha_i$,
$\mu_i$, and $\Sigma_i$, $i = 1, \ldots, m$, on the right to obtain an
updated estimate for $\alpha_i$ on the left. The summations are taken
over all values of $x_k$ such that $p_i > q_i$ or $p_i < q_i$.

Initially, each new data point $x_j$ is used to update the parameter
values using equations (8) through (12). This procedure allows rapid
evolvement of the parameters as new data points are processed. A danger
lies in the fact that the data are considered sequentially. If
significant correlation is present in the data, updating the parameters
with each new data point could theoretically cause the maximum
likelihood equations to converge very slowly or to undergo cyclic
drifts. This problem has been found to be particularly severe in Landsat
data, which exhibit high correlation within fields. To reduce the
effects of this correlation, the data are initially scrambled in a
random fashion. Using scrambled data and updating the parameter values
with each new data point, the authors have observed that the number of
samples ($N$) required for initial convergence is on the order of a few
hundred, even for large data sets. Following initial convergence, the
parameters are updated only after a complete pass has been made through
the data. This second type of iteration allows a fine tuning of the
parameter values and is not subject to problems related to data
correlation. The conditions under which the second mode of parameter
iteration is entered are discussed later in this section.

The same iteration scheme used to update the parameters is also used to
accumulate third- and fourth-order central moments. That is, current
values of the parameters are used with each new data point to form the
new terms to be accumulated for estimating the moments. The fundamental
equations for the estimates of the third- and fourth-order moments are
generalizations of equations (10) and (11) and are given as

$$S^{(i)}_{klq} = \frac{1}{W_i} \sum_{j=1}^{N} X_{jk} X_{jl} X_{jq} \, p(i \mid x_j, \omega_m) \qquad (13)$$

and

$$K^{(i)}_{klpq} = \frac{1}{W_i} \sum_{j=1}^{N} X_{jk} X_{jl} X_{jp} X_{jq} \, p(i \mid x_j, \omega_m) \qquad (14)$$

where $X_{jk} = (x_{jk} - \mu_{ik})$

$x_{jk}$ = the $k$th component of the $j$th sample vector

$\mu_{ik}$ = the current estimate for the $k$th component of the mean
vector of cluster $i$

and where

$$W_i = \sum_{j=1}^{N} p(i \mid x_j, \omega_m) \qquad (15)$$

The parameter $W_i$ is defined as the weight for cluster $i$ and may be
considered as the number of points assigned to a cluster on a fractional
probabilistic basis; $S^{(i)}$ is a three-dimensional “skewness” tensor,
and $K^{(i)}$ is a four-dimensional “kurtosis” tensor. To reduce the
number of parameters to be estimated and stored, traces of these tensors
are formed using the inverse of the estimated sample covariance matrix
for cluster $i$ ($\Sigma_i$) to obtain

$$S^{(i)}_k = \frac{1}{W_i} \sum_{j=1}^{N} X_{jk} \left( X_j^T \Sigma_i^{-1} X_j \right) p(i \mid x_j, \omega_m) \qquad (16)$$

where $k = 1, 2, \ldots, d$, and

$$K^{(i)}_{kl} = \frac{1}{W_i} \sum_{j=1}^{N} X_{jk} X_{jl} \left( X_j^T \Sigma_i^{-1} X_j \right) p(i \mid x_j, \omega_m) \qquad (17)$$

where $k, l = 1, 2, \ldots, d$, and

$$X_j = [x_j - \mu_i]$$

During the initial iteration mode, when parameter values are changing
with each data point, the estimates for

$$S^{(i)} = \left( S^{(i)}_1, \ldots, S^{(i)}_d \right)$$

and

$$K^{(i)} = \left( K^{(i)}_{kl} \right)$$

for each cluster $i$ are only approximately correct. The second mode of
iteration produces a more accurate estimate of these statistics. As
shall be seen, the estimates of $S^{(i)}$ and $K^{(i)}$ are used in the
maximization of the likelihood with respect to the discrete parameter
$m$.

The optimization of $L(\{x_j\}, m, \omega_m)$ with respect to the
discrete parameter $m$ takes the form of generating hypotheses
concerning the number of clusters and the subsequent testing of these
hypotheses using a likelihood ratio test. At certain points in the
process of maximum likelihood iteration, it is possible to generate a
hypothesis concerning the fit of a given cluster to the data; namely,
either that the data are better represented by two clusters rather than
one (a split hypothesis) or that the data are better represented by
combining the given cluster with another cluster (a join hypothesis).
Each cluster is checked to determine whether either a split or a join
hypothesis seems reasonable when the weight for that cluster as defined
in equation (15) exceeds a threshold. At this same time, a portion of
the old data, which have been accumulated using less accurate parameter
values, is subtracted from the appropriate sum for each of the
parameters given in equations (8) through (11). The weight threshold is
initially set at 200 and increases each time it is exceeded. This
procedure allows an initial fit to the major clusters in the data and a
subsequent development of more detailed cluster structure.

The generation of a split hypothesis is governed by comparing scalar
measures of multivariate skewness and kurtosis for each cluster to
thresholds derived from the appropriate distribution for these measures
computed under the assumption of a multivariate normal distribution. The
scalar measures of multivariate skewness and kurtosis are contractions
of the skewness vector $S^{(i)}$ and the kurtosis matrix $K^{(i)}$ with
respect to the inverse of the estimated covariance matrix for cluster
$i$, $\Sigma_i^{-1}$. These measures are given by

$$s_i^2 = S^{(i)T} \Sigma_i^{-1} S^{(i)} \qquad (18)$$

$$k_i = \mathrm{Tr}\left( K^{(i)} \Sigma_i^{-1} \right) \qquad (19)$$

$$\left( k_i^{tf} \right)^2 = \mathrm{Tr}\left[ \left( K^{(i)} \Sigma_i^{-1} \right)^2 \right] - \frac{k_i^2}{d} \qquad (20)$$

Here, $k_i$ is the trace of the normalized kurtosis matrix for cluster
$i$ and $(k_i^{tf})^2$ is the trace-free component of the square of
$k_i$.
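The accumulation of the cluster weight, the skewness vector, and the kurtosis matrix, followed by the three scalar measures of equations (18) to (20), can be sketched as below. This is an illustration under stated assumptions: two-dimensional data so that a hand-written 2x2 inverse suffices, batch accumulation rather than the sequential updating described in the text, the trace-free form $\mathrm{Tr}[(K\Sigma^{-1})^2] - k^2/d$ for equation (20), and hypothetical names throughout. The thresholds are not given in the text and are not modeled.

```python
def inv2(a):
    """Inverse of a 2x2 matrix (enough for this two-band illustration)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def split_statistics(points, post, mean, cov):
    """Return (s2, k, ktf2) for one cluster from posterior-weighted moments."""
    d = len(mean)
    cinv = inv2(cov)
    w = sum(post)                                   # cluster weight W_i
    s = [0.0] * d                                   # skewness vector
    kmat = [[0.0] * d for _ in range(d)]            # kurtosis matrix
    for x, p in zip(points, post):
        dev = [x[a] - mean[a] for a in range(d)]
        # Quadratic form X^T Sigma^-1 X for this point.
        r2 = sum(dev[a] * cinv[a][b] * dev[b] for a in range(d) for b in range(d))
        for a in range(d):
            s[a] += dev[a] * r2 * p / w
            for b in range(d):
                kmat[a][b] += dev[a] * dev[b] * r2 * p / w
    s2 = sum(s[a] * cinv[a][b] * s[b] for a in range(d) for b in range(d))
    kn = matmul(kmat, cinv)                         # normalized kurtosis matrix
    k = sum(kn[a][a] for a in range(d))
    kn2 = matmul(kn, kn)
    ktf2 = sum(kn2[a][a] for a in range(d)) - k * k / d
    return s2, k, ktf2

# Tiny symmetric data set with unit posteriors, chosen so the result is easy to check.
points = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
post = [1.0, 1.0, 1.0, 1.0]
s2, k, ktf2 = split_statistics(points, post, [0.0, 0.0], [[0.5, 0.0], [0.0, 0.5]])
print(s2, k, ktf2)
```

For this symmetric configuration the skewness measure vanishes, which is the behavior expected of a cluster that shows no asymmetry.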

If any one of these three statistics given by equations (18) to (20)
exceeds its threshold value, the hypothesis is formed that the $i$th
cluster may be split into two parts. The parameters for each of the two
new component clusters are estimated by minimizing the squared
differences between the observed covariance matrix, the skewness vector,
and the kurtosis matrix and the corresponding quantities for the mixture
distribution composed of the two new normal distributions. The
proportion and mean for the mixture composed of the subclusters are
defined to be exactly equal to the corresponding quantities for the
parent cluster. That is, if $\alpha_i$ and $\mu_i$ are the current
estimates of proportion and mean for cluster $i$ and $\alpha_{i1}$,
$\alpha_{i2}$, $\mu_{i1}$, and $\mu_{i2}$ are the corresponding initial
values of the subcluster parameters, it is required that

$$\alpha_{i1} + \alpha_{i2} = \alpha_i \qquad (21)$$

$$\alpha_{i1} \mu_{i1} + \alpha_{i2} \mu_{i2} = \alpha_i \mu_i \qquad (22)$$

Thus, the difference in subcluster proportions and the difference in the
subcluster mean vectors are left as free parameters. The other free
parameters are the independent elements of the two subcluster covariance
matrices. Therefore, a total of

$$1 + d + 2 \left[ \frac{d(d+1)}{2} \right] = (d + 1)^2$$

parameters must be determined.

There are $d(d+1)/2$ equations for the covariance matrix and $d(d+1)/2$
equations for the kurtosis matrix, each of which matches a parameter for
the parent cluster to the corresponding parameter for the subcluster
mixture. In addition, there are $d$ equations matching the skewness
vector parameters for the parent cluster and the subcluster mixture.
This is a total of $d^2 + 2d$ equations. Thus, there is one more free
parameter or unknown than there are equations and a unique solution is
not possible.

The approach taken to obtaining a solution is to minimize by means of a
steepest descent algorithm a quadratic form that may be expressed as


(23)


where $\Sigma_i$, $K^{(i)}$, and $S^{(i)}$ are the current estimates of
the covariance matrix, the kurtosis matrix, and the skewness vector,
respectively, for cluster $i$; $\Sigma_p$, $K_p$, and $S_p$ are the
corresponding “pooled” estimates from the mixture of the subclusters
under the restrictions of equations (21) and (22); and $a_1$, $a_2$, and
$a_3$ are arbitrary constants. The norms are the appropriate matrix and
vector norms, defined for each of the symmetric matrices $M_i$ in
equation (23) and for the vector $V_i = S^{(i)} - S_p$.

Minimization of equation (23) under the restrictions of equations (21)
and (22) produces estimates for the proportions, mean vectors, and
covariance matrices which define two new multivariate normal clusters.
In the generation of a split hypothesis, the statistics defining the
multivariate normal parent cluster are not discarded. When the maximum
likelihood iteration cycle is begun again, it is performed for the
previously existing clusters, including the parent cluster, and for the
two new clusters, which may be thought of as subclusters of the parent
cluster. Thus, as split and join hypotheses are generated, a
hierarchical cluster structure or cluster tree evolves. Final decisions
concerning the choice of a parent cluster or its subclusters to
represent the data are made on the basis of likelihood ratio tests as
will be described later.
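The counting argument in the split discussion (unknowns versus matching equations) can be checked with a few lines of arithmetic. The function names are illustrative only.

```python
def split_unknowns(d):
    # 1 proportion difference + d mean-difference components
    # + two symmetric covariance matrices with d(d+1)/2 elements each.
    return 1 + d + 2 * (d * (d + 1) // 2)

def split_equations(d):
    # d(d+1)/2 covariance equations + d(d+1)/2 kurtosis equations
    # + d skewness equations.
    return 2 * (d * (d + 1) // 2) + d

for d in (1, 2, 4):
    print(d, split_unknowns(d), split_equations(d))
```

For four-band data this gives 25 unknowns against 24 equations, which is why the split parameters are found by minimizing a quadratic form rather than by solving the matching equations exactly.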

The generation of a join hypothesis is the inverse of the split
hypothesis generation procedure. That is, if the generation of a join
hypothesis for two already existing clusters is deemed reasonable, then
statistics for a new parent cluster are calculated from the multivariate
normal mixture distribution defined by the two clusters to be joined.
The new parent cluster is inserted at the level of the clusters to be
joined and the clusters to be joined are moved to the next lower level
in the tree as subclusters of the new parent.
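One way to compute the parent statistics for a join is to match the proportion, mean, and covariance of the two-cluster mixture. The sketch below assumes that reading; the text does not spell out the calculation, and the function name and the numbers are hypothetical.

```python
def join_parent(alpha1, mean1, cov1, alpha2, mean2, cov2):
    """Proportion, mean, and covariance of a two-component mixture."""
    alpha = alpha1 + alpha2
    w1, w2 = alpha1 / alpha, alpha2 / alpha
    d = len(mean1)
    mean = [w1 * mean1[a] + w2 * mean2[a] for a in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for w, mu, c in ((w1, mean1, cov1), (w2, mean2, cov2)):
        for a in range(d):
            for b in range(d):
                # Within-cluster covariance plus the spread of the
                # cluster mean about the parent mean.
                cov[a][b] += w * (c[a][b] + (mu[a] - mean[a]) * (mu[b] - mean[b]))
    return alpha, mean, cov

identity = [[1.0, 0.0], [0.0, 1.0]]
alpha, mean, cov = join_parent(0.2, [0.0, 0.0], identity, 0.2, [2.0, 0.0], identity)
print(alpha, mean, cov)
```

The parent is wider than either subcluster along the axis that separates their means, as expected.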

It should be noted that only clusters which have a common parent are
eligible to be joined. The test for determining when a join hypothesis
should be generated is designed to measure the degree of overlap between
clusters having a common parent cluster. (All the clusters at the top
level of the tree are assumed to have a common parent.) The overlap is
checked by comparing the mean vectors and the diagonal elements of the
covariance matrices for two clusters. A heuristic criterion is used to
perform this check. This criterion is given by equation (24).


where $W_i$ is the current weight for cluster $i$ and $A$ and $B$ are
arbitrary constants (currently, $A = 0.3$ and $B = 0.18$).

The first term in the numerator is a weighted distance between the mean
vectors of clusters $i$ and $j$. The weighting is accomplished by an
average inverse covariance matrix for clusters $i$ and $j$. The second
term in the numerator is a measure of the difference in the diagonal
elements of the two covariance matrices. The diagonal elements rather
than the full covariance matrices are used for computational simplicity.
A more complete expression involving all covariance terms would be
$\ln[\det(\Sigma_i \Sigma_j^{-1})]$. The denominator is designed to
discriminate against small clusters in the sense that $R_{ij}$ will be
artificially reduced if the weight of one cluster is small relative to
the weight of the other cluster. This factor is designed to give large
clusters an opportunity to absorb small clusters if such a join does not
substantially affect the statistics of the larger cluster.

The $R_{ij}$ criterion is computed for each cluster $j$ having the same
parent as cluster $i$. If the minimum value of $R_{ij}$ over these
clusters is less than an empirically set fixed threshold, then a join
hypothesis for cluster $i$ and the minimizing cluster $j$ is generated.

Final decisions concerning the acceptance or rejection of split and join
hypotheses are made in terms of likelihood ratio tests. If there are
$m_i$ subclusters for a given parent cluster $i$, then the logarithm of
the likelihood ratio of the subclusters to the parent is accumulated at
the same time that maximum likelihood iteration is taking place. The
form of this likelihood ratio is given by equation (25).


where $\lambda_i$ is the likelihood ratio for cluster $i$; $\alpha_i$,
$\mu_i$, and $\Sigma_i$ are the current estimates of the parameters for
cluster $i$; and $\alpha_k$, $\mu_k$, and $\Sigma_k$ are the
corresponding subcluster parameters. This log likelihood ratio is tested
against a threshold computed assuming that $2 \ln \lambda_i$ is
approximately distributed as a $\chi^2$ random variable with degrees of
freedom equal to $d + 1$. A one-tailed test is used, and the probability
of a type I error is set at 0.01. If $2 \ln \lambda_i$ exceeds the
threshold set by the test, then the statistics for the parent cluster
are eliminated and the subclusters take the place of the parent cluster.
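The acceptance test can be sketched as a comparison of $2 \ln \lambda_i$ with a chi-square quantile. The sketch assumes the degrees of freedom are $d + 1$ as read from the text, and it uses the Wilson-Hilferty approximation to the quantile so that only the standard library is needed; that approximation is a substitution made here, not something the paper describes. Function names are hypothetical.

```python
from statistics import NormalDist

def chi2_quantile(prob, df):
    """Wilson-Hilferty approximation to the chi-square quantile (not exact)."""
    z = NormalDist().inv_cdf(prob)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3

def accept_split(log_ratio, df, type1_error=0.01):
    """True when 2 ln(lambda_i) exceeds the one-tailed chi-square threshold."""
    return 2.0 * log_ratio > chi2_quantile(1.0 - type1_error, df)

d = 4  # four-dimensional observations, as in the simulated data
threshold = chi2_quantile(0.99, d + 1)
print(threshold, accept_split(10.0, d + 1), accept_split(5.0, d + 1))
```

The approximate threshold for five degrees of freedom is close to the tabulated 0.99 quantile of about 15.09.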

It is also possible that $\ln \lambda_i$ may become negative, even
though in theory this should not occur. In practice, negative values may
occur because of poor initial estimation of the subcluster parameters or
lack of convergence in these estimates. To avoid the expense of
maintaining poor subclusters, the subclusters are eliminated in favor of
the parent cluster when $\ln \lambda_i$ falls below a fixed negative
threshold. This threshold is set to a large negative value to allow the
subcluster statistics to converge if they are going to converge.

One other possibility in testing the likelihood ratio is that the
subcluster statistics may actually converge so that the mixture
distribution defined by the subcluster parameters reproduces or very
nearly reproduces the parent cluster distribution. In such cases,
$\ln \lambda_i$ will remain at a low value possibly slightly greater
than or less than zero. If this occurs, it may be assumed that the
parent cluster is the most economical description of the data and the
subclusters may be eliminated. To test for this situation, another
statistic based on the accumulated point probabilities under the parent
and subcluster hypotheses is examined. Defining the subcluster mixture
density

$$\sum_{k=1}^{m_i} \alpha_{k'} \, p_{k'}(x_j \mid \mu_{k'}, \Sigma_{k'})$$

where $\alpha_{k'}$, $\mu_{k'}$, and $\Sigma_{k'}$ are the current
estimates of the parameters for the subclusters of cluster $i$, the
statistic computed is

$E_i =$

(26)

Equation (26) gives a crude measure of how much a parent cluster differs
from the mixture of its subclusters. If $E_i$ becomes smaller than a
fixed empirically determined threshold and the log likelihood ratio is
less than a fixed small positive value, then the subclusters are
eliminated in favor of the parent cluster.

The one remaining test in the portion of the program that performs
maximization with respect to the number of classes is a simple test on
the proportion $\alpha_i$ of each cluster or subcluster. If this
proportion falls below a threshold value, currently set to 0.01, then
the cluster is eliminated. This test is used primarily in the interest
of efficiency since very small clusters do not significantly affect the
overall mixture distribution.

All the tests for the generation of hypothesized new clusters and for
the elimination of clusters or subclusters occur at certain intervals
during the process of maximum likelihood iteration and statistics
accumulation; namely, when the weight for a given cluster has increased
by a fixed amount or when a complete pass has been made through the data
since the last tests were performed. After the tests have been made and
any resultant restructuring of the cluster tree has taken place, $E_i$
(given by eq. (26)), $K^{(i)}$, $S^{(i)}$, and $\lambda_i$ are reset.
Thus, these statistics depend only on the data processed since the last
testing of the cluster statistics for cluster $i$.

The present program cycles through the data a fixed number of times.
(The number of passes through the data is controlled by an external
parameter.) When the desired number of passes is complete, the program
clusters the data by examining it point by point and assigning each data
point to the cluster in the cluster tree for which the probability of
occurrence of this data point is the greatest. This is the only time in
the program that points are assigned to clusters. When all the points
have been assigned, a cluster map showing the cluster symbol for each
point is printed out. The program also prints out the final values for
the parameters for each cluster in the cluster tree.

Figure 1 is a general flow diagram for the CLASSY program. This is not a
detailed flow diagram for the program but merely serves to summarize the
information given in this section in a convenient manner.

The initial values assumed at the beginning of the program are as
follows.

DATA, PROCEDURES, AND RESULTS

To evaluate the CLASSY clustering algorithm, it was applied to both real
and simulated Landsat data. Performance measures were defined and
calculated for each trial of the algorithm. The measures were compared
with those derived from applying the ISOCLS algorithm to the same data.

Data Sets

Two different data sets were used in the comparative evaluation of
CLASSY and ISOCLS. The first was a set of Landsat acquisitions of four
different LACIE segments. Each LACIE segment is 196 picture elements
(pixels) per line by 117 lines and corresponds to a 5- by
6-nautical-mile area on the ground. The second data set was a group of
four different simulated acquisitions of a simulated LACIE segment. Each
of these data sets is described separately in the following paragraphs.

The four LACIE segments were selected on the basis of the availability
of ground truth at regularly spaced pixels in the image and the
provision of a representative sampling of LACIE segments in terms of
field structure and the proportion of wheat present. Once the segments
had been chosen, the acquisition that had the greatest separability, as
measured by the Bhattacharyya distance, was selected. The Bhattacharyya
distance was computed between wheat and nonwheat classes where the class
statistics were obtained from ground-truth fields. The segment number
and location, the acquisition date with the largest separability, and
the ground-truth percentages of wheat and small grains for each segment
are given in table I.


Table I. — Description of LACIE Sample Segments

Segment   Location       Acquisition      Ground truth,    Ground truth,
                         date             percent wheat    percent small grains

1181      Kansas         Mar. 10, 1976    23.4             29.0
1988      Kansas         Nov. 8, 1975     33.0             33.0
1961      Kansas         July 18, 1976     8.2              8.2
1965      North Dakota   Aug. 8, 1976     41.6             47.0

The simulated data set consisted of four simulated Landsat acquisitions,
each 196 pixels by 117 lines. This data set was generated by IBM for the
Mission Planning and Analysis Division at the Johnson Space Center (ref.
11). Each “acquisition” was obtained first by specifying the mean vector
and covariance matrix for 10 different classes. The class statistics for
each class were specified so as to simulate the LACIE data for two wheat
classes ($W_1$ and $W_2$), two barley classes ($B_1$ and $B_2$), two
classes of grass ($G_1$ and $G_2$), two stubble classes ($S_1$ and
$S_2$), and two classes of fallow ($F_1$ and $F_2$). The statistics for
these classes were actually obtained from Landsat data representing an
agricultural area in Hill County, Montana. Once the statistics for a
given class were specified, independent samples were generated from a
four-dimensional multivariate normal distribution having those
statistics. These samples were then placed in rectangular fields
arranged over the simulated segment. This process was repeated for each
class and for each of the four acquisitions. The arrangement of the
simulated fields over the segment was the same for each acquisition. The
pattern of the simulated fields is given in figure 2.
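The generation step described above (independent multivariate normal samples placed in rectangular fields) can be sketched as follows. The class statistics, field position, and two-band dimensionality are hypothetical values chosen for illustration; the actual statistics came from Hill County Landsat data and are not reproduced in the text.

```python
import math
import random

def cholesky(a):
    """Lower-triangular factor L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def sample_normal(mean, low, rng):
    """One draw from N(mean, L L^T) using independent standard normal deviates."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [mean[i] + sum(low[i][k] * z[k] for k in range(i + 1))
            for i in range(len(mean))]

def fill_field(image, top, left, height, width, mean, cov, rng):
    """Place independent samples of one class in a rectangular field."""
    low = cholesky(cov)
    for r in range(top, top + height):
        for c in range(left, left + width):
            image[r][c] = sample_normal(mean, low, rng)

rng = random.Random(0)
image = [[None] * 196 for _ in range(117)]   # 117 lines of 196 pixels
# Hypothetical two-band class statistics, for illustration only.
fill_field(image, 0, 0, 10, 20, [30.0, 25.0], [[4.0, 2.0], [2.0, 3.0]], rng)
factor = cholesky([[4.0, 2.0], [2.0, 3.0]])
```

Because the samples within a field are independent, simulated data of this kind lack the within-field correlation that the text reports for real Landsat data.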
FIGURE 2. — Distribution of classes in simulated segment.


Evaluation  Method  and  Procedures 

CLASSY  was  evaluated  using  a comparative 
analysis  method  in  which  the  clustering  results  of 
CLASSY  were  compared  with  those  of  ISOCLS 
using  the  ground  truth  as  a reference.  The  evaluation 
procedure  consisted  of  two  steps. 

1. The CLASSY and ISOCLS algorithms were applied to each segment in each
data set. CLASSY was run for three complete iterations through all the
data in each segment. ISOCLS was run in the nearest neighbor mode with
40 ground-truth pixels as starting vectors. In this mode, ISOCLS merely
assigns pixels to the nearest starting vector measured in terms of L1
distance rather than operating iteratively. This mode was chosen for
ISOCLS because this was the manner in which the algorithm was currently
being used in the LACIE project.

2. The clusters in the line printer map produced by each algorithm were
analyzed by first recording the cluster symbol and the corresponding
ground-truth label (either wheat or nonwheat) for each pixel where
ground truth was available. These results were tabulated, so that the
number of ground-truth wheat pixels and ground-truth nonwheat pixels
falling in each cluster was known. The clusters were then labeled wheat
or nonwheat by majority rule.
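The tabulation and majority-rule labeling of step 2 can be sketched as below. The cluster symbols and labels are made-up inputs, and the tie-breaking choice is an assumption, since the text does not say how ties were handled.

```python
from collections import Counter, defaultdict

def label_clusters(cluster_symbols, truth_labels):
    """Tabulate ground-truth labels per cluster, then label each cluster by majority."""
    counts = defaultdict(Counter)
    for symbol, truth in zip(cluster_symbols, truth_labels):
        counts[symbol][truth] += 1
    labels = {}
    for symbol, tally in counts.items():
        # Ties go to "nonwheat" here; this is an assumption of the sketch.
        labels[symbol] = "wheat" if tally["wheat"] > tally["nonwheat"] else "nonwheat"
    return dict(counts), labels

# Illustrative ground-truth pixels: the cluster symbol and true label of each.
symbols = ["A", "A", "A", "B", "B", "C"]
truth = ["wheat", "wheat", "nonwheat", "nonwheat", "nonwheat", "wheat"]
counts, labels = label_clusters(symbols, truth)
print(labels)
```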

A measure of the accuracy of each clustering algorithm in separating
wheat from nonwheat (or a measure of the overall purity of the wheat and
nonwheat clusters) was computed by estimating the probability of correct
classification (PCC) for the labeled clusters. This probability is given
by

$$PCC = \sum_{i=1}^{m_1} P(O_i \mid O) \, P(O) + \sum_{i=1}^{m_2} P(W_i \mid W) \, P(W) \qquad (28)$$

where $m_1$ is the number of clusters labeled “other”; $m_2$ is the
number of clusters labeled wheat; $P(O_i \mid O)$ is the probability
that a pixel falls in the $i$th “other” cluster, given that it is other
than wheat; $P(W_i \mid W)$ is the probability that a pixel falls in the
$i$th wheat cluster, given that it is wheat; $P(W)$ is the a priori
probability that a pixel is wheat; and $P(O)$ is the a priori
probability that a pixel is other than wheat. Empirical proportions were
used to estimate these probabilities and a priori values, resulting in
the following estimate:

$$\widehat{PCC} = \frac{1}{N_T} \left[ \sum_{i=1}^{m_1} N_{O_i \mid O} + \sum_{i=1}^{m_2} N_{W_i \mid W} \right] \qquad (29)$$

where $N_T$ is the total number of ground-truth pixels, $N_{O_i \mid O}$
is the number of ground-truth “other” pixels falling in the $i$th
“other” cluster, and $N_{W_i \mid W}$ is the number of ground-truth
wheat pixels falling in the $i$th wheat cluster. It is noteworthy that,
to obtain an accurate estimate of PCC using equation (29), it is
necessary that several ground-truth pixels fall in each cluster.
Specifically, if there are clusters which have only one or two
ground-truth grid-intersection pixels, the estimate of PCC will be
biased on the high side.
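Equation (29) amounts to counting the ground-truth pixels whose own label agrees with the label of the cluster they fall in. A minimal sketch, with a hypothetical function name and made-up counts:

```python
def estimate_pcc(counts, labels):
    """Fraction of ground-truth pixels whose label matches their cluster's label."""
    total = sum(sum(tally.values()) for tally in counts.values())
    correct = sum(tally.get(labels[symbol], 0) for symbol, tally in counts.items())
    return correct / total

# Illustrative tabulation: ground-truth pixel counts per cluster.
counts = {"A": {"wheat": 2, "nonwheat": 1},
          "B": {"wheat": 0, "nonwheat": 2},
          "C": {"wheat": 1, "nonwheat": 0}}
labels = {"A": "wheat", "B": "nonwheat", "C": "wheat"}
pcc = estimate_pcc(counts, labels)
print(pcc)
```

Cluster C holds a single ground-truth pixel, so it is labeled by that pixel and always counts as correct, which is the source of the high bias noted in the text.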

As a part of the analysis, the proportion of wheat was also estimated
for the labeled clusters and compared to the ground-truth value. The
equation used for this estimate is

$$\hat{P}(W) = \frac{1}{N_T} \sum_{i=1}^{m_2} N_{W_i} \qquad (30)$$

where $N_{W_i}$ is the total number of ground-truth pixels (wheat and
other) falling in the $i$th wheat cluster.

Estimates computed using equations (29) and (30) were obtained for each
algorithm as applied to both the real and simulated data sets.
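Equation (30) counts every ground-truth pixel that lands in a wheat-labeled cluster, whatever its own label. A short sketch with hypothetical names and made-up counts:

```python
def estimate_wheat_proportion(counts, labels):
    """Share of all ground-truth pixels that fall in wheat-labeled clusters."""
    total = sum(sum(tally.values()) for tally in counts.values())
    in_wheat = sum(sum(tally.values()) for symbol, tally in counts.items()
                   if labels[symbol] == "wheat")
    return in_wheat / total

# Illustrative tabulation: ground-truth pixel counts per cluster.
counts = {"A": {"wheat": 2, "nonwheat": 1},
          "B": {"wheat": 0, "nonwheat": 2},
          "C": {"wheat": 1, "nonwheat": 0}}
labels = {"A": "wheat", "B": "nonwheat", "C": "wheat"}
proportion = estimate_wheat_proportion(counts, labels)
print(proportion)
```

In this made-up case the estimate is 4/6 while half of the ground-truth pixels are wheat, because the nonwheat pixel in cluster A is counted with the wheat.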


Results 

The results of these computations are given in tables II through XI.
Tables II, III, V, and VI compare CLASSY and ISOCLS results for the
LACIE segments examined; the corresponding results for simulated segment
data are given in tables VII through XI.

Table II compares the number of clusters and the PCC estimates for
ISOCLS ($\widehat{PCC}_I$) and for CLASSY ($\widehat{PCC}_C$) as a
result of clustering each of the four LACIE segments examined using both
methods. The PCC estimates for CLASSY are, on the average, about 4
percentage points lower than those for ISOCLS. However, since the
version of ISOCLS used generates a factor of 4 to 6 times as many
clusters as CLASSY, many of the ISOCLS clusters contain only one or two
ground-truth grid-intersection points. As discussed in the preceding
section, this means that the PCC estimates for ISOCLS will be biased
high relative to CLASSY. In addition, each ISOCLS cluster typically
contains one ground-truth point used as a starting vector for that
cluster. Since the label of these starting vectors almost always agrees
with the cluster label, this amounts to a further high bias in the PCC
estimates for ISOCLS. In the light of this bias in favor of ISOCLS and
the economy represented by the greatly reduced number of CLASSY
clusters, CLASSY compares very favorably to ISOCLS.

Table II. — Comparison of the Number of Clusters and the Estimated
Probability of Correct Classification Using Single-Pass Segment Data

Segment    ISOCLS                   CLASSY                   Difference
           No. of      PCC_I        No. of      PCC_C        (PCC_C - PCC_I)
           clusters                 clusters

1181       40          0.8410       7           0.8052       -0.0358
1988       40           .8070       8            .7661        -.0409
1961       40           .9230       11           .9028        -.0208
1965       40           .7419       9            .6774        -.0645
Average    40           .8284       8.75         .7875        -.0405

The LACIE segments used in this study contained
varying amounts of wheat. The ground-truth
percentages of wheat P(W) and small grains P(SG)
are given in table III. The estimate of the proportion
of wheat computed using the ground-truth
grid-intersection dots P_D(W) is also included. An estimate of
the proportion of wheat in the whole scene determined
from the clusters labeled wheat can be obtained
using equation (30). The wheat proportion
estimates resulting from applying this equation to
the CLASSY results P_C(W) and ISOCLS results
P_I(W) are also given in table III. Comparing these
percentages to the ground-truth wheat proportions
shows that, with the exception of segment 1965, the
wheat proportion estimates are about 4 to 6 percent
higher than the ground-truth wheat proportion
values. These slightly high estimates may be due to
the fact that, even though only wheat ground-truth
dots were used to label clusters, labeled wheat
clusters may reasonably be assumed to include some
small grains. The last column in table III shows that
the ISOCLS estimate was closer to the ground-truth
wheat proportion for two segments and the CLASSY
estimate was closer for the other two segments.
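The comparison in the last column of table III is simple arithmetic and can be checked directly. The sketch below uses the single-pass values transcribed from table III; the variable and function names are illustrative only. A positive value of |D_I| - |D_C| means the CLASSY estimate is closer to the ground truth.

```python
# (segment, ground-truth P(W), ISOCLS P_I(W), CLASSY P_C(W)) from table III.
rows = [
    ("1181", 0.234, 0.287, 0.303),
    ("1988", 0.330, 0.397, 0.287),
    ("1961", 0.082, 0.042, 0.069),
    ("1965", 0.416, 0.526, 0.645),
]


def closeness(p_true, p_isocls, p_classy):
    """Return (D_I, D_C, |D_I| - |D_C|), rounded to three decimals."""
    d_i = p_isocls - p_true
    d_c = p_classy - p_true
    return round(d_i, 3), round(d_c, 3), round(abs(d_i) - abs(d_c), 3)


results = {seg: closeness(pw, pi, pc) for seg, pw, pi, pc in rows}

# Segments where the CLASSY estimate is the closer of the two.
classy_closer = [seg for seg, (_, _, diff) in results.items() if diff > 0]
```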

The imagery for segment 1965 was examined in
detail because the wheat proportion estimates for
both CLASSY and ISOCLS deviated considerably
from the ground truth and the PCC estimates for
both algorithms were correspondingly low for this
segment. This segment contained numerous small
strip fields. Typically, small-field regions accentuate
misregistration problems, and such appears to be the
case for this segment. The misregistration of the
ground-truth reference acquisition relative to the
acquisition clustered reduced PCC values and distorted
the proportion of wheat estimates for both
algorithms.

To obtain an idea about the relative performance
of CLASSY and ISOCLS when applied to multitemporal
data, four-channel "green" images were formed
for each segment by applying the Kauth (ref. 12)
transformation to each of four acquisitions for a
given segment and then selecting the green number
from each acquisition. (It was necessary to reduce
the 16-dimensional data to 4 dimensions since
CLASSY is limited to 4 dimensions at the present
time.) Table IV lists the four acquisitions used for
each segment. The results of comparing the PCC
values and the wheat proportion estimates for the
two algorithms are given in tables V and VI,
respectively. Comparing table V and table II shows that the
PCC values for both algorithms remained about the
same for segments 1181 and 1961 and that they
increased significantly for segments 1988 and 1965.
The average difference between the CLASSY and
ISOCLS PCC values remained about 4 percent.
However, the CLASSY PCC equaled the ISOCLS
PCC for segment 1988, and the difference was very
small for segment 1961. The last column of table VI


Table III. Comparison of Wheat Proportion Estimates for Labeled Clusters
Using Single-Pass Segment Data

Segment    Ground truth      Ground-truth   ISOCLS    CLASSY    D_I =            D_C =            |D_I| - |D_C|
           P(W)     P(SG)    dots, P_D(W)   P_I(W)    P_C(W)    P_I(W) - P(W)    P_C(W) - P(W)
1181       0.234    0.290    0.333          0.287     0.303      0.053            0.069           -0.016
1988       0.330    0.330    0.322          0.397     0.287      0.067           -0.043            0.024
1961       0.082    0.082    0.097          0.042     0.069     -0.040           -0.013            0.027
1965       0.416    0.470    0.516          0.526     0.645      0.110            0.229           -0.119
Average    0.266    0.293    0.317          0.313     0.326      0.047            0.061           -0.021

Table IV. Acquisitions Used in Creating
Four-Channel Green Images

Segment    Acquisitions
1181       Mar. 10, 1976; Apr. 16, 1976; May 3, 1976; July 14, 1976
1988       Oct. 20, 1975; May 6, 1976; June 12, 1976; Sept. 28, 1976
1961       Aug. 15, 1975; June 12, 1976; Aug. 23, 1976; Sept. 10, 1976
1965       May 11, 1976; July 21, 1976; Aug. 8, 1976; Sept. 14, 1976

shows that, when the four-channel green images
were used, the wheat proportion estimates from the
CLASSY clusters were closer to the ground-truth
values than were the ISOCLS estimates in every case.

Tables  VII  and  VIII  are  analogous  to  tables  II  and 
III,  except  that  they  give  the  results  for  the  single- 
pass  simulated  data.  The  column  labeled  maximum 
likelihood  PCC  (PCCM)  gives  the  overall  PCC  when 
using  standard  maximum  likelihood  classification 
where  the  statistics  for  each  class  were  computed 
from  fields  in  the  simulated  image  given  the  class 
label  for  each  field.  Note  that  the  PCC  estimates  for 
CLASSY  were  higher  than  those  for  ISOCLS  in  two 
of the four passes. In fact, on pass 2, where the separability
was greatest, the PCC for CLASSY equaled


Table V. Comparison of the Number of Clusters and
the Estimated Probability of Correct Classification
Using the Four-Channel Green Image Data

Segment    ISOCLS               CLASSY               PCC_C - PCC_I
           No. of     PCC_I     No. of     PCC_C
           clusters             clusters
1181       40         0.8667    4          0.8000    -0.0667
1988       40         0.9357    16         0.9357     0
1961       40         0.9167    23         0.9097    -0.0070
1965       40         0.8065    13         0.7290    -0.0775
Average    40         0.8814    14         0.8436    -0.0378

the  maximum  likelihood  PCC.  On  the  average,  the 
PCC  for  CLASSY  was  1.4  percent  higher  than  that 
for  ISOCLS. 

The proportion estimate computed from the
labeled clusters is given in table VIII. Again, the
estimate from CLASSY was closer to the true value in
two of the four passes. However, the average
individual ISOCLS estimate was about 2 percent closer
to the true value.

The  results  for  the  simulated  data  using  band  1 
from  each  of  the  four  passes  are  given  in  table  IX. 
Band  1 was  selected  arbitrarily  to  assess  the  use  of 
multitemporal  data.  Note  that  the  PCC  estimate  for 
CLASSY was 1.0, meaning that none of the CLASSY
clusters  contained  a mixture  of  wheat  and  nonwheat 
grid-intersection  pixels. 

Using  the  simulated  data  makes  it  possible  to 
identify  a cluster  with  a certain  class  in  the  data  by 
determining  which  class  contributes  the  majority  of 
pixels  to  the  cluster.  After  such  an  identification,  the 
generating  statistics  for  the  class  may  be  compared 
with the cluster statistics produced by CLASSY.
Table X presents the results of such a comparison for


Table VI. Comparison of Wheat Proportion Estimates for Labeled Clusters Using
Four-Channel Green Image Data

Segment    Ground truth      ISOCLS    CLASSY    D_I =            D_C =            |D_I| - |D_C|
           P(W)     P(SG)    P_I(W)    P_C(W)    P_I(W) - P(W)    P_C(W) - P(W)
1181       0.234    0.290    0.292     0.241      0.058            0.007           0.051
1988       0.330    0.330    0.316     0.342     -0.014            0.012           0.002
1961       0.082    0.082    0.066     0.069     -0.016           -0.013           0.003
1965       0.416    0.470    0.625     0.565      0.209            0.149           0.060
Average    0.266    0.293    0.325     0.304      0.059            0.039           0.029
Table VII. Comparison of the Number of Clusters and the Estimated Probability of
Correct Classification Using Single-Pass Simulated Data

Pass       PCC_M    ISOCLS               CLASSY               PCC_M - PCC_I    PCC_M - PCC_C    PCC_C - PCC_I
                    No. of     PCC_I     No. of     PCC_C
                    clusters             clusters
1          0.935    40         0.9139    5          0.9043     0.021            0.030           -0.0096
2          0.986    40         0.9713    5          0.9857     0.015            0.000            0.0144
3          0.970    40         0.9761    8          0.9522    -0.006            0.018           -0.0239
4          0.928    40         0.8852    7          0.9187     0.043            0.009            0.0335
Average    0.955    40         0.9366    6.25       0.9402     0.018            0.014            0.0144

Table VIII. Comparison of the Wheat Proportion Estimates for Labeled Clusters
Using Single-Pass Simulated Data

Pass       P(W)      P_I(W)    P_C(W)    D_I =            D_C =            |D_I| - |D_C|
                                         P_I(W) - P(W)    P_C(W) - P(W)
1          0.3398    0.3301    0.2536    -0.0097          -0.0862          -0.0765
2          0.3398    0.3254    0.3541    -0.0144           0.0143           0.0001
3          0.3398    0.3636    0.2917     0.0238          -0.0481          -0.0243
4          0.3398    0.3254    0.3349    -0.0144          -0.0049           0.0095
Average    0.3398    0.3361    0.3086    -0.0147          -0.0312          -0.0226

the pass 2 simulated data, whereas table XI gives
similar  results  for  the  clustering  using  band  1 from 
each  of  the  four  passes. 

In  the  pass  2 CLASSY  results,  four  of  the  five 
clusters  could  be  clearly  identified  with  one  of  the 
generating  classes  or  distributions.  A comparison  of 
the  mean  vector  and  covariance  matrices  shows  a 
remarkable  correspondence  between  the  CLASSY 
statistics  and  the  generating  statistics.  Cluster  3 was 
about  equally  divided  between  grass  1 and  grass  2. 


Table IX. Probability of Correct Classification
Using Multipass Simulated Data

Data                     ISOCLS               CLASSY               PCC_C - PCC_I
                         No. of     PCC_I     No. of     PCC_C
                         clusters             clusters
Band 1 from each of      40         0.9809    7          1.0000    0.0191
4 passes

Only the statistics for grass 1 are shown in the table.
Similarly, cluster 2 was a mixture of stubble, fallow,
and barley 2. The statistics for each of these classes
are very similar for this pass. The statistics for
stubble 1 are given as a representative example of that
group of classes.

The data from band 1 of each of the four simulated
passes had more separability; thus, CLASSY
was able to distinguish more classes. The comparison
of the generating statistics and the CLASSY statistics
is presented in table XI. Only the variance terms
from the multipass covariance matrix were available.
Again, there is remarkable correspondence between
the CLASSY statistics and the generating statistics.


CONCLUSIONS 

The main conclusion of this paper is that the
performance of the CLASSY clustering algorithm
compares favorably with that of ISOCLS on both the real
and simulated LACIE segment data. In terms of
performance, these results were obtained despite the
Table X. Comparison of Cluster Statistics for Pass 2 Simulated Data


I'liairr  Identification 

Generating  ilatlillct 

CLASSY  siattuta 

number 

Mean 

Covariance  matrix 

Mean 

Covariance  matrix 

vector 

vector 

A 

Wheat  1 

0.91 

1.21 

1.21 

3.24 

24 

27.29 

34 

24 

1.77 

28.14 

EB 

-65 

1.75 

a « 

. 

J 

Wheat  2 

■»  R 

F.i.ss 

"o.82 

O.o9 

1702 

69 

III 

-.48 

26  3) 

El 

-.48 

1.2) 

-.47 

• 

-1.19 

1.41 

1 

Barley  1 

"l  55 

1.74 

122 

25  80 

174 

316 

1.52 

ij.98 

1.22 

1.52 

1.65 

24.19 

96 

1.12 

91 

B ■ 

• 

3 

Gras*  1 

Era 

*1.31 

2.07 

054 

(gras*  2) 

Eo 

91 

123.37 

.54 

91 

5E3 

1 -II 

-.29 

h - 

L 

2 

Stubble  1 

HRjI 

(stubble  2. 

2364 

64 

1.12 

fallow. 

2422 

.77 

1.51 

barley  2) 

2312 

69 

.66 

B m 

B 

ill 

1.21  1.04  013  -019 

-.65 

104  2 87  - 10  -.95 

1.75 

27.45 

13  -.10  184  176 

3.15 

28.26 

-.19  -.95  1.76  3.50 

; 

b a 

m m 

a 

» a 

18.76 

-1.19 

1713 

80  1.54  - 47  -1  20 

141 

26  36 

-03  - 47  1 46  1.50 

3.25 

27.97 

- 50  -1  20  1 50  3 51 

* 

» a 

* ; 

vfj 

22.97 

200  197  1 83  1 41 

1.12 

25  45 

197  3 59  2 36  1.77 

91 

25.27 

183  2.36  2 92  1 84 

1.19 

1 41  1.77  1 84  2 22 

a 

6b  m 

• 

1 15  1 48  0 55  0 22 

-.29 

Mg! 

148  410  101  28 

2318 

55  1 01  140  64 

I2)_ 

22.52_ 

22  28  64  1 24 

a 

r w 

p *» 

Fjrn 

66 

2443 

44  1 17  38  29 

T71 

24  18 

31  38  141  92 

231 

22.77 

B a 

22  29  92  1 68 

fact that CLASSY reduces the number of clusters by
a factor of 4 to 6 as compared to ISOCLS. This
performance indicates that CLASSY is indeed
approximating the empirical mixture density rather than
just breaking up the data space into small
homogeneous areas as does ISOCLS. This conclusion is
further substantiated by noting the high degree of
correspondence between the CLASSY cluster
statistics and the generating statistics of classes in the
simulated data. When data from band 1 of each of
the 4 simulated acquisitions were clustered using
CLASSY, 5 of the 10 classes were very accurately
identified. The remaining classes, whose statistics
were very close together, were broken into two
reasonable groups. It appears, therefore, that the
CLASSY algorithm may well provide a solution to
the fundamental problem of clustering: the
determination of the inherent number of classes in the
data.
Table XI. Comparison of Cluster Statistics for Band 1 for Each of Four Passes of the Simulated Data


Chalet  Identification 

Generating  natinm 

CLASSY  ttatititci 

number 

Mean 

Covariance  matrix 

Mean 

Covariance  matrix 

m m 

r i 

_ 

5 Wheat  1 

26  93 

106 

26  84 

1.27  0.69  1.42  161 

2036 

0.91 

20.27 

69  1.21  1.25  162 

17.39 

2.15 

17.22 

1.42  1.25  2.32  2.65 

17.27 

3.30 

1702 

161  162  2.65  3.49 

* 

k»  sJ 

d 

2 Wheat  2 

25.79 

7.03 

25.90 

*1.22  0.94  0 78  098* 

1155 

0.12 

18  76 

.94  1.23  78  87 

1615 

0.47 

1688 

.78  .78  .85  67 

1112 

1.76 

17.97 

.98  .87  .67  1 80 

4 Barley  1 

2141 

2.16 

»4t 

2.30  1 56  3.03  2.18 

23  30 

4.16 

22  71 

1 56  1.81  2.69  2 17 

2201 

4.15 

2256 

303  2.69  5.33  3 80 

1701 

4,47 

17  44 

2.18  2.17  3.86  3.58 

L J 

m 

L J 

» # 

3 Barley  2 

2123 

133  1 

28  40 

163  -008  1 79  1 05* 

22.71 

0.77 

22.71 

-08  ,79  -.40  -.09 

22.37 

1.18 

22  56 

1 79  - 40  2.54  1 23 

17.34 

161 

17.44 

105  -.09  1.23  1 86 

• J 

> J 

J 

1 Grass  1 

2S.6f 

n si 

25J2 

*2.69  0.87  1.76  2.17* 

(gras*  2. 

20.83 

131 

2120 

.87  1.39  .74  98 

stubble  1 ) 

20.10 

ISO 

2035 

1 76  .74  1.71  1 65 

20  60 

162 

20.72 

2.17  9*  1.65  2 43 

6 Fallow  1 

24.59 

0.67  1 

[Ubsl 

075  0.38  042  0.48*1 

22.41 

0.52 

2245 

.38  .72  .68  .09 

23  22 

090 

23  21 

42  .68  1 06  04 

2156 

0.66 

21.67 

.48  09  04  75 

b» 

r 

i 

a 

m m 

7 Siubole  2 

24  33 

1.17 

24  34 

1.31  038  -001  -014 

(Mlow  2) 

2221 

0.67 

22.25 
Appendix


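The equation used to obtain iterative estimates of
the a priori class probabilities or proportions, $\alpha_i$, is
derived beginning with equation (9), which is
repeated here in a slightly more expanded form.

$$\alpha_i = \frac{1}{N}\sum_{k=1}^{N}\frac{\alpha_i\,p_i(x_k)}{p(x_k)} \qquad \text{(A1)}$$

where

$$p(x_k) = \sum_{j=1}^{m}\alpha_j\,p_j(x_k)$$

Since $\alpha_i$ does not depend on $k$, $\alpha_i$ may be canceled
from both sides of the equation to obtain

$$N = \sum_{k=1}^{N}\frac{p_{ik}}{p_k} \qquad \text{(A2)}$$

where, for convenience, the functional notation has
been simplified.

Now,

$$0 = \sum_{k=1}^{N}\frac{p_{ik}-p_k}{p_k} \qquad \text{(A3)}$$

But

$$p_k = \alpha_i p_{ik} + \sum_{\substack{j=1\\ j\neq i}}^{m}\alpha_j p_{jk} = \alpha_i p_{ik} + (1-\alpha_i)\,q_{ik} \qquad \text{(A4)}$$

Here, define

$$q_{ik} = \frac{1}{1-\alpha_i}\sum_{\substack{j=1\\ j\neq i}}^{m}\alpha_j p_{jk} \qquad \text{(A5)}$$

So,

$$p_{ik} - p_k = (1-\alpha_i)\,(p_{ik}-q_{ik}) \qquad \text{(A6)}$$

and

$$0 = \sum_{k=1}^{N}\frac{p_{ik}-q_{ik}}{p_k} \qquad \text{(A7)}$$

assuming $\alpha_i \neq 1$. Breaking this sum up into those
terms which are positive and those which are
negative results in

$$0 = \sum_{p_{ik}>q_{ik}}\frac{p_{ik}-q_{ik}}{p_k} + \sum_{p_{ik}<q_{ik}}\frac{p_{ik}-q_{ik}}{p_k} \qquad \text{(A8)}$$

Now, $\alpha_i$ is reintroduced as follows:

$$0 = (1-\alpha_i)\left[\alpha_i\sum_{p_{ik}>q_{ik}}\frac{p_{ik}-q_{ik}}{p_k}\right] + \alpha_i\left[(1-\alpha_i)\sum_{p_{ik}<q_{ik}}\frac{p_{ik}-q_{ik}}{p_k}\right] \qquad \text{(A9)}$$

If we now solve for the $\alpha_i$'s which are outside the
square brackets in terms of the $\alpha_i$'s, $p$'s, and $q$'s
inside the square brackets, the following is obtained.

$$\alpha_i = \frac{\alpha_i\displaystyle\sum_{p_{ik}>q_{ik}}\frac{p_{ik}-q_{ik}}{p_k}}{\alpha_i\displaystyle\sum_{p_{ik}>q_{ik}}\frac{p_{ik}-q_{ik}}{p_k} - (1-\alpha_i)\displaystyle\sum_{p_{ik}<q_{ik}}\frac{p_{ik}-q_{ik}}{p_k}} \qquad \text{(A10)}$$

This is the iterative equation used to obtain
proportion estimates in CLASSY.

Equation (A10) may also be put into a form
illustrating the nature of the update term to obtain

$$\alpha_i = \alpha_i + \frac{\alpha_i(1-\alpha_i)\displaystyle\sum_{k=1}^{N}\frac{p_{ik}-q_{ik}}{p_k}}{N - \displaystyle\sum_{p_{ik}>q_{ik}}\frac{q_{ik}}{p_k} - \displaystyle\sum_{p_{ik}<q_{ik}}\frac{p_{ik}}{p_k}} \qquad \text{(A11)}$$

This equation illustrates that direct functional
iteration using equation (A10) amounts to adding a
correction term given by

$$\frac{\alpha_i(1-\alpha_i)\displaystyle\sum_{k=1}^{N}\frac{p_{ik}-q_{ik}}{p_k}}{N - \displaystyle\sum_{p_{ik}>q_{ik}}\frac{q_{ik}}{p_k} - \displaystyle\sum_{p_{ik}<q_{ik}}\frac{p_{ik}}{p_k}}$$

to the old value of $\alpha_i$ in order to obtain the new value
of $\alpha_i$.

As a way of comparing the iterative equation for
proportion estimates used in CLASSY (eq. (A10)) to
the standard maximum likelihood iterative equation
(eq. (A1)), one may rework the standard equation so
that the nature of the update term is apparent. Using
equation (A6), one obtains

$$\alpha_i = \alpha_i + \frac{\alpha_i(1-\alpha_i)}{N}\sum_{k=1}^{N}\frac{p_{ik}-q_{ik}}{p_k} \qquad \text{(A12)}$$

This equation reduces exactly to equation (A1).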
A comparison of equations (A11) and (A12)
shows that the difference is in the term $N$ versus

$$N - \sum_{p_{ik}>q_{ik}}\frac{q_{ik}}{p_k} - \sum_{p_{ik}<q_{ik}}\frac{p_{ik}}{p_k}$$

Thus, the iterative equation used in CLASSY (eq.
(A11)) will amplify the correction for $\alpha_i$ if there are a
significant number of points such that $0 < p_{ik} < 1$
and $0 < q_{ik} < 1$. This corresponds to the case where
cluster $i$ is a "mixed" cluster; that is, there is a
significant amount of overlap between cluster $i$ and other
clusters. Since it is precisely these "mixed" clusters
for which the standard iterative equation (eq. (A1)
or (A12)) converges slowly, the iterative equation
used for proportions in CLASSY (eq. (A10) or
(A11)) should converge more readily.
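The difference between the two update terms can be illustrated numerically. The sketch below is not the CLASSY implementation. It assumes the reading of the comparison above in which both corrections share the numerator $\alpha_i(1-\alpha_i)\sum_k (p_{ik}-q_{ik})/p_k$, the standard form divides by $N$, and the CLASSY form divides by $N$ minus the sum of $\min(p_{ik}, q_{ik})/p_k$. The two unit-variance normal clusters, the data points, and all names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Univariate normal density."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def corrections(alpha_i, p_i, q_i):
    """Return (standard, classy) corrections to alpha_i.

    p_i[k]: density of cluster i at sample k.
    q_i[k]: pooled density of all other clusters at sample k.
    """
    n = len(p_i)
    # Mixture density p_k = alpha_i * p_ik + (1 - alpha_i) * q_ik.
    p = [alpha_i * a + (1.0 - alpha_i) * b for a, b in zip(p_i, q_i)]
    numer = alpha_i * (1.0 - alpha_i) * sum(
        (a - b) / c for a, b, c in zip(p_i, q_i, p))
    # Overlap term: q_ik/p_k where p_ik > q_ik, and p_ik/p_k where p_ik < q_ik.
    overlap = sum(min(a, b) / c for a, b, c in zip(p_i, q_i, p))
    return numer / n, numer / (n - overlap)


# Hypothetical overlapping clusters centered at 0 and 2.
data = [-1.0, -0.5, 0.0, 0.3, 0.8, 1.2, 2.0, 2.5, 3.1]
p_i = [normal_pdf(x, 0.0, 1.0) for x in data]
q_i = [normal_pdf(x, 2.0, 1.0) for x in data]
standard, classy = corrections(0.5, p_i, q_i)
```

Because the overlap term is positive whenever the clusters overlap, the second denominator is smaller than $N$ and the correction has the same sign but larger magnitude, which is the amplification described in the text.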
Linear Feature Selection With Applications

H. P. Decell, Jr.,a and L. F. Guseman, Jr.b


INTRODUCTION 

The Large Area Crop Inventory Experiment
(LACIE) is concerned with the use of satellite-acquired
(Landsat) multispectral scanner (MSS) data
to conduct an inventory of some crop of economic
interest such as wheat over a large geographical area.
Such an inventory requires the development of
accurate and efficient algorithms for data classification.
The use of multitemporal measurements (several
registered passes during the growing season)
increases the dimension of the original measurement
space (pattern space) and thereby increases the
computational load in classification procedures. In this
connection, the cost of using statistical pattern
classification algorithms depends, to a large extent,
on reducing the dimensionality of the problem by
use of feature selection/combination techniques.
These techniques are employed to find a subspace of
reduced dimension (feature space) in which to
perform classification while attempting to maintain the
level of classification accuracy obtainable in the
original measurement space. The most meaningful
performance criterion that can be applied to a
classification algorithm is the frequency with which it
misclassifies observations; that is, the probability of
misclassification. Consequently, one should attempt
to select/combine features in such a way that the
probability of misclassification in feature space is
minimized.

In the sequel, several ways in which feature
selection techniques have been used in LACIE are
discussed. In all cases, the techniques require some a
priori information and assumptions (e.g., number of
classes or form of conditional class density
functions) about the structure of the data. In most cases,
the classification procedure (e.g., Bayes' optimal) has
been chosen in advance. Dimensionality reduction is
then performed so as to (1) choose an optimal
feature space in which to perform classification and
(2) determine a transformation to apply to
measurement vectors prior to classification. In all that
follows, the transformations used for dimensionality
reduction are linear; that is, the variables in feature
space are always linear combinations of the original
measurements.

a University of Houston, Houston, Texas. This author was
supported in part by NASA Johnson Space Center under contract
NAS 9-15000.

b Texas A & M University, College Station, Texas. This author
was supported in part by NASA Johnson Space Center under
contract NAS 9-14689.

As mentioned previously, the most meaningful
performance criterion for a classification procedure
is the probability of misclassification (denoted in the
sequel by $G$). However, if the dimension of feature
space (and therefore measurement space) is greater
than one, then $G$ is difficult to compute without
additional class structure assumptions (e.g., equal
covariance matrices). As a result, several numerically
tractable criteria have been developed in
conjunction with LACIE which provide some
information concerning the behavior of $G$. These criteria are
discussed in the next section. In a subsequent
section, a compendium of recent results on linear
feature selection techniques, most of which are
available only in scattered NASA contract reports, is
presented. The final section includes a discussion of
the use of these techniques in LACIE, an outline of
some of the investigations underway in the use of
linear feature selection techniques, and a discussion
of some related open questions.


MATHEMATICAL  PRELIMINARIES 

Let $\Pi_1, \Pi_2, \ldots, \Pi_m$ be distinct classes (e.g., crops
of interest) with known a priori probabilities $\alpha_1, \alpha_2,
\ldots, \alpha_m$, respectively. Let $x = (x_1, x_2, \ldots, x_n)^T \in R^n$
denote a feature vector of measurements (e.g., Landsat
MSS data from either a single pass or several
registered passes) taken from an arbitrary element of
$\bigcup_{i=1}^{m}\Pi_i$.

Suppose that the measurement vectors for class $\Pi_i$
are characterized by the $n$-dimensional multivariate
normal density function

$$p_i(x) = (2\pi)^{-n/2}\,|\Sigma_i|^{-1/2}\exp\left[-\tfrac{1}{2}(x-\mu_i)^T\Sigma_i^{-1}(x-\mu_i)\right]$$

It is assumed that the $n \times 1$ mean vector $\mu_i$ and the
$n \times n$ covariance matrix $\Sigma_i$ for each class $\Pi_i$ are
known (with $\Sigma_i$ positive definite), $1 \le i \le m$. The
symbol $|A|$ is used to denote the determinant of the
matrix $A$. The $n$-dimensional probability of
misclassification, denoted by $G$, of objects from
$\bigcup_{i=1}^{m}\Pi_i$ is given (refs. 1 and 2) by

$$G = 1 - \int_{R^n}\max_{1\le i\le m}\alpha_i p_i(x)\,dx = 1 - \sum_{i=1}^{m}\alpha_i\int_{R_i}p_i(x)\,dx$$

where the sets $R_i$, $1 \le i \le m$, called the Bayes' decision
regions, are defined by

$$R_i = \left\{x \in R^n : \alpha_i p_i(x) = \max_{1\le j\le m}\alpha_j p_j(x)\right\}, \quad 1 \le i \le m$$

The resulting classification procedure, called the
Bayes' optimal classifier, is defined as follows (ref. 1):

    Assign an element to $\Pi_i$ if its vector $x$ of
    measurements belongs to $R_i$, $1 \le i \le m$.

The Bhattacharyya coefficient for classes $i$ and $j$
($1 \le i, j \le m$) is given (ref. 3) by

$$\rho(i,j) = \int_{R^n}\left[p_i(x)\,p_j(x)\right]^{1/2}dx$$

It has been shown that

$$G \le \sum_{i=1}^{m-1}\sum_{j=i+1}^{m}(\alpha_i\alpha_j)^{1/2}\rho(i,j) \equiv \rho$$

The quantity $\rho$ is usually called the Bhattacharyya
distance (or the average Bhattacharyya distance).

There have been various attempts to utilize
certain functions of $\rho(i,j)$ and $\rho$ to generate
Bhattacharyya-related separability measures. For further
variations on this theme, we refer the reader to the
complete listing of references and in particular to
reference 4.

The divergence (ref. 5) between classes $i$ and $j$
($1 \le i, j \le m$) is given by

$$D(i,j) = \tfrac{1}{2}\,\mathrm{tr}\left[(\Sigma_i-\Sigma_j)(\Sigma_j^{-1}-\Sigma_i^{-1})\right] + \tfrac{1}{2}\,\mathrm{tr}\left[(\Sigma_i^{-1}+\Sigma_j^{-1})(\mu_i-\mu_j)(\mu_i-\mu_j)^T\right]$$

and the average interclass divergence is given by

$$D = \sum_{i=1}^{m-1}\sum_{j=i+1}^{m}D(i,j)$$

or, equivalently, as shown in reference 6, by

$$D = \tfrac{1}{2}\,\mathrm{tr}\left\{\sum_{i=1}^{m}\Sigma_i^{-1}S_i\right\} - \frac{n\,m(m-1)}{2}$$

where

$$S_i = \sum_{\substack{j=1\\ j\neq i}}^{m}\left(\Sigma_j + \delta_{ij}\delta_{ij}^T\right)$$

and

$$\delta_{ij} = \mu_i - \mu_j$$

As in the case of $\rho(i,j)$ and $\rho$, various functions of
$D(i,j)$ and $D$ have been proposed as class
separability measures.

Kanal (ref. 4) provides an excellent exposition of
such measures (e.g., Shannon entropy, Vajda's
average conditional quadratic entropy, Devijver's
Bayesian distance, Minkowsky measures of
nonuniformity, Bhattacharyya bound, Chernoff bound,
Kolmogorov variational distance, Lissack and Fu's
generalization of the latter, Ito's approximating
functions, and the Jeffreys-Matusita distance). This work
contains 304 references and is perhaps the only
comprehensive exposition of the subject through early
1974. A more recent nonparametric separability
measure due to Bryant and Guseman (ref. 7) will be
outlined at the end of this section.

Devijver (ref. 8) develops a bound on $G$ called the
Bayesian distance. He gives an excellent development
of the concept and its relationship to the
aforementioned separability measures. His results are quite
general with regard to the class densities $p_i(x)$ and
class a priori probabilities $\alpha_i$, $1 \le i \le m$. The
Bayesian distance is defined to be

$$H = E\left[\sum_{i=1}^{m}\frac{\alpha_i^2\,p_i(x)^2}{p(x)^2}\right]$$

where

$$p(x) = \sum_{i=1}^{m}\alpha_i p_i(x)$$

The measure $H$ satisfies the inequality


H < G < — + — 

m m 


< s/TF 


Following the philosophy discussed in the
introduction, the intractable nature of the expression
for $G$ (i.e., although in many instances unnecessary,
attention is being restricted to a finite family of
normally distributed pattern classes) was one, if not the
single, reason for developing more tractable pattern
class separability measures. These measures could
then be used in lieu of $G$ to determine mappings
from pattern space to feature space in which the
classification of patterns is equivalent to ($G$ is
preserved) or "nearly equivalent to" classification of
patterns in pattern space. Two fundamental
questions that arise are these. First, what (if any) relation
do the class separability measures bear to $G$? Second,
can one develop tractable algorithms based on the
separability measures to determine the
dimension-reducing mappings?

In connection with these questions, only linear
mappings $B$ of the measurement space $R^n$ onto $R^k$
for $k < n$ will be considered. This is equivalent to
requiring that $B$ be a $k \times n$ rank-$k$ matrix. This class of
mappings certainly includes those of the "feature
subset selection" type since the selection of any
$k$-feature subset (i.e., any $k$ components of $x \in R^n$) can
be accomplished by selecting the appropriate $k \times n$
matrix $B$ consisting of only 0's and 1's. The class of
$k \times n$ rank-$k$ matrices is more general in the sense
that linear combinations of the features are
permissible.

In all that follows, it will be assumed that $B$ is a
$k \times n$ rank-$k$ matrix and that $X$ is a normally
distributed vector-valued random variable. An
observation on $X$ will be denoted by $x = X(\omega)$. It is well
known that if $X \sim N(\mu, \Sigma)$ then $Y = BX \sim N(B\mu,
B\Sigma B^T)$.
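As a small illustration of the two kinds of mapping just described, the sketch below computes the mean $B\mu$ and covariance $B\Sigma B^T$ of $Y = BX$ for a 0/1 subset-selection matrix and for a matrix that forms linear combinations. The 4-dimensional mean and covariance are made-up numbers, and the helper names are hypothetical; only the standard library is used.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def transform_stats(b, mean, cov):
    """Mean and covariance of Y = BX when X has the given mean and covariance."""
    new_mean = [row[0] for row in matmul(b, [[m] for m in mean])]
    new_cov = matmul(matmul(b, cov), transpose(b))
    return new_mean, new_cov


# Hypothetical class statistics in a 4-dimensional measurement space.
mean = [26.0, 20.0, 17.0, 17.0]
cov = [[4.0, 1.0, 2.0, 0.0],
       [1.0, 3.0, 0.0, 1.0],
       [2.0, 0.0, 5.0, 1.0],
       [0.0, 1.0, 1.0, 2.0]]

# Feature subset selection: keep components 1 and 3 (a matrix of 0's and 1's).
b_subset = [[1, 0, 0, 0],
            [0, 0, 1, 0]]

# A more general rank-2 map: average the first two and the last two components.
b_combo = [[0.5, 0.5, 0.0, 0.0],
           [0.0, 0.0, 0.5, 0.5]]

subset_mean, subset_cov = transform_stats(b_subset, mean, cov)
combo_mean, combo_cov = transform_stats(b_combo, mean, cov)
```

The subset matrix simply picks out the corresponding entries of the mean and covariance, while the second matrix mixes them, which is the extra generality that linear feature combination allows.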


The transformed measurements, $y = Bx$, for
class $\Pi_i$ are normally distributed with density
function

$$p_i(y,B) = (2\pi)^{-k/2}\,|B\Sigma_i B^T|^{-1/2}\exp\left[-\tfrac{1}{2}(y-B\mu_i)^T(B\Sigma_i B^T)^{-1}(y-B\mu_i)\right]$$

and the resulting probability of misclassification is
given by

$$G(B) = 1 - \int_{R^k}\max_{1\le i\le m}\alpha_i p_i(y,B)\,dy = 1 - \sum_{i=1}^{m}\alpha_i\int_{R_i(B)}p_i(y,B)\,dy$$

where the transformed Bayes' decision regions are
given by

$$R_i(B) = \left\{y \in R^k : \alpha_i p_i(y,B) = \max_{1\le j\le m}\alpha_j p_j(y,B)\right\}, \quad 1 \le i \le m$$

The B-Bhattacharyya coefficient for classes $i$ and $j$
is given by

$$\rho_B(i,j) = \int_{R^k}\left[p_i(y,B)\,p_j(y,B)\right]^{1/2}dy$$

It has been shown by Decell and Quirein (ref. 6)
that for each $B$

$$G \le G(B) \le \sum_{i=1}^{m-1}\sum_{j=i+1}^{m}(\alpha_i\alpha_j)^{1/2}\rho_B(i,j) = \rho(B)$$

The quantity $\rho(B)$ is called the B-Bhattacharyya
distance or the B-average Bhattacharyya distance.

In addition, it has been shown by Decell and
Quirein (ref. 6) that $G = G(B)$ if and only if $\rho =
\rho(B)$.

The B-divergence between classes $i$ and $j$ ($1 \le i, j
\le m$) is

$$D_B(i,j) = \tfrac{1}{2}\,\mathrm{tr}\left\{\left[B\Sigma_i B^T - B\Sigma_j B^T\right]\left[(B\Sigma_j B^T)^{-1} - (B\Sigma_i B^T)^{-1}\right]\right\} + \tfrac{1}{2}\,\mathrm{tr}\left\{\left[(B\Sigma_i B^T)^{-1} + (B\Sigma_j B^T)^{-1}\right](B\mu_i - B\mu_j)(B\mu_i - B\mu_j)^T\right\}$$

and the B-interclass divergence is

$$D(B) = \sum_{i=1}^{m-1}\sum_{j=i+1}^{m}D_B(i,j)$$

or, equivalently (ref. 6),

$$D(B) = \tfrac{1}{2}\,\mathrm{tr}\left\{\sum_{i=1}^{m}(B\Sigma_i B^T)^{-1}(BS_i B^T)\right\} - \frac{k\,m(m-1)}{2}$$

where

$$S_i = \sum_{\substack{j=1\\ j\neq i}}^{m}\left(\Sigma_j + \delta_{ij}\delta_{ij}^T\right)$$

and

$$\delta_{ij} = \mu_i - \mu_j$$

Although there is no explicit relationship between $G$
and $D$ (or $G(B)$ and $D(B)$), it was shown by Decell
and Quirein (ref. 6) that $D = D(B)$ if and only if $G =
G(B)$.

In  the  present  setting  and  with  the  obvious  general 
meaning  of  the  definition,  the  B-Bayeslan  distance  is 
defined  to  be 


or,  perhaps,  to  choose  <P(B)  - £HB)  and  find  i 
such  that 


<t>  ( B ) m ma xD(B) 
B 


In  seeking  an  extremum  of  4>,  it  is  natural  to  con- 
sider the  differentiability  of  d>  with  respect  to  the  ele- 
ments of  B.  In  the  sequel,  use  is  made  of  the  Gateaux 
differential  of  4>  at  B with  increment  C,  denoted  by 
64>(£;Q  and  defined  (if  the  limit  exists)  by 


m 


m « £ 

/=! 


,1 


« i2Pi(y.B)2  j 

p{y,B)2 


5 <b(B\C)  = lim 
s-0 


<t>(B  + sC)  - 4>(jg) 


where 


p(y.B ) = 


m 


E 

<= 1 


^pfy.B) 


where  C is  a k x n matrix.  If,  for  a given  k x n 
matrix  B of  rank  k,  the  previously  defined  limit  ex- 
ists for  each  k x n matrix  C,  then  4>  is  said  to  be 
Gateaux  differentiable  at  B.  Similarly,  when  the  limit 
exists,  the  following  is  defined. 


It  has  been  shown  in  reference  9 that  G(B)  — G if 
and  only  if  H(B)  — H.  In  this  connection,  the 
authors  of  this  paper  plan  to  extend  the  variational 
results  of  the  next  section  to  include  Bayesian  dis- 
tance. 

In  the  next  section,  related  new  results,  some  of 
which  concern  questions  raised  earlier,  will  be  out- 
lined, and  the  connection  between  linear  feature 
combination  and  the  classical  concept  of  statistical 
sufficiency  will  be  explained. 


RECENT  RESULTS  IN  LINEAR 
FEATURE  SELECTION 

In  what  follows,  we  will  be  concerned  with  finding 
an  extreme  value  of  some  function  d>  (of  the  reduc- 
tion matrix  B ).  For  example,  it  may  be  desired  to 
choose  <t>(B)  - G(B)  and  find  if  such  that 


4>  {b  ) * min  G(B ) 
B 


pXy,B  + sC)  - p.  ( y,B ) 
bP((y.B,C)  = lim-™- — 

i— 0 


where  C is  a k x n matrix.  For  an  excellent  discus- 
sion of  Gateaux  differentials,  see  reference  10. 
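The limit that defines the Gateaux differential can be checked numerically. The sketch below is illustrative only: it replaces the one-sided limit by a central difference with a small step s, and it uses a stand-in criterion (the sum of squared entries of B) rather than G, D, or ρ, which would need the class densities. The names `gateaux_differential` and `frobenius_squared` are assumptions made for this example.

```python
def gateaux_differential(phi, B, C, s=1e-6):
    """Central-difference estimate of the Gateaux differential of phi at B
    with increment C (B and C are lists of rows of equal shape)."""
    plus = [[b + s * c for b, c in zip(row_b, row_c)] for row_b, row_c in zip(B, C)]
    minus = [[b - s * c for b, c in zip(row_b, row_c)] for row_b, row_c in zip(B, C)]
    return (phi(plus) - phi(minus)) / (2.0 * s)


def frobenius_squared(B):
    """Stand-in criterion (not one of the paper's): sum of squared entries of B."""
    return sum(b * b for row in B for b in row)


B = [[1.0, 2.0, 0.5]]   # a 1 x 3 reduction matrix
C = [[0.0, 1.0, 0.0]]   # increment: 1 in the second slot, 0's elsewhere
estimate = gateaux_differential(frobenius_squared, B, C)

# For this stand-in criterion the differential is 2 * sum(b_ij * c_ij).
exact = 2.0 * sum(b * c for rb, rc in zip(B, C) for b, c in zip(rb, rc))
```

Taking C to be each unit vector in turn recovers the components of the gradient with respect to the elements of B, which is how the differential is used in the results that follow.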

Theoretical results related to minimizing G(B) for two multivariate normal classes with equal a priori probabilities and a one-dimensional feature space were initially presented by Guseman and Walker¹ (ref. 11). The associated computational procedure was presented by Guseman and Walker (ref. 12).

The following results for the general case of m n-dimensional normal classes with arbitrary a priori probabilities and a one-dimensional feature space appear in reference 13.


¹L. F. Guseman, Jr., and H. F. Walker, "On Minimizing the Probability of Misclassification for Linear Feature Selection," JSC Internal Tech. Note JSC-08412, NASA Johnson Space Center, Houston, Tex., Aug. 1973.
Lemma. Let B be a nonzero 1 × n vector. Then (omitting subscripts)

\[ \delta p(y,B;C) = p(y,B)\left[ \frac{C\Sigma B^{T}}{(B\Sigma B^{T})^{2}}\,(y - B\mu)^{2} + \frac{C\mu}{B\Sigma B^{T}}\,(y - B\mu) - \frac{C\Sigma B^{T}}{B\Sigma B^{T}} \right] \]

for each 1 × n vector C.

Theorem 1. Let B be a nonzero 1 × n vector for which α_i p_i(y,B) ≠ α_j p_j(y,B) for i ≠ j. Then, G is Gateaux differentiable at B, and

\[ \delta G(B;C) = -\sum_{i=1}^{m} \alpha_i \int_{R_i(B)} \delta p_i(y,B;C)\, dy \]

Theorem 2. Let B be a nonzero 1 × n vector at which G assumes a minimum. Then, G is Gateaux differentiable at B.

By substituting the expression for δp_i(y,B;C) given by the lemma into the expression from theorem 1, and using integration by parts, the following result is obtained.

Theorem 3. Let B be a nonzero 1 × n vector for which α_i p_i(y,B) ≠ α_j p_j(y,B) for i ≠ j. Then, G is Gateaux differentiable at B, and

\[ \delta G(B;C) = \sum_{i=1}^{m} \alpha_i\, p_i(y,B) \left[ \frac{C\Sigma_i B^{T}}{B\Sigma_i B^{T}}\,(y - B\mu_i) + C\mu_i \right] \Bigg|_{R_i(B)} \]

where the notation |_{R_i(B)} denotes the sum of the values of the function at the right endpoints of the intervals comprising R_i(B) minus the sum of its values at the left endpoints.

If B is a nonzero 1 × n vector that minimizes G(B), then B must satisfy the vector equation

\[ \frac{\partial G(B)}{\partial B} = \bigl( \delta G(B;C_1), \ldots, \delta G(B;C_n) \bigr) = 0 \]

where C_j, 1 ≤ j ≤ n, is a 1 × n vector with a 1 in the jth slot and 0's elsewhere. Using the preceding formula for ∂G(B)/∂B resulting from theorem 3, a numerically tractable expression for the variation in the probability of misclassification G with respect to B is obtained. The use of this expression in a computational procedure for obtaining a nonzero 1 × n B that minimizes G was developed by Guseman and Marion (ref. 14).

If B is a nonzero 1 × n vector that minimizes G, then the entries p_ij(B) in the error matrix P(B) for the optimal classification procedure determined by the regions R_i(B) can be readily computed from the expression


The linear feature selection procedure for minimizing G(B) has been extended to the case where the density function for each class is a convex combination of multivariate normals. This extension allows for the design of a one-dimensional "class A/not class A" classification procedure that could be used (for example) to classify wheat(s) versus nonwheat(s). The associated computational procedure for this extension was developed by Guseman and Marion (ref. 15).

Decell and Quirein (ref. 6) develop explicit expressions for δD(B;C) and δρ(B;C) in terms of B and the known means and covariance matrices μ_i and Σ_i, 1 ≤ i ≤ m. These expressions immediately provide ∂(D(B))/∂B and ∂(ρ(B))/∂B for use in a Davidon-Fletcher-Powell (ref. 16) iteration scheme for determination of an extremum value of D(B) and ρ(B), respectively.
The explicit expressions are


WB)) 

dB 


m 


■ 

•( 

■ 


BS,Br)  (B£,Br)  1 


and 


■M. 

dB 


V (» 

f-!  dB 


where 


values of D((I_k|Z)U). In reference 18, these results were refined in the sense that any extremal transformation can be expressed in the form B = (I_k|Z)H_p ... H_1, where p ≤ min{k, n − k} and H_i is a Householder transformation, i = 1, ..., p. The latter result suggests constructing a sequence of transformations (I_k|Z)H_1, (I_k|Z)H_2H_1, ... such that the values of the class separability criterion (e.g., G(B), ρ(B), D(B)) evaluated at this sequence comprise a bounded monotone sequence of real numbers. The construction of the ith element of the sequence of transformations requires the solution of an n-dimensional optimization problem. Recall that T(H), the Householder transformations (refs. 19 and 20) H = I − 2xx^T, x ∈ R^n with ||x|| = 1, is a compact connected subset of the unitary matrices for which H^T = H = H^{-1}. Some of these results are outlined beginning with the definition (for a case, say, when it is desired to maximize Φ):


h , 


l.u.b.  <t> 
HtT(H) 


H 


£/) 


Theorem 4. For each positive integer i, let the element H_i of T(H) be chosen such that


- BW  + [b(£,  ♦ 2,)Br] 


'*(h  + £y)  I [(B£rBr)  BS, 


+ (B2/Br)  ’bsJ 


It is also shown in reference 6 that, in general, an absolute extremum of G(B), ρ(B), and D(B) always exists. For any one of the given functions G(B), ρ(B), or D(B), the absolute extremum is attained at B = (I_k|Z)U, where I_k is the k × k identity and U is some n × n unitary matrix, thus parameterizing the aforementioned extreme problems on the compact group of unitary matrices. In reference 17, it is shown that the nature of the eigenvalues of U in no way provides any information about the extreme


\\*)  "l  =iib  %lZ) 


then 

(“  *(4h  «,  ",  ** 

", 

t:>  ♦|/J/|  //,  «,«  v ♦(/Jx)  «„,//, 

H for  every  II  < 7\ll) 

{i)  V,|') 

H for  every  We  7X11) 

(4)  *('*!*)  ",  ",  <r  i ", 

i **('*10 

for  every  II  t 7 \ll , ind ,,  « 0 r 7 

Theorem  5.  The  sequence 

"r  “ 1 

L 
is bounded above and


£*(41*)  • Ub-  *('.!*)  «,  », 


These theorems give rise to a sequential monotone procedure for possibly obtaining a Φ-extremal rank-k linear-combination matrix. At each stage in this procedure, the extremal problem is a function of only n variables. One can conjecture, under certain conditions, that the process should terminate in at most min(k, n − k) steps. The conjecture is clearly in line with the min(k, n − k) representation of the actual Φ-extremal solution. Certainly, the conjecture could further depend on any pathological behavior of Φ, and Tally² constructs such a pathological failure point. Tally³ shows that the procedure actually converges to a Φ extremum provided Φ is T(H)-sloped. (See definition 1.) Some of these results will be outlined. Let 𝒰 denote the set of unitary matrices and T(H) the Householder transformations.
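The properties of Householder transformations used in this procedure are easy to verify numerically. The following sketch builds H = I − 2xx^T from a vector x (normalized inside the function) and forms a candidate reduction matrix (I_k|Z)H. It assumes that Z is the k × (n − k) zero block, so that (I_k|Z)H is simply the first k rows of H; the helper names and the sample vector are hypothetical.

```python
import math


def householder(x):
    """Return H = I - 2 u u^T, where u is x scaled to unit length."""
    length = math.sqrt(sum(v * v for v in x))
    u = [v / length for v in x]
    n = len(u)
    return [[(1.0 if i == j else 0.0) - 2.0 * u[i] * u[j] for j in range(n)]
            for i in range(n)]


def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


H = householder([1.0, 2.0, 2.0])   # arbitrary sample vector in R^3
HH = matmul(H, H)                  # should reproduce the identity, since H = H^-1

k = 2
B = H[:k]                          # (I_k | Z) H under the zero-block assumption
```

Because each H is determined by a single unit vector x in R^n, each stage of the sequential procedure searches over only n variables, which is the point made in the text.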

Definition 1. Φ will be called T(H)-sloped provided that U ∈ 𝒰 and Φ(U) < Φ_max imply there exists some H ∈ T(H) (dependent on U) such that

\[ \Phi(U) < \Phi(HU) \le \Phi_{\max} = \underset{U}{\mathrm{l.u.b.}}\ \Phi(U) \]

Definition 2. A sequence {U_i}, i = 1, 2, ..., in 𝒰 will be called Φ-convergent provided {Φ(U_i)} converges.

Definition 3. A sequence {U_i}, i = 1, 2, ..., in 𝒰 will be called a Φ-Householder sequence provided that H ∈ T(H) and i, an integer, imply

(1) Φ(U_i) ≤ Φ(U_{i+1})

(2) Φ(HU_i) ≤ Φ(U_{i+1})


²W. Tally, "On the Convergence of Optimal Linear Combination Procedures," Comp. & Maths. with Appls. (to be published).

³W. Tally, "A Convergence Criterion for Optimal Linear Combination Procedures," Comp. & Maths. with Appls. (to be published).


Proposition 1. Each Φ-Householder sequence {U_i} is Φ-convergent and

\[ \lim_{i} \Phi(U_i) = \Phi(U) = \underset{i}{\mathrm{l.u.b.}}\ \Phi(U_i) \]

for some U ∈ 𝒰.

Proposition 2. Each Φ-Householder sequence converges to Φ_max if and only if Φ is T(H)-sloped.

Proposition 3. If {U_i} is a Φ-Householder sequence and Φ is T(H)-sloped, then exactly one of the following holds:

(1) {Φ(U_i)} is strictly monotonic (and convergent to Φ_max)

(2) for some integer k,

\[ \underset{H}{\mathrm{l.u.b.}}\ \Phi(HU_k) \le \Phi(U_k) \]

(in which case, Φ(U_k) = Φ_max)

These techniques have been applied to the functions Φ(B) = D(B) and Φ(B) = ρ(B), respectively, by Decell and Mayekar (ref. 21) and Decell and Marani (ref. 22) using C1 flight line data.

In each case, explicit expressions for (∂/∂x)[D((I_k|Z)H)] and (∂/∂x)[ρ((I_k|Z)H)], where H = I − 2xx^T, ||x|| = 1, have been developed for m-pattern classes (equal proportions) and have been used sequentially, according to the aforementioned theorems, to calculate the extreme values and the unitary matrices (as products of elements of T(H)) at which the extreme values occur. Some of the results are outlined in what follows.

Let 

hi  9 h * h 

4 ■ V (4 I*) T t(4l*)  "V'  (41*)  r]'‘ 
■ V (4 1*)  r [(41*)  »V'  (4!*)  r]  ‘ 

and 


h * (4 1*)  r [(4l*)  HhH  (4 1*)  r] 
Let


lr[o(i'.K)")] 


e»-[»re*(/,|z)  q„  (/,|2)««r]r 
- Kfy  ('*1*1  - o#  (W«»r] 

and  let  and  tu  be  similarly  defined  by 

substituting,  respectively.  and  Lu  for  Qv  in  the 
expression  for  (?<,.  /.  J - 1 . . . , m.  The  resulting  ex- 
pressions are 


/■!/•/♦  I 


M1 


*E  |(Wi  '\)r  {Mt  -Af,)|. 


where 


M,  - *«'<?,(/* \g) 

(?,-[{s,Br  S,Br(BX,Br)  ‘(BS,Br)((BS,Br)  >] 
B-  (/*|Z)(/  2n«r) 


where 


6U  a'(»t  - **/)  Uh  *,)  r- 

tr  (•)  « trace  of  (•)  and  M ■ det(-) 

UW)  «*jnit\z) r) 


and 

c#‘-Tln|(,*Fl"V'  ('«F)r 

*|ln|(;,|Z)//S,«(/,!Z)r| 


Peters et al.⁴ approach the problem of finding a minimum of G(B) from the point of view of treating the mapping B: R^n → R^k (for some k ≤ n) as a statistic, and they provide necessary and sufficient conditions that such a B be a sufficient statistic in the classical sense of Halmos and Savage (ref. 23), Lehmann and Scheffé (ref. 24), Bahadur (ref. 25), LeCam (ref. 26), and Kullback (ref. :>). Although their results are much more general than required for dealing with the dimensionality reduction problem for a finite number of normal populations, the application they provide for such families actually allows one to write down the optimal dimension-reducing k × n statistic B such that G(B) = G (whenever such a B exists). Moreover, they also guarantee that there is no other B of smaller rank (i.e., of rank less than k) for which G(B) = G. Their application to the problem will simply be stated; the reader is referred to Peters et al.⁴ for the more general applications to exponential families (e.g., Wishart and normal multivariate sampling).


+ 4ln|(/*lZ)//S///(/*lZ)r|  + 7>“2 


⁴B. C. Peters, Jr., R. Redner, and H. P. Decell, Jr., "Characterizations of Linear Sufficient Statistics," submitted to Sankhya, A, 1976.
Let N(μ_i, Σ_i), i = 0, 1, ..., m − 1, be an n-variate normal family with μ_0 = 0 and Σ_0 = I, each member having density

\[ p_i(x) = (2\pi)^{-n/2}\, \lvert \Sigma_i \rvert^{-1/2} \exp\left[ -\tfrac{1}{2}\,(x - \mu_i)^{T} \Sigma_i^{-1} (x - \mu_i) \right] \]

The requirement μ_0 = 0 and Σ_0 = I imposes no loss of generality since there exists a nonsingular matrix M_0 for which M_0Σ_0M_0^T = I and a change of coordinate system defined by the transformation x → M_0(x − μ_0) allows one to recover the sufficient statistic in the original coordinate system.

Theorem 6. Let μ_0 = 0, Σ_0 = I, and

\[ M = [\,\mu_1 \mid \mu_2 \mid \cdots \mid \mu_{m-1} \mid \Sigma_1 - I \mid \Sigma_2 - I \mid \cdots \mid \Sigma_{m-1} - I\,] \]

Matrix B is a linear sufficient statistic for the given finite n-variate normal family if and only if range(B^T) = range(M). Moreover, k = rank M is the smallest integer for which there exists a k × n sufficient statistic for the given family.

Again, note that theorem 6 completely determines the greatest dimensionality reduction possible such that G(B) = G. Moreover, as will be shown by example in what follows, there are any number of ways of finding a B such that range(B^T) = range(M). In fact, the theorem states that if rank M = n, then there is no dimension-reducing sufficient statistic (i.e., G(B) > G for every k × n matrix B for which k < n).

The following result due to Decell et al.⁵ provides one means of calculating (and determining the existence of) the aforementioned sufficient statistic B for which G(B) = G.

Theorem 7. Let Π_i be an n-variate normal population with a priori probability α_i > 0, mean μ_i, and covariance Σ_i, i = 0, 1, ..., m − 1 (with μ_0 = 0 and Σ_0 = I), and let

\[ M = [\,\mu_1 \mid \mu_2 \mid \cdots \mid \mu_{m-1} \mid \Sigma_1 - I \mid \Sigma_2 - I \mid \cdots \mid \Sigma_{m-1} - I\,] = FG \]

be a full-rank (= k < n) decomposition of M. Then, the n-variate Bayes procedure assigns x to Π_i if and only if the k-variate Bayes procedure assigns F^T x to Π_i. Moreover, k is the smallest integer for which there exists a k × n matrix T preserving the Bayes assignment of x and Tx to Π_i, i = 0, 1, ..., m − 1.

These results completely characterize the nature of data compression for the Bayes classification procedure for normal classes in the sense that k is the dimension of greatest allowable data compression consistent with preserving Bayes population assignment. Moreover, the theorem provides an explicit expression for the compression matrix T depending only on the known population means and covariances. The statistic T = F^T given by the theorem is by no means unique (e.g., for any nonsingular k × k matrix A, T = AF^T will do). It is also true that there may be more efficient methods for calculating the statistic T (yet to be determined) than the method of full-rank decomposition of M.

It should be noted that the matrix M has an "excellent chance" of having rank equal to n. Even in the case of two populations (m = 2), there may well be n linearly independent columns among the 2(n + 1) columns of M. Consequently, in such a case there do not exist an integer k < n and a k × n rank-k compression matrix T preserving the Bayes assignment of x and Tx.
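The rank condition can be explored with a few lines of code. The sketch below assumes the column layout M = [μ_1 | ... | μ_{m−1} | Σ_1 − I | ... | Σ_{m−1} − I], with the reference class already transformed to zero mean and identity covariance, and computes rank M by Gaussian elimination. The function names and the two toy populations are hypothetical and are not taken from the paper's data.

```python
def build_m(means, covs):
    """Assemble M as a list of rows; columns are the means mu_i followed by
    the columns of each Sigma_i - I (reference class excluded)."""
    n = len(means[0])
    cols = [list(mu) for mu in means]
    for sigma in covs:
        for j in range(n):
            cols.append([sigma[i][j] - (1.0 if i == j else 0.0) for i in range(n)])
    return [[col[i] for col in cols] for i in range(n)]


def matrix_rank(M, tol=1e-10):
    """Rank by Gaussian elimination with partial pivoting."""
    A = [row[:] for row in M]
    n_rows, n_cols = len(A), len(A[0])
    r = 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = max(range(r, n_rows), key=lambda i: abs(A[i][c]))
        if abs(A[pivot][c]) < tol:
            continue  # no usable pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, n_rows):
            f = A[i][c] / A[r][c]
            for j in range(c, n_cols):
                A[i][j] -= f * A[r][j]
        r += 1
    return r


# Toy case 1: the second class differs from the reference only along the first axis.
rank_reducible = matrix_rank(build_m(
    [[1.0, 0.0, 0.0]],
    [[[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]]))

# Toy case 2: the covariance differs along every axis, so no reduction is possible.
rank_full = matrix_rank(build_m(
    [[1.0, 0.0, 0.0]],
    [[[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 4.0]]]))
```

In the first toy case the rank is 1, so a single linear feature suffices; in the second the rank equals n = 3, which illustrates how easily M reaches full rank.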

Peters (ref. 27) treats the problem of determining sufficient statistics for mixtures of probability measures in a homogeneous family. The reader is referred to Teicher (refs. 28 and 29), Yakowitz (ref. 30), and Yakowitz and Spragins (ref. 31) for the treatment of homogeneous families.

When the linear feature selection techniques mentioned previously are used in a LACIE-type application, they are based on the assumptions that each class conditional density function is multivariate normal and that the associated parameters (μ_i, Σ_i, 1 ≤ i ≤ m) are known or can be estimated. In some cases, either the normality assumptions may be violated or else the determination of the number of classes present and their associated parameters is not possible. The question then arises as to how, without having samples from each class, one might perform a dimensionality reduction without losing much of the "separation" present in measurement space. For example, one might be interested in displaying a registered multipass Landsat data set on a three-color display device without a priori knowledge of class structure in the data.


⁵H. P. Decell, Jr., P. L. Odell, and W. A. Coberly, "Linear Dimension Reduction and Bayes Classification," J. American Statist. Assoc., submitted Jan. 1971.

Each of the previous linear feature selection techniques uses a statistical definition of the word "separation." The following procedure, due to Bryant and Guseman (ref. 7), makes no statistical assumptions about the data. In addition, no labeled subsets (training data) are required. In this sense, the linear feature selection technique outlined in the following is distribution free. Basically, the problem can be stated as follows.

Given distinct (prototype) vectors x_1, x_2, ..., x_p in R^n and k (1 ≤ k < n), determine a linear transformation A: R^n → R^k which minimizes

\[ F(A) = \sum_{1 \le i < j \le p} \left( \lVert x_i - x_j \rVert - \lVert A x_i - A x_j \rVert \right)^{2} \]

where the norms ||x_i − x_j|| and ||Ax_i − Ax_j|| are the euclidean norms in R^n and R^k, respectively. Let m = p(p − 1)/2 and let {z_i : 1 ≤ i ≤ m} denote the m distinct differences of the prototypes x_i. If A = (a_ij)_{k×n}, z_i = (z_i1, ..., z_in)^T, and a^i = (a_i1, ..., a_in)^T, then the gradient of F at A is given by

\[ \tfrac{1}{2}\,\nabla F(A) = AS - A\,T(A) \]

where S is the n × n matrix

\[ S = \sum_{i=1}^{m} z_i z_i^{T} \]

and T(A) is the n × n matrix

\[ T(A) = \sum_{i=1}^{m} \frac{\lVert z_i \rVert}{\lVert A z_i \rVert}\, z_i z_i^{T} \]

Standard optimization techniques can be used to obtain A, which minimizes F.
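The distribution-free criterion is simple to evaluate directly. The sketch below computes F(A) for a small set of prototypes and a gradient obtained by differentiating F term by term, which works out to 2 Σ (1 − ||z||/||Az||) A z z^T over the pairwise differences z and assumes Az ≠ 0 for every difference. The prototype values, the starting matrix, and the function names are hypothetical.

```python
import math
from itertools import combinations


def differences(prototypes):
    """All p(p - 1)/2 pairwise differences x_i - x_j with i < j."""
    return [[a - b for a, b in zip(x, y)] for x, y in combinations(prototypes, 2)]


def apply(A, z):
    """Matrix-vector product A z for A given as a list of rows."""
    return [sum(a * v for a, v in zip(row, z)) for row in A]


def norm(v):
    return math.sqrt(sum(t * t for t in v))


def stress(A, prototypes):
    """F(A): squared mismatch between original and transformed pairwise distances."""
    return sum((norm(z) - norm(apply(A, z))) ** 2 for z in differences(prototypes))


def stress_gradient(A, prototypes):
    """Gradient of F with respect to the entries of A (requires A z != 0)."""
    k, n = len(A), len(A[0])
    grad = [[0.0] * n for _ in range(k)]
    for z in differences(prototypes):
        Az = apply(A, z)
        w = 2.0 * (1.0 - norm(z) / norm(Az))
        for r in range(k):
            for c in range(n):
                grad[r][c] += w * Az[r] * z[c]
    return grad


prototypes = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 1.0, 3.0]]
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # project R^3 onto its first two axes
value = stress(A, prototypes)
grad = stress_gradient(A, prototypes)
```

The value and gradient can be handed to any standard descent routine. Starting points for which some Az vanishes need separate handling, since the gradient is undefined there.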

For a given data set (e.g., a Landsat sample segment), there are several ways to choose the prototype vectors x_i, 1 ≤ i ≤ p. For example, one might choose cluster centers from the output of a clustering algorithm.


CONCLUDING REMARKS

There are, of course, ad hoc feature selection procedures based on specific problem knowledge and empirical studies. An example of such a procedure is the transformation of Kauth and Thomas (ref. 32) used in the analysis of Landsat data. This transformation is based on an empirical data study and is described by an orthogonal coordinate change U: R^4 → R^4. Application of the transform U to Landsat measurements simply produces a reduced feature space of dimension 2 (brightness-greenness). This is essentially accomplished at each Landsat measurement X = (x_1, x_2, x_3, x_4)^T by the mapping:


The Kauth-Thomas transform has proven to be of value in LACIE applications (e.g., physical interpretation, dimensionality reduction, scatter plots). As one would expect, the Kauth-Thomas transform is not a sufficient statistic nor will it, in general, preserve Landsat Bayes class assignment in feature space.

Feature selection techniques are currently being studied as a tool for "optimum pass" selection problems in LACIE. The basic objective is to develop a technique for a priori selection (based on some separability criterion) of subsets of Landsat acquisitions for analysis to separate wheat from nonwheat when given an adequate sample of labeled wheat and nonwheat LACIE segment pixel data. There are preliminary results in this direction due to Guseman and Marion (ref. 33) using one-dimensional feature selection that minimizes G(B).

In still another LACIE application, studies are being performed on parametric and nonparametric feature selection techniques that allow analyst/interpreters to better separate spring wheat from other small grains in a reduced feature space (e.g., brightness-greenness). In this connection, labeled wheat and other small-grains LACIE-segment pixel data and ancillary data are being used to estimate the distribution functions for spring wheat and other spring small grains. Feature selection methods are being used to find a priori statistically optimum features and associated discriminant functions. These will be compared to the brightness and greenness features currently used at the NASA Johnson Space Center.

Methods for estimating class proportions, based on the linear feature selection procedure for minimizing G(B), have been developed by Guseman and Walton⁶ (ref. 34). In both papers, the proportion estimation techniques rely on the fact that one can readily compute the error matrices associated with the optimal classifier produced by the linear feature selection procedure.

Other results of general related interest appear in Babu and Kalra (ref. 35), Kadota and Shepp (ref. 36), Marill and Green (ref. 37), Swain and King (ref. 38), Tou and Heydorn (ref. 39), Watanabe et al. (ref. 40), Wee (ref. 41), and Wheeler et al. (ref. 42).
Feature Extraction Applied to Agricultural Crops As Seen by Landsat

R. J. Kauth,ᵃ P. F. Lambeck,ᵃ W. Richardson,ᵃ G. S. Thomas,ᵃ and A. P. Pentlandᵃ


INTRODUCTION  AND  SCOPE 

The physical interpretation of the spectral-temporal structure of Landsat data can be conveniently described in terms of a graphic descriptive model which has been named the Tasseled Cap. This model of Landsat data has been a rich source of development not only in crop-related feature extraction but also for data screening and for haze effects correction. The authors will first describe this model qualitatively and indicate its applications and then use it to analyze several feature extraction algorithms.


The  Tasseled  Cap 

An examination of scatter plots of Landsat data from a number of different sites (fig. 1) discloses several distinct characteristics of the data. When plots from multispectral scanner (MSS) channels 2 and 3 are examined, a roughly triangular shape above the diagonal of the scatter plot can be seen. In a scene the size of a LACIE sample segment, this triangle is seldom "full" of data; usually some part of it is missing so that one might easily miss seeing the shape. However, if scatter plots from several segments are overlaid, as shown in figure 2, the overall pattern becomes visible. One of the authors, G. Thomas, hypothesized a physical explanation of these patterns as follows (ref. 1).

"In order to achieve a better understanding of just what is portrayed in the cluster patterns and why a general or 'complete' cluster pattern has the shape it does, ERIM's vegetative canopy model (ref. 2) was called into play. As it happened, the necessary model inputs from a certain type of vegetation, Ionia wheat (a variety grown in Michigan) were readily available. And so, two soil reflectances were selected, one to simulate a darker, perhaps more organic or moist soil and the other to simulate a lighter colored, perhaps sandier or drier soil (for more information on the importance of soil moisture on soil reflectance, see Blanchard et al., 1974 (ref. 3), and Parks et al., 1974 (ref. 4)) and a construction made of the phenology of a sample of wheat, Ionia variety with two very different soil backgrounds (fig. 3). As may be seen, the soil background plays a dominant role in the bidirectional reflectance of a stand of Ionia wheat until the onset of plant maturity. If the bare soil points are connected by a line, hereafter called the bare soil line, the outline of the phenology of Ionia wheat is very similar to the outline of the 'complete' cluster pattern. It is not unreasonable to suspect, therefore, that location within a cluster pattern represents, to a degree, vegetative state of development as modified by such factors as soil reflectance, stress of various kinds, mixtures of vegetation and so on." (Soil reflectances are taken from reference 5.)

FIGURE 1. General cluster patterns in channels 2 and 3. (a) Livingston County, Illinois, July 16, 1973. (b) Fayette County, Illinois, June It, 1973.

FIGURE 2. Overlay of figure 1 cluster patterns of Fayette County, July 16, 1973.

These same observations are expressed as an artistic conception in figure 4(a), showing a scatter plot of band 5 versus band 6. Band 5 is centered in the chlorophyll absorption band of green vegetation around 0.65 micrometer, whereas band 6 is centered on the cellulose reflectance peak of green vegetation around 0.75 micrometer. The signal from green vegetation is thus found to be small in band 5 and large in band 6. Such a point is indicated in figure 4(a) by the designation "green stuff."

The main variability found in soils is their brightness. Hence, the signals from bare soils are distributed primarily along a line radiating from the origin.

Additional observations may be made. When looking at a scatter plot of band 4 versus band 5, one sees the data take a narrow cigar shape, as shown in figure 4(b). The physical explanation is that band 4 is centered on the cellulose reflectance around 0.55 micrometer but extends significantly into the chlorophyll absorption region so that bands 4 and 5 are highly correlated. (There are, however, differences between bands 4 and 5, which will be discussed later.)

When looking at figure 4(b), one imagines he is seeing the "edge" of a three-dimensional object embedded in four-dimensional Landsat space. A scatter plot of band 6 versus band 7 would show the same thing. Does this mean that the three-dimensional object is in fact nearly planar; i.e., two dimensional? Can the data be projected in such a way as to enable seeing its structure more clearly?

Such considerations led one of the authors¹ (ref. 6) to define axes of maximum variation in the Landsat data and to ascribe physical interpretation to these axes. First, a collection of points along the diagonal soil line was chosen from scatter plots and a single axis was fitted to these (four dimensional) points. This axis was termed "brightness." Then, a point near the green peak was chosen and used with the Gram-Schmidt procedure (ref. 7) to produce a second axis that was normal to the first axis and that, together with the first, spanned the soil line and the point labeled "green stuff." This second axis was called "greenness."


¹R. J. Kauth, "Soil Reflectance," NASA Johnson Space Center Memo TF3-75-5-190, Apr. 10, 1975.


FIGURE 3. Phenology for wheat (Ionia variety) based on canopy model.

FIGURE 4. Scatter plot comparisons of MSS band reflectance. (a) Bands 5 and 6 (fairly uncorrelated). (b) Bands 4 and 5 (highly correlated).


Figure 5 expands the illustration to three dimensions. Soil sample points fall near a line and, further, fall predominantly in a planar region surrounding that line. Plants start out on bare soil and grow toward the region of green stuff. Among the variety of green plant canopies, some have large amounts of shadow, which shifts the observation directly toward the origin. Trees are an example of a green canopy with a fair amount of shadow, as shown by the "badge of trees." The variety of shadowing in various canopies creates a region called the green arm (sometimes called the green fold), which appears to be a line over which the Tasseled Cap is folded.

When a particular plant canopy has reached its appropriate stage of maximum green development, it may then turn yellow. Yellowing is often accompanied by darkening, either in the vegetation itself or as a result of drooping leaves which cause shadowing to increase. By yellowing, withering, or being cut down, every canopy eventually returns to the soil. These varieties of the return path in signal space are the "tassels" of the cap.

In Landsat data, the nearness of the yellow region to the side of the Tasseled Cap makes it almost indistinguishable. Hence, for most purposes, the agricultural information is substantially contained in a plane defined by unit vectors in the brightness and green-stuff directions but with a small amount of variation in the yellow-stuff direction perpendicular to the plane. A fourth direction, "non-such," is orthogonal to the other three and contains primarily noise variation. The unit vectors describing these four directions together form an orthogonal (rotation) matrix called the Tasseled Cap (or Kauth-Thomas) transform. Numerical values are given in appendix A.

Some further comments regarding the yellow-stuff direction in Landsat signal space are in order. Three physical effects cause signal variations with significant components in this one direction. The first is the effect for which the direction is named; i.e., the yellowing of vegetative canopies. In this process, the chlorophyll absorption disappears and the reflectance in band 5 increases significantly. The reflectance of bands 4 and 6 also increases somewhat. Mixed in with the increased reflectance of the actual plant material is additional shadowing because of drooping leaves in some canopies or, alternatively, further decreases of shadowing due to laying out of the leaves or heads in others (see reference 6 for an example). However, these shadow effects cause changes in brightness and/or greenness, which are already established directions. The component that is not already represented by brightness and greenness is defined as the yellow direction. This direction is dominated by the difference between MSS bands 5 and 4.
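The brightness and greenness axes described earlier come from a Gram-Schmidt construction: a fitted soil-line direction is normalized, and a vector toward the green point is made orthogonal to it. The sketch below shows only the mechanics. The two input vectors are invented for illustration and are not the Kauth-Thomas coefficients, which the paper gives in its appendix A.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]


def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors in the order given."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)                      # component along an earlier axis
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        basis.append(normalize(w))
    return basis


# Hypothetical four-band directions (MSS bands 4, 5, 6, 7), for illustration only.
soil_line_direction = [0.4, 0.6, 0.5, 0.3]
green_point_offset = [-0.3, -0.6, 0.6, 0.5]

brightness, greenness = gram_schmidt([soil_line_direction, green_point_offset])

# Coordinates of one hypothetical pixel in the brightness-greenness plane.
pixel = [20.0, 25.0, 40.0, 18.0]
features = (dot(pixel, brightness), dot(pixel, greenness))
```

Appending further vectors to the input list (for example a yellowing direction) would extend the same procedure to the third and fourth axes of the rotation.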


The second effect is the difference between average and red soils. As seen by Landsat, the dominant direction of variation in soil reflectance is a scaling factor applied to all bands.¹ The next most important component is the difference caused by red soils. The existence of a second important soil reflectance component suggests the concept of the "plane of soils" mentioned previously in the discussion of figure 5. (It is fun to refer to crops planted upon and growing out of the plane of soils!) The direction of the second component of soil reflectance in Landsat data is nearly parallel to the yellow direction. Landsat sensors cannot distinguish between the visual colors red and yellow.


¹R. J. Kauth, "Soil Reflectance," NASA Johnson Space Center Memo TF3-75-5-190, Apr. 10, 1975.


FIGURE 6. Cluster scatter plots of several Kansas sample segments in transformed coordinates. (a) Grant County, May 27, 1974. (b) Haskell County, May 27, 1974. (c) Morton County, May 27, 1974. (d) Finney County, May 26, 1974.

The  third  effect  is  due  to  haze  over  a scene,  which 
causes  changes  in  brightness  and  greenness  and  in 
the  negative  yellow  direction.  This  effect  will  be  dis- 
cussed in  more  detail  later. 

The  last  two  effects  combine  and  make  it  difficult 
to  observe  canopy  yellowing.  Hence,  attention  is 
usually  concentrated  on  developing  crop  signatures 
based  on  the  greenness  and  brightness  directions. 

Figure  6 shows  some  typical  cluster  plots  of  Land- 
sat  data  from  9-  by  11-kilometer  (5-  by  6-nautical- 
mile)  sample  segments.  In  these  figures,  the  abscissa 
is  brightness  and  the  ordinate  is  greenness  or  “green 
stuff.”  Notice  that,  in  any  particular  scene,  portions 
of  the  Tasseled  Cap  structure  may  be  missing,  de- 
pending on  the  crops  planted  and  their  state  of 
development  and  on  the  types  of  soil.  (The  numbers 
on  the  ellipses  are  for  identification  only.) 


Additional  Structural  Characteristics 
of  the  Data 

Signals  from  vegetation  and  soil  form  the  Tasseled 
Cap.  Water,  clouds,  and  cloud  shadow  also  have  their 
proper  places  in  the  four  dimensions  of  Landsat  sig- 
nal space  and  can  be  described  relative  to  the  posi- 
tion of  the  Tasseled  Cap  and  the  points  already 
defined.  Haze  and  varying  angles  of  solar  illumina- 
tion affect  the  data  structure  in  systematic  ways. 

Figure  7 shows  that  clouds  are  located  along  the 
brightness  axis  but  are  shifted  in  the  negative  yellow 
direction  as  well.  Figure  7(a)  is  an  artistic  conception 
of  a scatter  plot  showing  the  data  projected  onto  the 
brightness  versus  greenness  plane,  whereas  figure 
7(b)  shows  the  data  projected  onto  the  yellow  versus 
non-such  plane.2  Haze  can  be  thought  of  as  inter- 
mediate or  thin  clouds.  When  haze  is  present  over  a 
scene,  the  contrast  in  the  scene  itself  is  reduced  while 
a portion  of  a cloudlike  signal  is  added.  The  inter- 
mediate condition  between  no  haze  and  completely 
hazy (i.e., cloudy) is sketched in figure 8 as a shaded
region. It can be seen that, as the haze amount is increased,
the triangular shape of the Tasseled Cap
shrinks toward the point of the clouds.


2Earlier, it was mentioned that non-such is the direction
orthogonal to the other axes which “fills out” the four-dimensional
space of the Landsat data. Variations in the non-such direction
are mostly noise, although some information regarding water
and snow appears to be contained in this direction also.

FIGURE 7.— Schematic diagram of the Tasseled Cap, showing
the positions of water, clouds, and cloud shadows. (a) Projection
on brightness/greenness plane. (b) Projection on yellow/non-such
plane.

This  diagram  demonstrates  why  haze  has  a severe 
effect  on  signature  extension  capability.  The  entire 
data  structure  shifts  out  along  the  brightness  axis  and 
shrinks in the greenness direction; but worse, the
shift in the negative yellow direction is sufficient to
move the Tasseled Cap sidewise, completely off its
haze-free image.

FIGURE 8.— Schematic diagram of Tasseled Cap as seen
through haze. (a) Projection on brightness/greenness plane. (b)
Projection on yellow/non-such plane.

The  angle  of  incidence  of  the  Sun's  illumination 
on  a scene  viewed  by  Landsat  sensors  also  affects  the 
data  structure  significantly.  To  a first  order  of  correc- 
tion, the  radiance  incident  at  the  Landsat  detector 
system  changes  directly  with  the  cosine  of  the  solar 
zenith  angle  at  the  Earth  point  viewed.  (This 
theorem  would  be  exactly  true  if  the  atmospheric 
haze  and  the  ground  both  behaved  as  Lambertian 
reflectors.)  To  make  the  data  gathered  under 
different  Sun  zenith  angles  commensurate,  the 
ERIM  procedure  is  to  correct  all  data  to  the  standard 
zenith  angle  of  39°,  which  is  typical  of  Landsat  data 
over  Kansas  in  April. 

Up to this point, the four-dimensional structure of
the Landsat data has been discussed qualitatively and
geometrically, using the Tasseled Cap as a tool for
visualizing the data. The major effects of haze and
angle of solar illumination on the data structure have
been indicated. Numerical values can be and are associated
with this qualitative description (appendixes
A and B). The immediate question is, how can this
knowledge  be  exploited  to  gain  useful  information 
from  the  Landsat  data? 


PREPROCESSING/FEATURE  EXTRACTION 

The  preprocessing  and  correction  steps  that  take 
place  before  signatures  can  be  extracted  should  be  an 
integral  part  of  feature  extraction.  The  objectives  of 
preprocessing  and  feature  extraction  may  be  several 
(ref.  8).  Their  purpose  may  be 

1.  To  make  the  data  more  comprehensible  by  ad- 
justment to  standard  conditions  of  observation 

2.  To  eliminate  or  flag  bad  or  noisy  observations 
in  the  data 

3.  To  make  the  data  more  comprehensible  by  ex- 
tracting physical  features  or  by  projecting  in  such  a 
way  as  to  display  the  physical  structure  of  the  data 

4.  To  compress  the  data,  retaining  most  of  the  in- 
formation and  averaging  over  noise  and  redundancy 

5.  To  make  the  distributions  of  the  derived 
features  fit  some  convenient  model,  such  as  the 
multivariate  normal  distribution 

Several  factors  should  be  mentioned.  First,  all  of 
the  preceding  objectives  may  not  be  met  by  the  same 
preprocessing  transformations;  in  fact,  in  some 
cases,  they  may  be  mutually  exclusive.  Sometimes,  it 


may  be  desirable  to  perform  transformations  on 
parallel  paths  effecting  different  objectives;  for  ex- 
ample, a linear  transformation  might  be  desirable  for 
producing  projections  which  can  be  examined  by  a 
researcher  to  gain  insight,  whereas  a nonlinear  pro- 
jection might  be  used  to  cause  the  preprocessed  data 
to  fit  a normal  distribution. 

Further,  the  term  “preprocessing”  may  be  some- 
what misleading  in  that  it  appears  to  define  a com- 
puter architecture  in  which  first  the  preprocessing 
steps  are  performed,  then  the  classification  steps  are 
performed,  and  then  a proportion  estimate  is  made. 
In  fact,  however,  all  the  different  conceptual  steps 
constitute  merely  one  functional  relationship  be- 
tween the  data  and  the  desired  output  and  could  in 
practice  be  performed  in  one  step.  The  preprocess- 
ing/feature extraction  steps  discussed  here  are  con- 
ceptual and  might  be  implemented  in  a variety  of 
computer  architectures. 

Knowledge  of  the  Landsat  data  structure  and  its 
physical  effects  permits  the  institution  of  two  very 
useful  preprocessing  steps;  namely,  screening  and 
correcting  the  data  for  the  effects  of  solar  illumina- 
tion angle  and  haze. 


Screening 

Since  the  structure  of  the  agricultural  data  is 
known,  certain  classes  of  materials  can  be  identified 
immediately  without  special  training.  Thus,  decision 
surfaces  have  been  established  to  identify  water, 
clouds,  cloud  shadow,  dense  haze,  cloud  shadow  over 
water, and haze over water. In addition, picture elements
(pixels) which are far outside the domain of
any known material class are labeled “bad” pixels,
and pixels without undesirable characteristics are
labeled “good.”

To  perform  the  screening,  a sequence  of  decisions 
is  made  for  each  pixel,  such  as 

Is  pixel  bad? Y— label  bad 

Is  pixel  cloud? Y— label  cloud 

Each decision in turn depends on a sequence of
tests.  For  example,  on  which  side  of  certain  planes  in 
the  Tasseled  Cap  transformed  space  does  the  pixel 
fall  after  being  corrected  for  the  major  effect  of  solar 
zenith  angle?  If  the  pixel  falls  either  above  16  counts 
or  below  —8  counts  in  the  non-such  direction,  it  is  a 
bad  pixel.  The  complete  sequence  of  tests  is  given  in 
appendix  A. 
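
The decision sequence can be sketched in a few lines of Python. This is only an illustration of the ordering described above, not the ERIM SCREEN code: the only threshold taken from the text is the non-such test (above 16 or below -8 counts), the component order of `z` (brightness, greenness, yellow, non-such) is an assumption, and the `is_cloud` argument is a hypothetical placeholder for the cloud test given in appendix A.

```python
# Thresholds on the non-such component, in counts, as stated in the text.
NONSUCH_MAX = 16.0
NONSUCH_MIN = -8.0


def is_bad_pixel(z):
    """Return True if the non-such component z[3] is out of range.

    z is assumed to be a transformed, Sun-angle-corrected pixel ordered as
    (brightness, greenness, yellow, non-such).
    """
    return z[3] > NONSUCH_MAX or z[3] < NONSUCH_MIN


def screen_pixel(z, is_cloud=lambda z: False):
    """Apply the decisions in order; the first test that fires sets the label.

    is_cloud is a hypothetical stand-in for the appendix A cloud test.
    """
    if is_bad_pixel(z):
        return "bad"
    if is_cloud(z):
        return "cloud"
    return "good"


labels = [screen_pixel(z) for z in ([50, 10, 0, 20], [50, 10, 0, 0])]
```

Because the tests are applied in a fixed order, a pixel that fails the bad-pixel test is never examined by the later tests.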
Correcting

As  shown  in  figure  8,  the  effect  of  haze  is  to 
shrink  the  size  of  the  Tasseled  Cap  and  to  shift  it  in 
the  negative  yellow  direction  completely  off  its  haze- 
free  position  in  data  space.  This  distortion  can 
severely  limit  the  capability  to  extend  crop  sig- 
natures from  one  site  to  another. 

If  the  data  points  are  projected  first  onto  the  two- 
dimensional  brightness-greenness  plane,  the  prob- 
lem becomes  less  severe.  Even  then,  however,  there 
will  be  significant  differences  between  hazy  and 
haze-free  signatures.  The  proper  approach  is  to 
measure  the  amount  of  haze  present  and  adjust  the 
data  back  to  some  reference  condition. 

Fortunately,  the  measurement  of  the  amount  of 
haze  is  possible  by  measuring  the  average  shift  of  the 
data  in  the  negative  yellow  direction.  The  method  is 
to  measure  the  average  yellow-stuff  value  of  good 
pixels  corrected  for  Sun  zenith  angle  and  compare 
this  value  to  a reference  value.  The  estimate  is  made 
simultaneously  with  the  screening  process.  To  ac- 
tually correct  the  data,  an  atmospheric  model  which 
describes  the  effect  of  haze  is  needed.  The  entire  pro- 
cedure of  making  the  haze  measurement  and  apply- 
ing the  atmospheric  model  was  developed  by  Lam- 
beck  and  is  called  the  XSTAR  algorithm3  (ref.  9). 
The  procedures  for  applying  the  XSTAR  algorithm 
are  given  in  appendix  A. 
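
A minimal sketch of the haze measurement described above, assuming the screened pixels are held in a NumPy array whose third column is the yellow component. The reference value and the sample numbers are hypothetical, and the atmospheric model that turns this diagnostic into a correction is not reproduced here.

```python
import numpy as np


def haze_diagnostic(z, good_mask, yellow_reference):
    """Average yellow value of good pixels minus a reference value.

    z: array of shape (n_pixels, 4), Sun-angle corrected and transformed,
       with columns assumed to be (brightness, greenness, yellow, non-such).
    good_mask: boolean array marking pixels labeled "good" by screening.
    yellow_reference: reference yellow value (hypothetical here).
    A negative result indicates a shift in the negative yellow direction.
    """
    z = np.asarray(z, dtype=float)
    yellow = z[:, 2]
    return yellow[np.asarray(good_mask)].mean() - yellow_reference


# Hypothetical data: the third pixel was screened out and is ignored.
z = np.array([[60.0, 20.0, -2.0, 1.0],
              [70.0, 25.0, -4.0, 0.0],
              [200.0, 5.0, 100.0, 30.0]])
shift = haze_diagnostic(z, [True, True, False], yellow_reference=-1.0)
```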


Solar Zenith Angle Correction

Solar zenith angle effects are corrected implicitly
during the screening process in order to correctly
screen out bad pixels and to calculate the haze diagnostic
for the XSTAR algorithm. The correction is
applied to data during the process of haze correction,
so that the XSTAR algorithm can in effect operate
on a more standardized set of data. The form of the
corrected data is

$$X^{*} = \frac{\cos\theta_0}{\cos\theta}\, X$$

where θ is the Sun zenith angle, θ0 = 39°, X is the
data vector, and X* is the data vector corrected for
Sun angle.
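
As a short illustration, the cosine correction can be written as follows. The sketch assumes the corrected vector is the data vector multiplied by cos(39°)/cos(θ), which is the form given in step 2 of appendix A; the pixel values and the 55° acquisition angle are hypothetical.

```python
import numpy as np

THETA_0_DEG = 39.0  # reference solar zenith angle stated in the text


def correct_sun_angle(x, theta_deg, theta0_deg=THETA_0_DEG):
    """Scale a Landsat data vector to the reference solar zenith angle."""
    x = np.asarray(x, dtype=float)
    factor = np.cos(np.radians(theta0_deg)) / np.cos(np.radians(theta_deg))
    return factor * x


pixel = np.array([30.0, 28.0, 60.0, 30.0])  # hypothetical counts, bands 4-7
corrected = correct_sun_angle(pixel, theta_deg=55.0)
```

A scene observed at a zenith angle larger than 39° receives less illumination, so the factor exceeds 1 and the counts are scaled up; at exactly 39° the data are unchanged.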


Satellite Calibration Effects

The authors have found it useful to incorporate
data from both Landsat-1 and Landsat-2 passes,
although Landsat-2 was the primary available
satellite. The two satellite sensors have slightly
different calibrations, which would hardly be
noticeable except when machine processing is attempted.
Hence, it was necessary to find a transformation
which would convert the Landsat-1 data to be
compatible with the Landsat-2 data. The procedure
by which this was accomplished is detailed in appendix
B. Briefly, it consists of comparing pairs of Landsat-1
and Landsat-2 observations on the same sample
segments on successive (9-day separated) passes.

To  actually  perform  the  fit,  it  was  necessary  to  ac- 
count for  the  differing  haze  levels  and  differing  Sun 
zenith  angles  for  the  two  observations  and,  in  effect, 
to  assume  a nonlinear  model  for  each  of  the  satellites 
(although  a useful  linear  relationship  between  the 
two  was  found). 


Episodic Events— Drought Estimation

Knowledge  of  the  structure  of  the  data  allows  one 
to  estimate  approximately  the  degree  of  green 
vegetative  development  over  an  area.  This  estimate 
has  been  used  to  monitor  the  status  of  drought  condi- 
tions in  the  U.S.  Great  Plains  (see  the  paper  by 
Thompson  and  Wehmanen  entitled  “Application  of 
Landsat Digital Data for Monitoring Drought”).


Analyst  Aids 

The  greenness  and  brightness  values  of  pixels 
have  been  used  to  create  analyst  aids,  assisting  the 
analysts  to  view  relevant  aspects  of  the  temporal 
growth pattern of a crop (see the paper by K. Abotteen
and Bizzell entitled “The Classification and
Mensuration Subsystem” and the paper by Heydorn
entitled  “Classification  and  Mensuration  of  LACIE 
Segments”). 


3P. F. Lambeck, “Implementation of the XSTAR Haze Correction
Algorithm and Associated Preprocessing Steps for Landsat
Data,” ERIM Memo IS-PFL-1272, Mar. 18, 1977.

FEATURE EXTRACTION PHILOSOPHIES
EXAMINED 

Since  feature  extraction  is  so  largely  a Gestalt  pro- 
cess, the  features  extracted  naturally  vary  greatly 
from  individual  to  individual  and  from  one  group  of 
researchers  to  another.  Historically,  there  have  been 
many  different  attempts  at  feature  extraction  from 
Landsat  data.  Each  of  these  can  be  analyzed  in  terms 
of  the  Tasseled  Cap  model  structure  discussed  pre- 
viously, even  including  such  effects  as  the  solar 
zenith  angle  and  haze.  The  Tasseled  Cap  description 
is  particularly  appropriate  for  this  analysis  because 
the  Kauth-Thomas  transform  is  merely  a rotation  of 
the  data— a way  to  peer  in  at  the  data  structure  from 
some unusual directions. The euclidean shape of the
data structure (i.e., the distances between various observable
features in the four-dimensional Landsat
data space) is in no way disturbed. On the other
hand, once the structure is understood, it is recognizable,
even under some nonlinear transformations
of the data. In the following discussion, two feature
extraction schemes, the band-7-to-band-5 ratio and
the Delta Classifier, are briefly analyzed.


Band Ratio Features

Throughout  the  history  of  remote  sensing,  various 
ratio  schemes  have  been  proposed  and  used  with 
some  success.  From  the  physical  understanding  of 
the  Landsat  data,  what  can  be  said  about  these 
schemes?  Conceptually,  what  could  they  ac- 
complish? 

For illustration, consider a simplified two-band
case, say bands 5 and 7. Figure 9(a) shows the
Tasseled Cap projected onto these bands. The dotted
lines are the equiratio contours of R75, the ratio of
band 7 to band 5. Thus, if only R75 is retained, the
spectral description of the scene is reduced from two
dimensions to one dimension.

This  ratio  is  extremely  insensitive  to  changes  in 
direct  illumination  on  a scene,  as  illustrated  in  figure 
9(b).  An  increased  solar  zenith  angle  results  in  a 
nearly  proportional  decrease  in  both  bands  7 and  5. 
The  ratio  remains  constant  for  each  observation, 
and  the  dotted  lines  pass  through  the  same  set  of 
points  relative  to  the  overall  structure  of  the  Tasseled 
Cap.

How well does this ratio describe the crop
development? According to Kanemasu (ref. 10), the
ratio of these bands is an increasing function of
development through a major portion of the
development cycle and is quite independent of solar
illumination angle. (Data are not presented for the
period of time after full green development is
reached.) Thus, it would appear that not much information
is lost by retaining only the ratio of these two
bands.

FIGURE 9.— The Tasseled Cap projected onto MSS bands 5 and
7. (a) Ratio of band 7 to band 5 with high Sun elevation. (b)
Ratio of band 7 to band 5 with low Sun elevation (less illumination).

Are there other external effects to which the ratio
R75 would be sensitive? The answer is yes. Figure 8
shows the effect of haze on the Tasseled Cap. As the
amount  of  haze  over  the  scene  increases,  contrast  is 
reduced  nearly  proportionally  in  all  bands  so  that 
again,  as  in  the  case  of  the  varying  solar  zenith  angle, 
the  Tasseled  Cap  becomes  smaller.  But,  at  the  same 
time,  the  haze  scatters  sunlight  directly  back  to  the 
observer,  adding  additional  signals  in  all  bands.  The 
result  is  that  the  Tasseled  Cap  shrinks  toward  a point 
(called  XSTAR)  not  located  at  the  origin  but  located 
in  the  general  neighborhood  of  cloud  signals.  Figure 
8 shows  this  effect.  The  dotted  lines  in  this  case  do 
not  remain  invariant  in  their  position  on  the 
Tasseled Cap; and, in fact, the ratio R75 is greatly influenced
by the amount of haze.
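
The contrast between the two external effects can be checked numerically. In the sketch below, a change in illumination is modeled as a common scale factor on both bands, and haze is modeled as a shrink toward a fixed point, x_hazy = xstar + c (x - xstar). That haze model is a simplification adopted here for illustration, and the pixel values, the XSTAR coordinates, and the contrast factor c are all hypothetical.

```python
import numpy as np


def ratio_75(x):
    """Band-7-to-band-5 ratio for x = (band 5, band 7)."""
    return x[1] / x[0]


def scale_illumination(x, k):
    """Illumination change: every band is multiplied by the same factor."""
    return k * np.asarray(x, dtype=float)


def add_haze(x, xstar, contrast):
    """Simplified haze model: shrink the data toward the point xstar."""
    x = np.asarray(x, dtype=float)
    xstar = np.asarray(xstar, dtype=float)
    return xstar + contrast * (x - xstar)


green_pixel = np.array([20.0, 60.0])   # hypothetical (band 5, band 7) counts
xstar = np.array([100.0, 100.0])       # hypothetical cloud-like point

r_clear = ratio_75(green_pixel)
r_dim = ratio_75(scale_illumination(green_pixel, 0.66))
r_hazy = ratio_75(add_haze(green_pixel, xstar, contrast=0.67))
```

Under these assumptions the ratio is unchanged by the illumination factor but drops substantially under haze, because the shrink is toward a point that is not the origin.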

Evidently, the esthetically pleasing procedure is
first  to  correct  the  data  for  the  effects  of  haze  and 
then  to  extract  a ratio  if  desirable  for  describing  crop 
development.  Notice  that  the  XSTAR  algorithm  (or 
any  haze-correction  algorithm  which  relies  in  part  on 
measuring  the  yellow  shift  of  the  data  structure) 
already  contains  a correction  for  the  Sun  zenith 
angle,  which  accounts  by  far  for  the  largest  part  of 
the  variation  in  illumination  on  the  crops. 

Why  might  it  be  desirable  to  take  a ratio  after  hav- 
ing already  applied  the  Sun-angle  and  haze  correc- 
tion? Because  one  might  believe  that  most  of  the 
crop  development  information  contained  in  the 
brightness-greenness  plane  is  in  fact  implicit  in  a 
ratio,  so  that  the  remaining  degrees  of  freedom  that 
are  discarded  contain  very  little  information.  This  is 
a conjecture which is not yet adequately tested. According
to Malila (ref. 6), both spring wheat and
barley are well described by the (Kauth-Thomas
transform) greenness up to the time of heading; but,
after heading, brightness changes become important
to  the  description  and  are  in  fact  the  major  source  of 
possible  discrimination  between  spring  wheat  and 
barley. 


The  Delta  Classifier 

The  Delta  Classifier  extracts  a development 
feature  which  combines  aspects  of  both  greenness 
and  brightness  and  uses  it  in  a decision  logic  based  on 


several  acquisitions  at  different  times  during  the 
growing  season.  The  feature  extraction  occurs  in 
three  steps.  In  the  first  step: 

$$f_1 = \frac{B_1 - B_2 + 32}{\Delta}, \qquad f_2 = \frac{B_2 - B_3 + 32}{\Delta}, \qquad f_3 = \frac{B_3 - B_4 + 32}{\Delta}$$

where Δ = B1 − B4 + 96.

Notice that a fourth feature similarly defined
would have the term (B1 − B4) occurring in both
numerator and denominator. Thus, the quantities f1,
f2, f3, and Δ do not contain all the information in the
original four-dimensional Landsat signal space.
Notice  that  these  features  are  somewhat  independent 
of  a constant  scale  factor  correction,  such  as  the  co- 
sine of  the  solar  zenith  angle,  in  all  the  Landsat 
bands,  and  they  are  exactly  independent  of  an  addi- 
tive correction  applied  to  all  bands. 
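
The two invariance properties can be verified with a small sketch. It assumes the step-1 features read f1 = (B1 − B2 + 32)/Δ, f2 = (B2 − B3 + 32)/Δ, f3 = (B3 − B4 + 32)/Δ with Δ = B1 − B4 + 96, which is one reading of the printed equations chosen because the three features then sum to 1, as plotting on triangular graph paper requires. The pixel values are hypothetical.

```python
def delta_features(b1, b2, b3, b4):
    """Step-1 Delta Classifier features under the assumed reading above."""
    delta = b1 - b4 + 96.0
    f1 = (b1 - b2 + 32.0) / delta
    f2 = (b2 - b3 + 32.0) / delta
    f3 = (b3 - b4 + 32.0) / delta
    return f1, f2, f3, delta


pixel = (30.0, 28.0, 60.0, 30.0)  # hypothetical counts in the four bands

base = delta_features(*pixel)
shifted = delta_features(*(b + 5.0 for b in pixel))  # additive correction
scaled = delta_features(*(0.5 * b for b in pixel))   # scale-factor correction

feature_sum = sum(base[:3])
```

Only band differences enter the features, so the additive shift leaves them exactly unchanged. The constants 32 and 96 do not scale with the data, so a scale factor changes the features slightly, which matches the text's "somewhat independent."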

In the Delta Classifier, the quantity Δ is ignored
and the features f1, f2, f3 enter a second step of
feature extraction. In the second step of feature extraction,
the features are plotted on triangular graph
paper.  In  this  process,  an  additional  degree  of 
freedom  is  lost.  The  two  new  features  are 


xdel  • (/,  /2)^f 

YDEL  » 

Figure  10(a)  shows  the  relationship  between  the 
two  feature  sets  extracted  in  steps  1 and  2.  Figure 
10(b)  shows  the  major  elements  of  the  Tasseled  Cap 
projected  onto  the  XDEL.YDEL  plane  of  the  Delta 
Classifier. The elements shown are the origin, the
mean of soils, the line of soils, the mean of the green
arm, and the point XSTAR. Also shown are the
points 32R1, 32R2, 32R3, and 32R4, where R1, ..., R4
are the unit vectors for brightness, greenness,
yellowness, and non-such, respectively.
FIGURE 10.— The Delta Classifier. (a) Relationship between the two feature sets extracted in steps 1 and 2. (b) Major elements of the
Tasseled Cap projected onto the XDEL,YDEL plane.


Figure 11 shows the effect of the two major external
influences, solar illumination angle and haze, on
the Tasseled Cap, as projected onto the Delta
Classifier two-dimensional plot. The effect of a cosine
ratio, cos (39°)/cos (θ) = 0.66, is shown by the
dashed outline. The Tasseled Cap shrinks toward the
origin. The effect of a haze amount (compared to the
standard haze condition) having an optical depth of
0.42 to 0.55 micrometer is shown as a dash-dot outline.
This is about as large an amount of haze as one
could  hope  to  correct  successfully  using  the  XSTAR 
algorithm.  It  has  the  effect  of  reducing  contrast  by 
approximately  33  percent,  with  the  Tasseled  Cap 
shrinking  toward  the  point  XSTAR,  as  discussed 
earlier. 

Finally,  in  the  third  step  of  feature  extraction,  for 
the  purpose  of  classifying  winter  wheat  multitem- 
porally  during  four  successive  phases  of  the  growing 
season,  only  the  XDEL  feature  is  used.  In  general, 
this  classification  procedure  is  based  on  the  assump- 
tion that,  for  identification  as  winter  wheat,  the  crop 
will be emerged during the first biowindow (i.e.,
XDEL(1) > 0); will have significant green development
during the second or third biowindow (i.e.,
XDEL(2) or XDEL(3) > XDEL(1)); and will enter
a stage  of  brightening  and  loss  of  greenness  during 
the  fourth  biowindow  (i.e.,  XDEL(4)  < 0).  (An  in- 
crease in  XDEL  corresponds  to  an  increase  in  green- 
ness or  a decrease  in  brightness.) 
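
The three conditions can be written as a single test on the four XDEL values. This is a sketch of the logic as stated in the paragraph above, not the operational Delta Classifier; the function name and the sample trajectories are hypothetical.

```python
def looks_like_winter_wheat(xdel):
    """Apply the three XDEL conditions to one pixel.

    xdel: sequence of four XDEL values, one per biowindow.
    """
    x1, x2, x3, x4 = xdel
    emerged = x1 > 0                # emerged during the first biowindow
    greened = x2 > x1 or x3 > x1    # green development in biowindow 2 or 3
    ripened = x4 < 0                # brightening and loss of greenness
    return emerged and greened and ripened


wheat_like = looks_like_winter_wheat((0.05, 0.30, 0.25, -0.10))
not_emerged = looks_like_winter_wheat((-0.02, 0.30, 0.20, -0.10))
still_green = looks_like_winter_wheat((0.05, 0.30, 0.20, 0.10))
```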


Figure  12  shows  a typical  trajectory  of  wheat  dur- 
ing the  four  biowindows  that  passes  the  preceding 
tests.  The  following  statements  are  of  key  impor- 
tance in  understanding  the  operation  of  the  Delta 
Classifier  with  respect  to  the  growth  cycle  of  winter 
wheat. 

1. During the long first biowindow, which encompasses
both fall and spring emergence, the solar angle
changes significantly. The requirement XDEL > 0 is
essentially a ratio test similar to the band-7-to-band-5
ratio, as can be seen by comparing figure 12 with
figure 9(a). Thus, the criterion line is substantially
independent of the angle of solar illumination, as are
the points near it, such as point 1 in figure 12.

2. The effect of haze on point 1 is to move it
toward the point XSTAR, which means very little
change in XDEL. This is true for any point just
slightly above XDEL = 0. For points farther above
it, the effect of haze is to reduce the value of XDEL
but never to a value smaller than zero.

3. During the second and third biowindows, the
XDEL values of winter wheat are normally large
and, even when the haze level is substantial, still
easily exceed the XDEL of point 1.

4. During the fourth biowindow, the XDEL value
of point 4 is increased by the effect of haze.
However, the typical “brightening” of the wheat at
harvest is often sufficient to overcome haze effects,
and point 4 still appears at a negative XDEL value.
FIGURE 11.— Effect of solar illumination and haze on the
Tasseled Cap as projected onto the XDEL,YDEL plane of the
Delta Classifier.


FIGURE 12.— Typical trajectory of wheat during the four
biowindows, as projected onto the XDEL,YDEL plane of the
Delta Classifier.


In summary, a unique combination of events enables
the Delta Classifier to correct for both illumination
and haze effects at critical times during the
wheat growth cycle, while failing to do so at other
times.

SUMMARY AND RECOMMENDATIONS

The  authors  have  summarized  their  knowledge  of 
the  spectral-temporal  structure  of  agricultural  scenes 
as  viewed  by  Landsat  and  have  shown  how  this 
knowledge  is  used  to  screen  data,  to  correct  for 
systematic  external  effects,  and  to  obtain  insight  into 
the  operation  of  various  feature  extraction 
algorithms.  With  respect  to  the  problem  of  extracting 
features  in  the  data  which  enhance  ability  to  view  the 
crop  development,  a number  of  investigation  areas 
need  to  be  pursued. 

1.  Optimal  feature  extraction  with  explicit  atten- 
tion to  system  noise  should  be  considered.  The 
Kauth-Thomas transform merely rotates the data
structure so it can be viewed in different ways. The
use of a single linear (greenness) feature in classification
seems to work reasonably well in some cases,
but there is no support for a contention that this is an
optimal feature. Note that the noise in Landsat bands
4 to 6 is approximately proportional to the square
root of the signal but is a constant in band 7.
Transformations  which  make  system  noise  an  in- 
variant function  of  feature  value  might  be  con- 
sidered. 


2. Temporal-trajectory features need to be tried in
which each feature is a measure of likeness between
the pixel and one or the other crop development prototypes;
e.g., likenesses to winter wheat, grassland,
alfalfa, and corn. Essentially, these likeness features
would constitute models of each major crop or confusion
category (see, for example, reference 11).

3.  These  features  (models)  should  be  derived  as 
conditional  functions  of  some  of  the  important  an- 
cillary conditions  which  could  be  observed,  such  as 
predicted  crop  yield  for  the  crop  in  question  or  esti- 
mated planting  date.  The  feature  definitions  could  be 
based on a combination of crop modeling, field
measurements data, and Landsat data. Landsat data
must be the final basis on which the features are established.
However, Landsat data are acquired irregularly;
to create a general model of temporal
development, the missing observations, in effect,
must be estimated. To accomplish this estimate,
some continuity condition which can be derived
from field measurement data is required.
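
As a worked illustration of the first recommendation, one standard candidate for making noise invariant to signal level is a square-root transformation. It is offered here only as an example of what such a transformation could look like, not as the authors' proposal; the noise constant k and the first-order error propagation are assumptions.

```latex
% Assume the noise standard deviation in bands 4 to 6 is \sigma(s) \approx k\sqrt{s}
% for signal level s, and transform the signal by g(s) = 2\sqrt{s}.
% To first order, the noise in the transformed feature is
\[
\sigma_g \;\approx\; \lvert g'(s)\rvert\,\sigma(s)
        \;=\; \frac{1}{\sqrt{s}}\cdot k\sqrt{s}
        \;=\; k ,
\]
% which no longer depends on s. Band 7, whose noise is already constant,
% would be left untransformed under this scheme.
```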
Appendix A

Coefficients  of  the  Transformations 


A number  of  transformations  and  data  correction 
steps  have  been  mentioned;  each  one  implies  certain 
coefficients  or  parameters.  In  this  appendix,  the  pro- 
cedures with  numerical  values  for  the  automatic 
detection  of  garbled  data,  clouds,  water,  and  cloud 
shadows;  for  the  standardization  of  Landsat  calibra- 
tion and  Sun-angle  correction;  and  for  implementing 
the  XSTAR  haze-correction  algorithm  are  sum- 
marized. 

The procedures documented herein were implemented
at the Environmental Research Institute of
Michigan (ERIM) in August 1977. The significant
changes  from  previous  procedures  documented  in 
ERIM  Memorandum  IS-PFL-1272  (Mar.  18,  1977) 
are  as  follows. 

1.  Revision  of  the  calibration  adjustment  used  for 
Landsat-2  full-frame  data  from  computer-compatible 
tapes  (CCT’s)  produced  on  or  after  July  16,  1975 
(ref.  12)  to  fit  the  observed  correspondence  with 
Landsat-2  LACIE  segment  data  (ERIM  Memo  IS- 
DR-1867,  July  28,  1977) 

2.  Incorporation  of  the  improved  SCREEN  pro- 
cedure (refs.  9 and  13)  for  detecting  garbled  data, 
clouds,  snow,  cloud  shadows,  and  water  in  Landsat 
multispectral  scanner  data 

3.  Reversal  of  the  sign  of  the  XSTAR  haze 
parameter  y to  correspond  with  reference  9 

The  steps  for  implementing  the  procedures  are  as 
follows. 

Step 1.— Recalibrate the Landsat data as required.

Three  distinctly  different  calibrations  (counts  ver- 
sus radiance  values)  for  Landsat  data  currently  exist. 
These  calibrations  pertain  to  the  following  Landsat 
data  products. 

1.  Landsat-1  data  (LACIE  segment  and  full- 
frame  imagery) 

2.  Landsat-2  full-frame  data  from  CCT’s  pro- 
duced on  or  after  July  16, 1975 

3. Landsat-2 full-frame data from CCT’s produced
before July 16, 1975, and Landsat-2 LACIE
segment data

The  XSTAR  haze-correction  algorithm  is 
specifically  adjusted  for  Landsat-2  LACIE  segment 
data.  Other  Landsat  data  can  be  recalibrated  to  simu- 
late this  same  calibration  by  employing  the  appropri- 
ate multiplicative  and  additive  transformation 
selected  from  the  following. 


Let x_i represent the Landsat signal in channel i. If
recalibration is necessary, for each channel, set

$$x_i' = A_i x_i + B_i$$

Then, set

$$x_i = x_i'$$

Step 1a.— For Landsat-1 data (LACIE segment or
full frame (app. B)), use

$$B = \begin{pmatrix} -5.79 \\ 1.19 \\ -2.91 \\ 3.01 \end{pmatrix}$$


Step 1b.— For Landsat-2 full-frame data from
CCT’s  produced  on  or  after  July  16, 1975  (ref.  12  and 
ERIM  Memo  IS-DR-1867),  use 


Step 1c.— For Landsat-2 LACIE segment data or
for  Landsat-2  full-frame  data  from  CCT’s  produced 
before  July  16,  1975,  no  recalibration  is  necessary. 

Step 2.— Perform Sun-angle correction.

Let x_i represent the Landsat signal in channel i,
following step 1. Let θ represent the solar zenith
angle for the data acquisition. Then, for each channel
of the acquisition, set

$$x_i' = \frac{\cos\theta_0}{\cos\theta}\, x_i$$

Then, set

$$x_i = x_i'$$

The data will now appear to have been acquired for
the solar zenith angle θ0. The parameters of XSTAR
have been adjusted for θ0 = 39°.

Step 3.— Identify garbled data, clouds, water, and
cloud shadows.

Let R be a rotation matrix (i.e., R^T = R^{-1},
det (R) = 1) defined by


Step  3b.— A pixel  is  labeled  as  cloud  [f  not  labeled 
garbled  and  if  both  the  following  conditions  apply. 

*t  > *Ctnax  (zCmex  “ 100  ) 

H + *l/10  < *Cmti  " “7'5) 


(0.33231  -0.28317  -0.89952  -0.01594  \ 

.60316  -.66006  .42830  .13068  \ 

.67581  .S7735  .07592  -.45187  I 

.26278  .38833  -.04080  .88232/ 


The  columns  of  R are  unit  vectors  characterizing  the 
axes  of  a rotated  Landsat-2  data  space  akin  to  the 
Tasseted  Cap  data  space  but  particularly  oriented  to 
suit  the  XSTAR  algorithm  (ref.  9). 

Let  .v,  represent  the  Landsat  signal  in  channel  i, 
following  step  2.  Let  i and ,/  correspond  to  rows  and 
columns  of  the  R matrix.  For  each  pixel  of  an 
acquisition,  and  for  each  value  of  j (J  «£  j 4), 
calculate 


7 


4 


£v, 

M 


Step 3a.—A pixel is labeled as garbled data if any one of the following conditions applies.


, > - 
“4  "4 max 


(Zimax  = 


*4  < 


(Z4w/b 


U) 

(23m«  * 4) 

: i +0,ii750::i  < 'j „iin  (zimin~  t4) 


:3  0.09375.-,  > 


+ z 


/,0  < z2m/n 
t/»*  > "5  max 


(Z2  min 


(*' 


S max”  *-,v') 


t>max 


20) 
1 56  j 
8) 


Step 3c.—A pixel is labeled as diffuse cloud (dense haze) if not labeled garbled or cloud and if both the following conditions apply.

■'  zUmwc  (zHmax  = 

23  * fl  < zHmtn  (zHmin  * ~*'25) 

Sufficiently  dense  coverings  of  snow  tend  to  be 
placed  in  the  cloud  or  diffuse  cloud  category. 

Step 3d.—A pixel is labeled as water if not labeled garbled, cloud, or diffuse cloud and if all the following conditions apply.

:t  < 21»>1 

(?k'l  = 75  ^ 

z2  + 2t/»6  < zH'2l 

(2W21  * ' 

*4  < 2H>4 

( 2IV4  = *-S) 

Z2  + 24  < 2W24 

t2U*24  = 4l55 

Z2  + r,/2  + z%  + 5 X 

r4  < 2W2134(2H>2134  = 1 

If a pixel is labeled water on the basis of satisfying the preceding conditions, a subcategory cloud shadow over water may be identified if, in addition, the following is applicable.


0.4  X 


rt  > :ws 


{zws=  12.2) 


The subcategory cloud shadow over water is sometimes an artifact under the category cloud shadow (usually caused by striping effects in the data).

Step 3e.—A pixel is labeled as cloud shadow if not labeled garbled, cloud, diffuse cloud, or water and if both the following conditions apply.

- 0.4  X i,  - 06  X *3  - 06  X *4  > (*5I|}4  . I*) 

*,  " 04  X tj  < *J,J  t *54  2 ■ 

Step 4.—Compute XSTAR scene diagnostic signal value.

Let $s_i$ represent a scene diagnostic Landsat signal value in channel i for a single data acquisition. Set $s_i$ equal to the average signal value of all pixels not labeled as garbled, cloud, diffuse cloud, water, or cloud shadow. (If especially subtle effects, such as nonuniform haze, are present in the scene, a bias will be introduced into the XSTAR haze diagnostic procedure that will lead to overcorrection or undercorrection of the data. Some types of nonagricultural data, as yet not studied, may also introduce a bias.)
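The averaging in step 4 can be sketched as follows. This is only an illustration: the label convention (None for an unscreened pixel, a string otherwise) and the function name are assumptions, not part of the XSTAR documentation.

```python
def scene_diagnostic(pixels, labels):
    """Average each channel over pixels whose label is None (not screened out)."""
    kept = [p for p, label in zip(pixels, labels) if label is None]
    if not kept:
        raise ValueError("no unscreened pixels in the acquisition")
    count = len(kept)
    channels = len(kept[0])
    return [sum(p[i] for p in kept) / count for i in range(channels)]


# Hypothetical acquisition: two usable pixels and one pixel screened as cloud.
pixels = [[10.0, 20.0, 30.0, 40.0], [20.0, 30.0, 40.0, 50.0], [200.0, 200.0, 200.0, 100.0]]
labels = [None, None, "cloud"]
diagnostic = scene_diagnostic(pixels, labels)
```

The screened pixel does not contribute, so the diagnostic is the mean of the first two pixels only.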

Step 5.—Determine amount of change from reference haze condition (ref. 9).

Let $s_i$ be the Landsat scene diagnostic signal value in channel i. Let $\alpha_i$ and $x_i^*$ be coefficients of the XSTAR algorithm for channel i. Let $\gamma$ represent the change in optical thickness from the reference condition. Let $Y^*$ be the yellow value characterizing the reference haze condition. Numerical values are as follows.


Using $s_i$ and $Y^*$, calculate the following.
4 

« * Ev(Ji  v)*» 

»=i 


v)*« 


c = 


>'* 


y = 


For extremely hazy conditions, the quantity under the radical in the equation for $\gamma$ can be negative. In such cases, set the square root equal to zero; i.e., $\gamma = -(b/a)$. The solution given for $\gamma$ is obtained (ref. 9) by approximating $e^{\alpha_i \gamma}$ in the XSTAR correction by a quadratic expression and then solving for $\gamma$ such that the yellow value of $s_i$ after the XSTAR correction will be $Y^*$.

As an alternative, in the event that the quadratic expression for $e^{\alpha_i \gamma}$ is inaccurate because of an extreme change from the reference haze condition, an iterative solution for $\gamma$ is possible. The need for such a solution will be indicated by the presence of a negative quantity under the radical or by the obtaining of $|\gamma| > 0.5$. For the iterative solution, set

$\gamma' = \gamma$

$s_i' = e^{\alpha_i \gamma}(s_i - x_i^*) + x_i^*$

$s_i = s_i'$

Then, repeat the solution for a, b, c, and $\gamma$ using the new value of $s_i$. Next, increase the new value of $\gamma$ by $\gamma'$. If the quantity under the radical is again negative, or if $|\gamma - \gamma'| > 0.5$, the procedure might be repeated once more. If the iterations do not converge, discard the data acquisition as unusable or seek other remedies. Current experience indicates that the need for iterating should be rare.
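The haze diagnostic of step 5 can be sketched as follows, under several stated assumptions. The exponential is replaced by $1 + \alpha_i\gamma + (\alpha_i\gamma)^2/2$, which gives $(a/2)\gamma^2 + b\gamma + c = 0$ with $a = \sum v_i \alpha_i^2 (s_i - x_i^*)$, $b = \sum v_i \alpha_i (s_i - x_i^*)$, and $c = \sum v_i s_i - Y^*$; the yellow-axis vector $v$ is assumed to be the third column of R, the root nearest zero is chosen, and all numerical coefficients are hypothetical because the published values are not legible here.

```python
import math


def solve_gamma(s, alpha, x_star, v, y_star):
    """Solve (a/2)*g**2 + b*g + c = 0 for the haze change g nearest zero."""
    a = sum(vi * ai * ai * (si - xi) for vi, ai, si, xi in zip(v, alpha, s, x_star))
    b = sum(vi * ai * (si - xi) for vi, ai, si, xi in zip(v, alpha, s, x_star))
    c = sum(vi * si for vi, si in zip(v, s)) - y_star
    disc = b * b - 2.0 * a * c
    if disc < 0.0:
        return -b / a  # radical set to zero, as the text prescribes
    # Numerically stable form of the root with the smaller magnitude.
    return -2.0 * c / (b + math.copysign(math.sqrt(disc), b))


def xstar(s, alpha, x_star, gamma):
    """Apply the exponential XSTAR correction to one signal vector."""
    return [math.exp(ai * gamma) * (si - xi) + xi for si, ai, xi in zip(s, alpha, x_star)]


def solve_gamma_iterated(s, alpha, x_star, v, y_star, passes=3):
    """Repeat the solution on the corrected diagnostic and accumulate gamma."""
    total = 0.0
    for _ in range(passes):
        gamma = solve_gamma(s, alpha, x_star, v, y_star)
        s = xstar(s, alpha, x_star, gamma)
        total += gamma
    return total


# Hypothetical coefficients; V is assumed to be the third column of R.
V = [-0.89952, 0.42830, 0.07592, -0.04080]
ALPHA = [1.2, 0.8, 0.5, 0.2]
X_STAR = [60.0, 50.0, 45.0, 20.0]
reference = [40.0, 45.0, 50.0, 22.0]
Y_STAR = sum(vi * ri for vi, ri in zip(V, reference))
hazy = xstar(reference, ALPHA, X_STAR, -0.1)  # scene that needs gamma = +0.1
gamma_once = solve_gamma(hazy, ALPHA, X_STAR, V, Y_STAR)
gamma_iter = solve_gamma_iterated(hazy, ALPHA, X_STAR, V, Y_STAR)
```

In this synthetic case the single quadratic solution is already close to the true value, and the iterated form removes the small remaining error because successive corrections add in $\gamma$. The fixed number of passes is a simplification; the text iterates only when the radical is negative or $|\gamma|$ exceeds 0.5.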

Step 6.—Apply XSTAR correction.

Steps 1 to 5 can be accomplished with a single pass through the data. A second pass is required for step 6. Given a successful solution for $\gamma$ in step 5, XSTAR may then be applied to correct each pixel of the acquisition as follows.

Let $x_i$ represent the Landsat signal in channel i, following step 2. Then, for each pixel, calculate the following.

$x_i' = e^{\alpha_i \gamma}(x_i - x_i^*) + x_i^*$

$x_i = x_i'$

This correction may be applied to all pixels within the acquisition. (However, garbled data, clouds, or snow may convert to signal levels outside the normal dynamic range.)

To minimize roundoff or truncation errors, the data analyst should retain intermediate results from steps 1 to 6 in floating-point rather than integer format.

Appendix B

A Transformation to Make Landsat-1 and Landsat-2 MSS Data Compatible


INTRODUCTION  AND  DATA  BASE 

It was desired to estimate coefficients of a transformation which would convert Landsat-1 data to Landsat-2 data. In order to make the estimate, it was desirable to use the identical scene observed under identical conditions by both satellites. The nearest procedure in practice is to observe pairs of data over the same scenes separated by 9 days. An initial selection of about 25 such pairs was made, but natural attrition reduced this ultimately to 8 pairs. As will be shown, this is a quite minimal set and the fitting procedure should be repeated with a larger sample.

Pairs were eliminated from consideration if either member contained "patchiness" of cloud or haze, as evidenced in the histogram output of program SCREEN, or if the haze levels of the pair were thought to be markedly different from each other as evidenced by the yellow level of the mean of soils calculated by program SCREEN. The relative yellow level for each pass was judged against all non-"patchy" passes for the same satellite by plotting a separate yellow-level histogram over those passes for each satellite. The finally accepted pass pairs are listed in table B-I. Of the eight cases, four are cases in which the Landsat-2 pass precedes the Landsat-1 pass.


TABLE B-I.—Pass Pairs Used in Landsat-1 to Landsat-2 Fitting Procedure

Case  Segment      Julian date             Sun zenith angle, deg
                   Landsat-1  Landsat-2    Landsat-1  Landsat-2
1     [illegible]  155        164          37         32
2     [illegible]  101        92           47         46
3     1154         153        162          37         31
4     1855         82         73           52         52
5     1855         82^a       91           52         46
6     1857         101        92           46         44
7     1861         101        92           46         45
8     1882         153        162          37         31

^a Same pass used twice.


The data used for fitting consisted of the four-band "mean of soils" and the four-band "mean of the green arm," both outputs of program SCREEN. These data and their averages are shown separately for each Landsat band in table B-II.


TABLE B-II.—Diagnostic Data Used in Fitting

Band  Case  Soil mean      Green arm mean  Average^a
            L1     L2      L1     L2       L1      L2
4     1     41.5   41.7    25.1   24.4     33.3    33.05
4     2     40.2   36.8    28.0   22.0     34.1    29.4
4     3     36.1   36.3    26.0   23.6     31.05   29.95
4     4     34.2   30.3    26.9   22.1     30.55   26.2
4     5     34.2   36.3    26.9   26.3     30.55   31.3
4     6     39.7   37.6    28.7   25.0     34.2    31.3
4     7     43.6   41.2    35.0   32.0     39.3    36.6
4     8     34.9   35.1    25.6   24.6     30.25   29.85
5     1     40.3   47.1    16.9   25.4     30.1    36.25
5     2     39.6   41.7    24.3   23.3     31.95   32.50
5     3     31.3   38.0    16.6   20.3     23.95   29.15
5     4     33.5   34.3    25.4   25.8     29.45   30.05
5     5     33.5   41.9    25.4   29.7     29.45   35.80
5     6     40.7   44.8    25.2   28.8     32.95   36.80
5     7     46.3   49.7    36.8   40.7     41.55   45.20
5     8     29.2   36.4    16.2   20.6     22.70   28.50
6     1     40.5   49.5    41.5   50.5     33.30   50.0
6     2     38.8   41.7    40.7   40.5     39.75   41.1
6     3     32.8   40.5    46.3   54.6     39.55   47.55
6     4     32.3   34.3    32.5   33.9     32.4    34.1
6     5     32.3   42.5    32.5   45.8     32.4    44.15
6     6     38.8   45.1    44.3   46.1     41.55   45.6
6     7     44.5   50.0    40.6   44.3     42.55   47.15
6     8     34.3   41.0    43.5   59.6     38.9    50.30
7     1     17.1   20.5    22.0   25.3     19.55   22.90
7     2     17.0   17.5    21.6   20.1     19.30   18.80
7     3     14.0   16.1    25.8   26.7     19.9    21.40
7     4     14.5   15.2    16.5   16.6     15.5    15.9
7     5     14.5   18.0    16.5   22.3     15.5    20.15
7     6     16.9   18.9    24.2   22.7     20.55   20.80
7     7     19.8   20.9    19.1   19.3     19.45   20.1
7     8     14.8   16.6    23.1   28.6     18.95   22.6

^a Average = (soil mean + green arm mean)/2.

MODELS

Several different models were used for fitting. In general, the models are of the form

$x_2 = f(x_1, \theta_1, \theta_2, a) + \epsilon$

where $x_2$ is the Landsat-2 signal; $x_1$ is the Landsat-1 signal; $\theta_1$ and $\theta_2$ are the Sun zenith angles of Landsat-1 and Landsat-2 cases, respectively; a is a set of parameters which must be estimated; and $\epsilon$ is an error.

Three specific models were tried, as follows.

a. Ratio model—assumes for each band

$x_2 = A \frac{\cos\theta_2}{\cos\theta_1} x_1 + \epsilon$

i.e., that there is no difference in signal offset between the two satellites and that the radiance returned from the scene is an inverse function of the cosine of the Sun zenith angle.

b. Offset model—allows a signal offset between the two satellites and is fitted with two parameters per band.

c. Three-parameter offset model—assumes

$x_2 = A_2 A_1^{-1} \frac{\cos\theta_2}{\cos\theta_1} x_1 + B_2 - A_2 A_1^{-1} \frac{\cos\theta_2}{\cos\theta_1} B_1$

where $A_1$ is the responsivity of Landsat-1, $A_2$ is the responsivity of Landsat-2, $B_1$ is the offset for Landsat-1, and $B_2$ is the offset for Landsat-2. Thus, with $A = A_2 A_1^{-1}$,

$x_2 = A \frac{\cos\theta_2}{\cos\theta_1} x_1 + B_2 - A \frac{\cos\theta_2}{\cos\theta_1} B_1$

Model "a" requires a fit to one parameter per band; model "b" requires two parameters per band; and model "c" requires three parameters per band. The residual error per band after fitting is shown in table B-III. Model "c" is considerably the best fit for band 7 and is slightly better for the other bands.

TABLE B-III.—Root Mean Squared Error for Three Models

Band  Model a  Model b      Model c
4     1.11     [illegible]  [illegible]
5     2.06     [illegible]  [illegible]
6     2.99     2.24         1.66
7     4.78     3.39         1.20

In general, one can write

$z = a_0 + a_1 x + a_2 y$

and identify

$z = x_2$, $a_0 = B_2$

$x = \frac{\cos\theta_2}{\cos\theta_1} x_1$, $a_1 = A$

$y = \frac{\cos\theta_2}{\cos\theta_1}$, $a_2 = -A B_1$

In order to minimize the noise of observations of the soil and green arm points, the authors used their averages from table B-II. Thus, $x_1$ is the average L1 in table B-II. Using these tabulated values and regressing z on x and y gives the results shown in table B-IV.

Interest generally will be in converting Landsat-1 data to resemble Landsat-2 data at the same solar zenith angle. Therefore, the model will simplify to the form

$x_2 = A x_1 + B$

where $B = B_2 - A B_1$; B is also given in table B-IV.
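The simplified conversion can be sketched as follows. The coefficients are the values as read from table B-IV (with decimal points restored where the scan dropped them, which is an assumption), and the function names are hypothetical.

```python
# Regression coefficients as read from table B-IV, keyed by MSS band number.
TABLE_B_IV = {
    4: {"a0": -19.48, "a1": 1.04, "a2": 13.69},
    5: {"a0": -24.99, "a1": 1.00, "a2": 26.18},
    6: {"a0": -74.70, "a1": 1.09, "a2": 71.79},
    7: {"a0": -21.53, "a1": 0.82, "a2": 24.54},
}


def predict_landsat2(band, x1, cos_ratio=1.0):
    """Model c: x2 = a0 + a1*(cos_ratio*x1) + a2*cos_ratio.

    cos_ratio is cos(theta2)/cos(theta1); it is 1 at equal solar zenith angles.
    """
    c = TABLE_B_IV[band]
    return c["a0"] + c["a1"] * cos_ratio * x1 + c["a2"] * cos_ratio


def simplified_offset(band):
    """Offset B of x2 = A*x1 + B, using B = B2 - A*B1 = a0 + a2."""
    c = TABLE_B_IV[band]
    return c["a0"] + c["a2"]


# Band 4 Landsat-1 signal of 33.3 counts converted at equal solar zenith angles.
x2_band4 = predict_landsat2(4, 33.3)
```

Because $a_2 = -A B_1$, the offset B is simply $a_0 + a_2$, which reproduces the B column of the table for all four bands.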


TABLE B-IV.—Regression Coefficients for Three-Parameter Model c

Band  a0 = B2   a1 = A  a2 = -A B1  B1       B
4     -19.48    1.04    13.69       -13.16   -5.79
5     -24.99    1.00    26.18       -26.18   1.19
6     -74.70    1.09    71.79       -65.86   -2.91
7     -21.53    0.82    24.54       -29.93   3.01

Compensation for Atmospheric Effects
in Landsat Data

P. F. Lambeck^a and J. F. Potter^b


INTRODUCTION 

It is well known that variations in Sun angle and in atmospheric aerosol and water-vapor levels change the spectral signatures collected by multispectral scanners (refs. 1 and 2). It has also been shown that the background reflectance changes signatures (ref. 3). These changes have a deleterious effect on classification accuracy. Hence, even before the beginning of LACIE, the remote-sensing community pursued the development of preprocessing techniques for removing or reducing the variations in multispectral data caused by such changes. In the initial design of LACIE and throughout its operation, it was anticipated that some of these techniques would be incorporated into LACIE procedures once they had been demonstrated to work within the necessary constraints of the experiment (refs. 4 to 6). Two of these techniques (Sun-angle correction and mean-level adjustment^1) were actually tried as components of alternative LACIE systems, but the results were not satisfactory. This failure could be attributed to inherent limitations in these techniques and to the difficulty in identifying sufficiently similar training and classification areas. This latter problem (partitioning, stratification, and sampling strategy for training) is discussed in a separate symposium paper on signature extension (see the paper by Kauth and Richardson entitled "Signature Extension Methods in Crop Area Estimation"). The following discussion documents some of the progress of the supporting research community and support contractors at the NASA Johnson Space Center in developing preprocessing algorithms to support LACIE.

^a Environmental Research Institute of Michigan, Ann Arbor, Michigan.

^b Lockheed Electronics Company, Houston, Texas.

^1 Mean-level adjustment is a technique for normalizing LACIE Landsat data so that the mean values for different segments are equal.

Most of the research efforts in developing preprocessing techniques to support LACIE were based on the assumption that changing observation conditions cause multiplicative and additive changes in each multispectral data channel (ref. 7). Consequently, two major options were available: (1) to develop data transformations (e.g., ratioing) which would tend to cancel out these effects and (2) to develop methods for estimating the appropriate multiplicative and additive factors and then to apply these directly to the data. For the most part, the latter option was taken. For Landsat data, four multiplicative and four additive factors needed to be determined.

Initial attempts to estimate the eight correction factors for Landsat data did not rely on any prior knowledge of how the multiplicative and additive factors might be interrelated but, instead, relied solely on statistical characteristics of the distributions of data to be preprocessed. This approach led to the development of cluster-matching algorithms (refs. 7 to 10) and distribution-matching algorithms (refs. 11 and 12) which attempted to extract significant statistical measures from appropriate subsets of the data distributions. A number of these algorithms were tested by Lockheed Electronics Company (LEC), and the results are given in reference 13 and in the symposium paper by Minter entitled "Methods of Extending Crop Signatures From One Area to Another." These approaches were found to produce unstable results at times and, therefore, they were considered unsuitable for use in LACIE.

Efforts were also undertaken to develop preprocessing algorithms using mathematical models to define interrelations between the required multiplicative and additive correction factors such that just a few statistical characteristics of a Landsat data distribution would be sufficient to drive the mathematical model and to calculate the preprocessing corrections. The two most significant of these algorithms are the Atmospheric Correction (ATCOR) computer program developed by LEC (ref. 14) and the XSTAR haze correction algorithm developed at the Environmental Research Institute of Michigan (ERIM) (ref. 15).^2 These algorithms are discussed in the second and third sections of this paper, respectively, and both are discussed further in the fourth section.


ATCOR: AN ALGORITHM TO CORRECT
LANDSAT DATA FOR THE EFFECTS OF
HAZE, SUN ANGLE, AND BACKGROUND
REFLECTANCE


Introduction 

The radiance measured by the Landsat multispectral scanner (MSS) in a given channel i, where i = 1, 2, 3, 4, is determined primarily by four quantities.

1. The reflectance $\rho_i$ of the target (i.e., the element of the Earth's surface in the field of view) in channel i (This quantity is actually a function of the wavelength $\lambda$ but is assumed to be constant over the bandwidth of channel i.)

2. The solar zenith angle $\theta_0$

3. The haze level $\tau_H$ in the atmosphere

4. The average reflectance $\bar{\rho}_i$ of the adjacent areas of the Earth's surface outside the field of view, assumed to be constant over the bandwidth of channel i

In this paper, the haze level $\tau_H$ is defined as the haze optical depth at wavelength 0.5 micrometer. (See reference 16 for discussion of optical depth and other concepts from radiative transfer theory.) The haze optical depth at other wavelengths $\lambda$ is denoted by $\tau_H(\lambda)$. Normally, in the analysis of Landsat data, one wishes to classify certain objects on the Earth's surface on the basis of their reflectance $\rho_i$. These objects may be in the same Landsat image or in several different images separated in space and time. Variations in $\theta_0$, $\tau_H$, and $\bar{\rho}_i$ within a scene or from one scene to another change the data and therefore reduce classification accuracy.


^2 P. F. Lambeck, "Revised Implementation of the XSTAR Haze Correction Algorithm and Associated Preprocessing Steps for Landsat Data." ERIM Memo IS-PFL-1916, Nov. 1977.


The ATCOR algorithm is a method for simulating the effects of such variations and correcting for them. Simulation and correction are really the same process since correction consists of simulating the MSS response for values of the Sun angle, haze level, and background reflectance that are different from the actual values. To simulate the effect of changes in $\theta_0$, $\tau_H$, and $\bar{\rho}_i$, one must compute the MSS response as a function of these variables and of $\rho_i$.

An atmospheric model was developed and the Van de Hulst adding method (see the appendix) was used to compute the radiances at the MSS for a range of values of $\rho_i$, $\theta_0$, $\tau_H$, and $\bar{\rho}_i$. This computation was done for all wavelengths in the MSS bands in steps of 0.01 micrometer, and the resulting radiances corresponding to each band were then multiplied by the MSS response function and integrated over wavelength to obtain the instrument response for that band. It was found that the Landsat gray-scale levels $L_i$ could be written as

$L_i = A_i(\bar{\rho}_i, \theta_0, \tau_H)\rho_i + B_i(\bar{\rho}_i, \theta_0, \tau_H)$

where $A_i$ and $B_i$ are coefficients that are computed and tabulated for a full range of values for $\bar{\rho}_i$, $\theta_0$, and $\tau_H$. Using this table, it is a simple matter to determine the effect on the Landsat data (i.e., $L_i$) of a change in any or all of these parameters.
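The band integration described above can be sketched as a response-weighted average on a 0.01-micrometer grid. This is a minimal illustration only: the trapezoidal rule, the normalization by the integrated response (as in the paper's later definition of the equivalent spectrally flat radiance), and the response curve itself are assumptions, not the ATCOR implementation.

```python
def band_average(radiance, response, wavelengths):
    """Response-weighted mean of spectral radiance using the trapezoidal rule."""

    def trapz(y):
        return sum(
            0.5 * (y[k] + y[k + 1]) * (wavelengths[k + 1] - wavelengths[k])
            for k in range(len(wavelengths) - 1)
        )

    weighted = [n * s for n, s in zip(radiance, response)]
    return trapz(weighted) / trapz(response)


# Hypothetical band from 0.50 to 0.60 micrometer in steps of 0.01 micrometer.
wavelengths = [0.50 + 0.01 * k for k in range(11)]
response = [0.0, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.5, 0.0]  # hypothetical
flat = band_average([2.0] * 11, response, wavelengths)
ramp = band_average(wavelengths, response, wavelengths)
```

A spectrally flat radiance is returned unchanged, and a radiance that rises linearly across a symmetric response averages to its value at band center.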

These results allow one to make corrections for changes in $\theta_0$, $\tau_H$, or $\bar{\rho}_i$ if values for these quantities are known for the segments to be corrected. Generally, $\theta_0$ is known but $\tau_H$ and $\bar{\rho}_i$ are not known. However, if $\tau_H$ is known, $\bar{\rho}_i$ can be calculated using the table described in the preceding paragraph. The ATCOR computer program estimates $\tau_H$ using the method described later in this section, computes $\bar{\rho}_i$, and interpolates in the table of $A_i(\bar{\rho}_i, \theta_0, \tau_H)$ and $B_i(\bar{\rho}_i, \theta_0, \tau_H)$ to find the correction coefficients to make the desired correction.


The Atmospheric Model

The atmospheric model includes scattering by the molecular atmosphere and by haze. A factor which may be very important is scattering by cirrus clouds (ref. 17). This parameter could also be included in the model. However, the method used to determine the level of "haze" in the atmosphere (see the section on the ATCOR program) in fact estimates the total effect of all aerosols in the atmosphere and cannot distinguish between haze and cirrus clouds. Therefore, it did not seem worth while to model the effect of cirrus clouds separately. ATCOR partly corrects for the effects of cirrus clouds since they contribute to the "haze" level $\tau_H$. However, because the model assumes this contribution is from haze particles in the lower atmosphere, the correction is less than optimal.

It is assumed that the atmosphere consists of two homogeneous layers: a Rayleigh scattering molecular layer on top and a Mie scattering haze layer next to the Earth's surface. This two-layer model is expected to be a good approximation for the atmosphere since most of the haze is in the lower 1 kilometer of the Earth's atmosphere, whereas only about 11 percent of the molecular atmosphere is in this region. The two-layer model greatly simplifies the calculations. Water-vapor and other gaseous absorption is neglected, although it can be important in channel 4 of the Landsat data (ref. 2).

To define the atmospheric model, one must define the scattering diagrams (i.e., phase functions; ref. 16) and the optical depths for the two layers. These quantities completely define the scattering properties of the layers. They are well known for the Rayleigh case (ref. 16) and will not be discussed in detail here. For the haze layer, the scattering diagrams and optical depths were calculated from the Mie theory using a haze model by Reeser (ref. 18). The model is intended to represent a continental-type haze and assumes spherical particles with a size distribution given by

$f(r) = 90$, 0.01 micrometer < r < 0.1 micrometer

$f(r) = \frac{90}{10^4 r^4}$, 0.1 micrometer < r < 10.0 micrometers

where r is the particle radius. This distribution corresponds to 100 particles/cm3. The real part of the index of refraction varied from 1.54 to 1.56 in the wavelength interval of interest, 0.4 micrometer $\le \lambda \le$ 1.1 micrometers. The imaginary part of the index was taken to be zero since absorption is neglected. Scattering diagrams for this model were computed for several wavelengths; the one for $\lambda = 0.8$ micrometer is shown in figure 1. The scattering diagram changes only a small amount with wavelength; therefore, the one shown in figure 1 was used for all wavelengths. This procedure considerably reduces the computational effort involved.

FIGURE 1.—Scattering diagram for haze at wavelength 0.8 micrometer.

The calculations described in this paper were made for haze levels of 0.0, 0.424, and 0.848. The variation of $\tau_H(\lambda)$ with wavelength for the cases $\tau_H = 0.424$ and $\tau_H = 0.848$ is shown in figure 2. The variation with $\lambda$ of the Rayleigh optical depth $\tau_R(\lambda)$ is also shown in figure 2.


Radiance at the Sensor

To compute the MSS response for various values of $\rho_i$, $\theta_0$, $\tau_H$, and $\bar{\rho}_i$, one first calculates the corresponding radiance at the sensor. The method for doing this is rather complicated and is described in the appendix. There, it is shown (eq. (72)) that the radiance at the MSS can be written in the form

$N(\rho, \bar{\rho}, \mu_0, \lambda) = \mu_0 F(\lambda)\left[a(\bar{\rho}, \mu_0, \lambda)\rho(\lambda) + b(\bar{\rho}, \mu_0, \lambda)\right]$   (1)

where $\mu_0 = \cos\theta_0$. Here, the dependence on the wavelength $\lambda$ is explicitly indicated, but the dependence on the haze level $\tau_H$ is not. The quantity $F(\lambda)$ is the incident solar radiance at the top of the atmosphere divided by $\pi$. For each of the 3 values of $\tau_H$, the coefficients $a(\bar{\rho}, \mu_0, \lambda)$ and $b(\bar{\rho}, \mu_0, \lambda)$ were computed for 71 values of $\lambda$ (from 0.4 to 1.1 micrometers in units of 0.01 micrometer), 25 values of $\mu_0$ (24 Gauss points plus $\mu_0 = 1.0$), and 50 values of $\bar{\rho}$ (0.0 to 0.49 in units of 0.01).

FIGURE 2.—Rayleigh and haze optical depths.


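The Landsat MSS Data

The Landsat MSS data (i.e., the MSS gray-scale levels $L_i$) are given by

$L_i = \alpha_i N_i + \beta_i$   (2)

where $\alpha_i$ and $\beta_i$ are constants given in table I and $N_i$ is the equivalent spectrally flat radiance defined by

$N_i = \frac{\int N(\rho, \bar{\rho}, \mu_0, \lambda) S_i(\lambda)\, d\lambda}{\int S_i(\lambda)\, d\lambda}$   (3)

Here, $S_i(\lambda)$ is the response function for band i of the MSS. In principle, $\rho$ and $\bar{\rho}$ are functions of $\lambda$. If one assumes they are constant and equal to $\rho_i$ and $\bar{\rho}_i$ across a given band, then

$N_i = a_i(\bar{\rho}_i, \mu_0)\rho_i + b_i(\bar{\rho}_i, \mu_0)$   (4)

where

$a_i(\bar{\rho}_i, \mu_0) = \frac{\mu_0 \int F(\lambda)\, a(\bar{\rho}_i, \mu_0, \lambda) S_i(\lambda)\, d\lambda}{\int S_i(\lambda)\, d\lambda}$   (5)

$b_i(\bar{\rho}_i, \mu_0) = \frac{\mu_0 \int F(\lambda)\, b(\bar{\rho}_i, \mu_0, \lambda) S_i(\lambda)\, d\lambda}{\int S_i(\lambda)\, d\lambda}$   (6)

Thus,

$L_i = A_i(\bar{\rho}_i, \mu_0)\rho_i + B_i(\bar{\rho}_i, \mu_0)$   (7)

where

$A_i(\bar{\rho}_i, \mu_0) = \alpha_i a_i(\bar{\rho}_i, \mu_0)$   (8)

$B_i(\bar{\rho}_i, \mu_0) = \alpha_i b_i(\bar{\rho}_i, \mu_0) + \beta_i$   (9)

To simplify the notation in the preceding analysis, the parameter $\tau_H$ was not explicitly indicated. However, in the rest of this paper, it will be indicated explicitly for the $A_i$ and $B_i$ coefficients; that is,

$A_i(\bar{\rho}_i, \mu_0, \tau_H) = A_i(\bar{\rho}_i, \mu_0)$   (10)

$B_i(\bar{\rho}_i, \mu_0, \tau_H) = B_i(\bar{\rho}_i, \mu_0)$   (11)

A complete set of $A_i(\bar{\rho}_i, \mu_0, \tau_H)$ and $B_i(\bar{\rho}_i, \mu_0, \tau_H)$ was computed using equations (5), (6), (8), and (9) for the range of values given previously for $\bar{\rho}_i$, $\mu_0$, and $\tau_H$. Also, a complete set of the coefficients $C_i$ given by

$C_i(\bar{\rho}_i, \mu_0, \tau_H) = A_i(\bar{\rho}_i, \mu_0, \tau_H)\bar{\rho}_i + B_i(\bar{\rho}_i, \mu_0, \tau_H)$   (12)

was computed. These were required for the ATCOR program, described in a following section.


Corrections for Changes in Sun Angle,
Haze Level, and Background Reflectance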

Assume  that  Landsat  data  are  available  for  a seg* 
, mem  corresponding  to  the  values  ph  m<j,  and  r H and 
it  is  desired  to  “correct”  these  data  so  that  they  cor* 
respond  to  some  other  set  of  “standard”  values  py, 
Mo.  and  r'H  for  these  parameters.  With  the  first  set  of 
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Table  i— Coefficients  for  Relating  Landsat-2  Data  to 
the  Equivalent  Spectrally  Flat  Radiance 


/ 

•/ 

4/ 

(a) 

1 

49S0 

-40 

2 

7471 

-4.5 

3 

•699 

-S3 

4 

4961 

-IJ 

•The  unlu  of  a/ ere  cm**”‘*” 


parameters,  a target  of  reflectance  P/  gives  rite  to  a 
gray-scale  level  X/  given  by 


*/  “ Ajffi.t*Q.ijj}pj  * Bl(Fl'tt0'rtt)  ^ 


With  the  second  set  of  parameters,  the  same  target 
would  give  rise  to  a gray-scale  level  X'/  given  by 

*1  ■ ai^Wo’1h)pi  * *,(Wh)  (14) 


Eliminating  p;  from  equations  (13)  and  (14),  one 
obtains 


x)  ■ AjX,  + B, 


where 


(15) 


(16) 


B,  • Bjffi.UQ.ifj}  - AjBj  ffi.UQ.ijj ) 


and,  in  making  the  kind  of  corrections  described  in 
this  paper,  the  most  difficult  task  is  to  determine  die 
values  of  Fi  and  rH. 

The  ATCOR  program  described  in  the  next  so 
tion  was  designed  to  provide  approximate  velum  for 
F)  and  tjj  and  to  interpolate  in  the  tablw  of  the  Af 
and  Bj  coefflcientt  to  obtain  the  appropriate  coeffi- 
cients to  correct  the  dam. 


Tim  ATCOR  Program 

The  ATCOR  program  is  based  on  the  assumption 
that  it  is  possible  to  obtain  a reasonable  estimate  for 
the  reflectance  of  those  portions  of  the  Earth’s  sur- 
face that  correspond  to  the  darkest  pixels  in  a given 
Landsat  segment  The  haze  level  can  then  be  deter- 
mined from  the  brightness  of  these  pixels.  This  ques- 
tion is  examined  in  detail  in  reference  19.  For  the 
present  discussion,  it  will  be  assumed  that  a reason- 
able estimate  for  the  Earth’!  reflectance  correspond- 
ing to  the  “darkest  pixels”  can  be  made. 

In  the  ATCOR  program,  band  I is  used  to  deter- 
mine the  haze  level  because  the  assumed  haze  model 
indicates  that  the  effect  of  haze  is  greatest  in  this 
band.  The  set  of  darkest  pixels  is  obtained  by  taking 
from  each  line  of  Landsat  data  the  pixel  that  has  the 
lowest  value  in  band  1.  An  average  minimum  value, 
called  Xijn^ttf),  is  obtained  by  averaging  the  values 
of  X|  for  these  pixels.  Also,  the  average  value  X,  for 
all  the  band  I data  in  the  segment  is  computed.  It  is 
assumed  that  the  reflectance  corresponding  to 
the  darkest  targets  (i.e.,  corresponding  to  the  value 
X|<min(r/,))  is  known.  Next,  one  calculates  the 
average  minimum  values  X|<m|n(ry),  J ■ 1, 2. 3,  cor- 
responding to  the  same  reflectance  and  to  the 
three  haze  levels  for  which  coefficients  were  calcu- 
lated (namely  rt  * 0.0,  r2 " 0.424,  and  r j *■  0.848). 
They  are  obtained  in  the  following  manner.  Using 
the  table  for  C^Ft^rf  generated  by  equation  (12), 
an  interpolation  is  performed  to  find  the  value  F\j 
of  p,  for  which  C,(pj>Mo>T7>  " *t  This is  done  for 
/-  1,2,  and  3.  Then,  using  the  tables  for  Ax  and  B,. 
an  interpolation  is  performed  to  determine  the 
coefficients  and  £|(Pi>Mo>r./)- 

Finally, $\bar{X}_{1,\min}(\tau_j)$ is determined from the equation

$$\bar{X}_{1,\min}(\tau_j) = A_1(\bar{\rho}_{1j}, \mu_0, \tau_j)\,\rho_{\min} + B_1(\bar{\rho}_{1j}, \mu_0, \tau_j) \tag{18}$$

where $\rho_{\min}$ denotes the assumed reflectance of the darkest
targets.

Thus, if the values of $\bar{\rho}_i$ and $\tau_H$ for a segment are
known, the data can easily be corrected to correspond to any other
values of these parameters. Normally, $\mu_0$ is known but
$\bar{\rho}_i$ and $\tau_H$ are not known;

Using the three calculated values for $\bar{X}_{1,\min}(\tau_j)$
corresponding to $j = 1, 2, 3$, the value of $\tau_H$ that gives the
value obtained previously for $\bar{X}_{1,\min}(\tau_H)$ is determined by
interpolation. This value is the estimate of $\tau_H$.
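
The haze diagnostic just described can be sketched in a few lines. This is a minimal illustration rather than the ATCOR code itself: the function names are hypothetical, the modelled dark-target signals passed to `estimate_haze` stand in for the three values obtained from equation (18), and simple piecewise-linear interpolation is assumed because the text does not say which interpolation scheme ATCOR uses.

```python
from bisect import bisect_left

def interp(x, xs, ys):
    """Piecewise-linear interpolation of y(x) on ascending xs; clamps at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    k = bisect_left(xs, x)
    t = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
    return ys[k - 1] + t * (ys[k] - ys[k - 1])

def average_minimum(band1_lines):
    """Mean, over scan lines, of the lowest band 1 count found in each line."""
    return sum(min(line) for line in band1_lines) / len(band1_lines)

def estimate_haze(x_min_observed, tau_levels, x_min_model):
    """Invert the modelled dark-target signal to obtain the haze level.

    x_min_model[j] is the dark-target signal predicted for tau_levels[j].
    It is assumed (not stated in the source) to increase with haze level.
    """
    return interp(x_min_observed, x_min_model, tau_levels)

# Illustrative numbers only: three scan lines and made-up model signals.
lines = [[12, 9, 15], [11, 14, 10], [8, 13, 12]]
tau_h = estimate_haze(average_minimum(lines), [0.0, 0.424, 0.848], [7.0, 9.0, 11.0])
```

With only three tabulated haze levels, the estimate is clamped to the range 0.0 to 0.848 in this sketch; how ATCOR treats values outside the tabulated range is not stated.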

Once an estimate of $\tau_H$ has been obtained, an estimate of the
value of $\bar{\rho}_i$ can be made. The first step is to calculate the
average values for all the data in the segment for bands 2, 3, and 4 to
provide $\bar{X}_i$ for all four bands. Then, $\bar{\rho}_i$ is
determined by interpolating to find the value of $\bar{\rho}_i$ for
which $C_i(\bar{\rho}_i, \mu_0, \tau_H) = \bar{X}_i$. Finally, the
program interpolates in the tables for $A_i$ and $B_i$ to obtain
$A_i(\bar{\rho}_i, \mu_0, \tau_H)$ and $B_i(\bar{\rho}_i, \mu_0, \tau_H)$,
which are printed out and can then be used with equations (16) and (17)
to make the desired corrections.

The  ATCOR  program  was  tested  on  a data  set 
consisting  of  seven  pairs  of  acquisitions  over  three 
sites  in  Kansas  (ref.  13  and  Minter's  paper).  Each 
pair  consisted  of  two  acquisitions,  1 day  apart,  of  the 
same  site.  The  objective  was  to  determine  whether 
ATCOR  could  correct  for  haze  level  differences 
when  the  target  was  the  same.  One  acquisition  was 
selected as the "training segment" and the other as
the "recognition segment." The recognition segment
was classified with the LARSYS classifier using

1.  Local  training 

2.  Signatures  from  the  training  segment  corrected 
by  ATCOR 

3.  Uncorrected  signatures  from  the  training  seg- 
ment 

To correct the training segment signatures, both segments were
processed by ATCOR to obtain the corresponding values of
$\bar{\rho}_i$, $\mu_0$, and $\tau_H$, and then equations (16) and (17)
were used to compute the $A_i$ and $B_i$ coefficients. These were then
used to transform the training data. The results showed that ATCOR
generally improved the classifications, by a substantial factor in some
cases. Another test of ATCOR was performed by IBM.³ In this test, the
training and recognition segments were not the same. ATCOR generally
improved the results but only by a small amount. However, since there
apparently were only small haze differences between the training and
recognition segments in most cases, large improvements could not be
expected. This test is further discussed in reference 20.

Although ATCOR was designed to correct for changes in haze level
($\tau_H$), Sun angle ($\theta_0$), and background reflectance
($\bar{\rho}_i$), its principal application in LACIE to date has been
to develop Sun-angle correction tables for use in the LACIE clustering
algorithm.


XSTAR: AN ALGORITHM TO CORRECT
LANDSAT DATA FOR THE EFFECTS OF
HAZE AND SUN ANGLE

The XSTAR preprocessing algorithm is the result of a combination of
physical intuition, empirical observation, and a formulation based on
the ERIM radiative transfer model for an atmosphere with no absorption
(ref. 21). The algorithm is derived as follows.

Letting primes denote quantities which characterize a standardized
measurement condition, one first represents the optical thickness,
$\tau'_i$ (for each MSS channel $i$), for this condition by

$$\tau'_i = \tau'_{Ri} + \alpha_i \gamma' \tag{19}$$

with $\tau'_{Ri}$ representing the Rayleigh optical thickness and with
$\alpha_i \gamma'$ representing the aerosol optical thickness in each
MSS channel such that $\gamma'$ is a scalar parameter (independent of
channel number) related to the amount of haze in the atmosphere and
$\alpha_i$ is a corresponding function of channel number (or
wavelength) which is assumed to be independent of the amount of
atmospheric haze. For Landsat-2 data, the values of $\alpha_i$ for the
Landsat spectral bands 1 through 4 are fixed by definition (equation
(20)). The parameter $\gamma'$ can then be seen to characterize the
aerosol optical thickness (for the standardized condition) in a
hypothetical spectral band for which $\alpha_i = 1$. The values for
$\alpha_i$ were calculated from the estimated Landsat in-band optical
thickness for an atmosphere with a horizontal visual range of 23
kilometers (a relatively clear atmosphere).


³S. G. Wheeler, "Results of Signature Extension Experiment," IBM Memo
IBM-RES-23-14, July 29, 1976. Also, see the paper by Minter.

Similarly, for an observed condition, the optical thickness $\tau_i$ is
represented by

$$\tau_i = \tau_{Ri} + \alpha_i(\gamma' + \gamma) \tag{21}$$

Since the Rayleigh optical thickness is independent of the amount of
atmospheric haze ($\tau_{Ri} = \tau'_{Ri}$), one may write

$$\tau_i = \tau'_i + \alpha_i \gamma \tag{22}$$

The parameter $\gamma$ then measures the change in optical thickness
from the standardized condition.

Representing the MSS signals for the observed and standardized
conditions by $X_i$ and $X'_i$, respectively, and assuming that other
variables in the radiative transfer equation are restricted so that the
only significant variable is atmospheric optical thickness, the
equation relating the signal $X_i$ to its standardized value $X'_i$
(ref. 15) becomes

$$X'_i = e^{\alpha_i\gamma} X_i + (1 - e^{\alpha_i\gamma})\,X^*_i + P(\alpha_i\gamma) \tag{23}$$

In general, the quantities $X^*_i$ and $P(\alpha_i\gamma)$ are both
functions of the scanner calibration and of the illumination geometry,
viewing geometry, optical thickness, and background albedo of the
standardized condition (ref. 15). The polynomial function
$P(\alpha_i\gamma)$ is also a function of $\alpha_i\gamma$, with its
first term proportional to $(\alpha_i\gamma)^2$, and thus represents
higher order effects of changes in optical thickness.

The XSTAR algorithm is based on the mathematical form of equation (23),
excluding the higher order terms represented by $P(\alpha_i\gamma)$.

$$X'_i = e^{\alpha_i\gamma} X_i + (1 - e^{\alpha_i\gamma})\,X^*_i \tag{24}$$

Alternatively, one may write

$$(X'_i - X^*_i) = e^{\alpha_i\gamma}\,(X_i - X^*_i) \tag{25}$$

From equation (25), it is apparent that the vector $X^*$ specifies a
point, or an origin, in the signal space relative to which the
remainder of the signal space expands or contracts according to the
effect of each multiplicative factor. The existence of the point
established by $X^*$ has led to the name XSTAR for the resulting
preprocessing algorithm.

For Landsat-2 LACIE segment data (and for Landsat-2 full-frame data
from computer-compatible tapes (CCTs) produced before July 16, 1975)
which are acquired for a Sun zenith angle of 39°, the values of $X^*_i$
for the Landsat spectral bands 1 through 4 are fixed by definition²
(ref. 15) (equation (26)). For other Sun zenith angles, a cosine
Sun-angle correction must be applied to the data before applying the
XSTAR correction; hence, equation (25) then becomes

$$X'_i = e^{\alpha_i\gamma}\,\frac{\mu'_0}{\mu_0}\,X_i + (1 - e^{\alpha_i\gamma})\,X^*_i \tag{27}$$

with $\mu'_0 = \cos 39^\circ$ and with $\mu_0$ representing the cosine
of the Sun zenith angle for the data acquisition to be corrected.
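
As an illustration of how equation (27) acts on a single four-band pixel, a minimal sketch follows. The constants `ALPHA` and `X_STAR` are hypothetical placeholders, since the per-band values of equations (20) and (26) are not reproduced in this text, and the function name is likewise an assumption.

```python
import math

# Hypothetical placeholder constants. The real per-band values of alpha_i
# and X*_i come from equations (20) and (26), which are not reproduced here.
ALPHA = (1.3, 1.1, 0.9, 0.8)
X_STAR = (60.0, 60.0, 70.0, 30.0)
MU0_STD = math.cos(math.radians(39.0))  # standardized Sun zenith angle of 39 degrees

def xstar_correct(pixel, gamma, mu0, alpha=ALPHA, x_star=X_STAR):
    """Return the haze- and Sun-angle-corrected pixel following equation (27)."""
    corrected = []
    for x, a, xs in zip(pixel, alpha, x_star):
        gain = math.exp(a * gamma)          # multiplicative haze factor
        corrected.append(gain * (MU0_STD / mu0) * x + (1.0 - gain) * xs)
    return corrected
```

Two properties follow directly from the form of the equation: with `gamma = 0` and a 39° Sun zenith angle the pixel is returned unchanged, and a signal equal to `X_STAR` is a fixed point of the haze term for any `gamma`.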

For Landsat-1 data (and for Landsat-2 full-frame data from CCTs
produced on or after July 16, 1975), corrections to simulate the
Landsat-2 LACIE segment data calibration first need to be applied
before using equation (27) or any subsequent equations. These
corrections are defined in ERIM memo IS-PFL-1916.


²P. F. Lambeck, "Revised Implementation of the XSTAR Haze Correction
Algorithm and Associated Preprocessing Steps for Landsat Data," ERIM
Memo IS-PFL-1916, Nov. 1977.

To apply the XSTAR preprocessing algorithm to Landsat data, one needs
to determine the appropriate value for $\gamma$, which measures the
amount of correction required. Fortunately, Landsat data distributions
tend to lie within a two-dimensional hyperplane, when displayed in the
four-dimensional Landsat data space, and this hyperplane shifts its
position according to the effects of atmospheric haze. The direction in
which this shifting is most easily discernible is specified by the unit
vector $\hat{\phi}$ (ref. 15).


The $\hat{\phi}$ direction is equivalent to the tasseled-cap
"yellowness" direction (see the symposium paper by Kauth et al.
entitled "Feature Extraction Applied to Agricultural Crops as Seen by
Landsat"), and it measures the component of the shift of the data
hyperplane which is perpendicular to the usual orientation of the
plane. For the standardized condition, the average signal value,
measured in the $\hat{\phi}$ direction, is represented by $Y^*$, with

$$Y^* = -11.2082 \text{ Landsat counts} \tag{29}$$

(This $Y^*$ value has been chosen to represent a typical atmospheric
condition, not necessarily a clear one.) Thus, one calculates the value
for $\gamma$ that will shift the mean signal value ($\bar{X}_i$) for
the data acquisition to be corrected such that the corrected mean
signal value, measured in the $\hat{\phi}$ direction, will equal $Y^*$.


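$$\sum_{i=1}^{4} \hat{\phi}_i \left[ e^{\alpha_i\gamma}\,\frac{\mu'_0}{\mu_0}\,\bar{X}_i + (1 - e^{\alpha_i\gamma})\,X^*_i \right] = Y^* \tag{30}$$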


If $e^{\alpha_i\gamma}$ is expanded as a series in ascending powers of
$\alpha_i\gamma$ and if third order and higher order terms are ignored,
equation (30) reduces to a quadratic in $\gamma$, and one may estimate
$\gamma$ by calculating the quantities given in equations (31) through
(34). For extremely hazy conditions, the quantity under the radical in
equation (34) can be negative. In such cases, the square root may be
set equal to zero (equation (35)).

The mean signal value ($\bar{X}_i$) used in equations (31) through (33)
should be calculated from pixels that do not represent clouds, snow,
nonuniform haze concentrations, cloud shadows, or water, so that the
estimate for $\gamma$ will not be biased. A fully automated technique
for doing this (called SCREEN) has been developed and is documented in
ERIM memo IS-PFL-1916.

The quantities $\bar{X}_i$ and $\gamma$ are calculated during one pass
through the data. Equation (27) is then used to apply the correction
during a second pass.
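
The first pass can be sketched as below. This is a minimal illustration under stated assumptions, not the ERIM implementation: the function names are hypothetical, the quadratic is obtained by truncating the exponential in equation (30) after the second-order term, and the root selected is the one that stays continuous with the first-order solution. The per-band constants passed in (haze weights, origin, and unit-vector components) must come from the published definitions.

```python
import math

def quadratic_coefficients(mean, alpha, x_star, phi, y_star, sun_factor=1.0):
    """Coefficients (a, b, c) of a*gamma**2 + b*gamma + c = 0.

    Obtained by replacing exp(alpha_i * gamma) in equation (30) with
    1 + alpha_i*gamma + (alpha_i*gamma)**2 / 2. sun_factor is mu0'/mu0.
    """
    d = [sun_factor * m - xs for m, xs in zip(mean, x_star)]
    a = 0.5 * sum(p * al * al * di for p, al, di in zip(phi, alpha, d))
    b = sum(p * al * di for p, al, di in zip(phi, alpha, d))
    c = sum(p * sun_factor * m for p, m in zip(phi, mean)) - y_star
    return a, b, c

def estimate_gamma(mean, alpha, x_star, phi, y_star, sun_factor=1.0):
    """Estimate gamma from the screened segment mean (first pass)."""
    a, b, c = quadratic_coefficients(mean, alpha, x_star, phi, y_star, sun_factor)
    if abs(a) < 1e-12:                      # expansion is effectively linear
        return -c / b                       # assumes b is nonzero
    disc = max(b * b - 4.0 * a * c, 0.0)    # a negative radicand is set to zero
    return (-b + math.copysign(math.sqrt(disc), b)) / (2.0 * a)
```

When the projected mean already equals the target value, `c` vanishes and the estimate is zero, so an acquisition that matches the standardized condition is left alone.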

The XSTAR preprocessing algorithm is unique in its method for
estimating relative changes in optical thickness ($\gamma$) in the
absence of ground references or ground observations. The algorithm also
retains the original form of the data after applying its preprocessing
correction.

A test of XSTAR on 90 Landsat-2 consecutive-day data sets, representing
a wide range of Sun zenith angles, scene characteristics, and
atmospheric conditions, has indicated that XSTAR, compared to no
preprocessing, doubled the number of consecutive-day data sets for
which the day-to-day Euclidean distance between the signal mean vectors
was less than 3 Landsat counts (an estimated upper bound on acceptable
performance). In all, one-half to two-thirds of the data sets were
brought within three Landsat counts of matching after applying XSTAR,
and the remaining data sets (scenes more than 20-percent covered by
clouds, cloud shadows, or snow) were in general significantly improved
by XSTAR. These results are plotted in figure 3.
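
The acceptance measure used in this test is straightforward to compute. The sketch below assumes each acquisition is available as a list of four-band pixels; the function names and the way the 3-count threshold is applied are illustrative.

```python
import math

def mean_vector(pixels):
    """Per-band mean of a list of equal-length pixel vectors."""
    n = len(pixels)
    return [sum(p[i] for p in pixels) / n for i in range(len(pixels[0]))]

def day_to_day_distance(pixels_day1, pixels_day2):
    """Euclidean distance between the signal mean vectors of two acquisitions."""
    return math.dist(mean_vector(pixels_day1), mean_vector(pixels_day2))

def within_tolerance(pixels_day1, pixels_day2, limit=3.0):
    """True when the two means match to better than `limit` Landsat counts."""
    return day_to_day_distance(pixels_day1, pixels_day2) < limit
```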

The  XSTAR  algorithm  does  not  attempt  to  com- 
pensate for  the  effects  of  view  angle,  background 
albedo,  atmospheric  absorption,  or  inconsistencies  in 
the  calibration  of  the  data.  However,  these  effects  ap- 
pear to  be  of  lesser  consequence  in  Landsat  data  than 
the  effects  of  haze  and  Sun  angle  for  which  XSTAR 
does  apply  a correction. 

Since  its  development,  the  XSTAR  algorithm  has 
been  tested  only  on  Landsat  agricultural  data.  Its  per- 
formance characteristics  on  nonagricultural  data  are 
not  yet  known. 


CONCLUDING  COMMENTS 

The  ATCOR  algorithm  is  based  on  a detailed  at- 
mospheric model  and  should  give  good  results  if  the 
“haze  diagnostic"  part  of  the  algorithm  gives  an  ac- 
curate estimate  of  the  haze  level  and  if  this  haze  level 
is  reasonably  constant  across  the  image  being  cor- 
rected. The  haze  diagnostic  should  be  accurate  if  the 
average  reflectance  of  the  darkest  objects  in  the 
scene  corresponds  to  the  “standard  value"  assumed 
for  this  reflectance  by  the  algorithm.  When  this  cor- 
respondence is  poor,  the  results  can  be  unsatisfactory 
and  this  is  undoubtedly  the  greatest  source  of  error  in 
applying  ATCOR.  However,  the  idea  in  ATCOR  of 
tabulating  preprocessing  correction  factors  from  an 
accurate  mathematical  model  and  then  interpolating 
to  estimate  an  appropriate  correction  for  each  scene 
appears  to  be  a significant  step.  Future  development 
efforts  should  concentrate  on  finding  an  improved 
haze  diagnostic. 


FIGURE 3. Scene average Euclidean distance error from XSTAR test on 90
consecutive-day LACIE acquisitions (after cosine correction for Sun
angle).


Although the XSTAR preprocessing algorithm relies on a simplified
atmospheric model, it uses a haze diagnostic (displacement of the data
mean, measured in the $\hat{\phi}$ direction) that is especially well
suited to the requirements of the algorithm. This occurs because
displacements of Landsat data measured in the $\hat{\phi}$ direction
correlate significantly with the multiplicative and additive changes
caused by changing observation conditions. As a result, the XSTAR
algorithm is capable of achieving a modest level of success with great
consistency. Efforts are currently being made to use the XSTAR haze
diagnostic with the ATCOR algorithm and thus combine the best features
of both algorithms.

Neither  the  ATCOR  algorithm  nor  the  XSTAR 
algorithm  provides  an  explicit  compensation  for  the 
effects  of  changing  Landsat  view  angle.  Atmospheric 
models  and  practical  experience  both  indicate  that 
these  effects  are  significant,  even  for  the  narrow 
range  of  view  angles  pertinent  to  Landsat  data. 
Development efforts are currently underway at ERIM (ref. 15) to address
this aspect of the preprocessing problem.


Appendix

Calculation  of  the  Radiance  at  the  MSS 


REFLECTION  AND  TRANSMISSION 
MATRICES 


In what follows, one will frequently be concerned with reflection and
transmission matrices ($R$ and $T$ matrices), which describe the
reflection and transmission properties of the plane-parallel scattering
layers assumed to make up the atmosphere. These layers are assumed to
be horizontally homogeneous and to extend to infinity in the horizontal
direction. For a layer of optical depth $\tau_1$, the reflection and
transmission matrices are defined by

$$R(\mu,\phi;\mu_0,\phi_0) = \frac{N(0,+\mu,\phi)}{\mu_0 F} \tag{36}$$

$$T(\mu,\phi;\mu_0,\phi_0) = T_{\text{diff.}}(\mu,\phi;\mu_0,\phi_0) + T_0(\mu,\phi;\mu_0,\phi_0) \tag{37}$$

where

$$T_{\text{diff.}}(\mu,\phi;\mu_0,\phi_0) = \frac{N_{\text{diff.}}(\tau_1,-\mu,\phi)}{\mu_0 F} \tag{38}$$

$$T_0(\mu,\phi;\mu_0,\phi_0) = \frac{N_0(\tau_1,-\mu,\phi)}{\mu_0 F} \tag{39}$$

Here, $N(\tau,\mu,\phi)$ is the radiance at optical depth $\tau$ in the
direction specified by $\mu$ and $\phi$, where $\mu$ ($0 < \mu \le 1$)
is the cosine of the zenith angle $\theta$ measured from the normal to
the layer and $\phi$ is the corresponding azimuth angle. A minus sign
in front of $\mu$ indicates the direction is downward. The optical
depth $\tau$ is measured from the top of the layer downward; thus,
$N(0,+\mu,\phi)$ is the upward-directed radiance at the top of the
layer, and $N(\tau_1,-\mu,\phi)$ is the downward-directed radiance at
the bottom of the layer. The symbols with subscript zero refer to the
incident radiation. The incident beam has an irradiance $\pi F$ through
a unit area normal to itself. The subscript diff. refers to diffusely
transmitted radiation; i.e., radiation that has been scattered at least
once. The directly transmitted radiance $N_0$ is given by

$$N_0(\tau_1,-\mu,\phi) = \pi F\,e^{-\tau_1/\mu_0}\,\delta(\mu - \mu_0)\,\delta(\phi - \phi_0) \tag{40}$$

where $\delta$ is the Dirac delta function. Note that upward-directed
radiation is all diffuse, so the subscript diff. is omitted in this
case.


THE ADDING METHOD


To compute the MSS response for various values of the parameters
$\rho_i$, $\theta_0$, $\tau_H$, and $\bar{\rho}_i$, one first computes
the radiance at the MSS for these values of the parameters. This is
done by computing the corresponding $R$ matrix and using equation (36)
to obtain the radiance.

The method used to compute the $R$ matrix is the adding method
originally proposed in an unpublished report by Van de Hulst.⁴ (See
also reference 22 for comparison with other methods.) It allows one to
take the $R$ and $T$ matrices for two separate layers of optical depths
$\tau_1$ and $\tau_2$ and construct from them the $R$ and $T$ matrices
for the layer of optical depth $\tau_1 + \tau_2$ consisting of the two
layers, one on top of the other. A special case of the adding method
occurs when the two layers are identical. It is then called the
doubling method. In the calculations described in the following
paragraphs, the doubling method is used to build up $R$ and $T$
matrices for the Rayleigh and aerosol layers that constitute the model
atmosphere. The adding method is then used to combine these to obtain
$R$ and $T$ matrices for the total atmosphere. Finally, the adding
method is used to combine the atmospheric matrices and the $R$ matrix
for the Earth's surface to obtain an $R$ matrix that describes the
reflectance of the overall Earth/atmosphere system. This matrix is
somewhat different from the conventional $R$ matrix since it describes
a system that is not horizontally homogeneous.


⁴H. C. Van de Hulst, "A New Look at Multiple Scattering," unpublished
report, NASA Institute for Space Studies, Jan. 1963.

The principle of the adding method is depicted in figure 4, which shows
two scattering layers. It is assumed that the $R$ and $T$ matrices have
been obtained for the two layers, and it is desired to obtain the $R$
and $T$ matrices for the two-layer system. In figure 4, the two layers
are separated so that the upward and downward radiation field where
they join can be indicated.

The $R$ and $T$ matrices for the top layer in figure 4 will be denoted
$R_T$ and $T_T$ and those of the bottom layer $R_B$ and $T_B$. It is
understood that each matrix is a function of four angular variables,
which are omitted to simplify the notation. In figure 4, a part, $R_1$,
of the incident flux is reflected by the top layer, and a part, $D_1$,
is transmitted by the top layer. Of the part described by $D_1$, a
part, $U_1$, is reflected by the bottom layer, and a part, $T_1$, is
transmitted by the bottom layer. The process is continued as shown in
the diagram. All the transmission matrices ($T_T$, $T_B$, $T$, and $D$)
include both the diffusely and directly transmitted parts. The solution
consists in determining $R$ and $T$, the reflection and transmission
matrices for the two layers taken together. The following relations can
be read directly from the diagram.

$$\begin{aligned}
R_1 &= R_T, & D_1 &= T_T \\
U_n &= R_B D_n, & T_n &= T_B D_n \\
D_{n+1} &= R_T U_n, & R_{n+1} &= T_T U_n
\end{aligned}
\qquad n = 1, 2, \ldots \tag{41}$$


FIGURE 4. Schematic representation of the adding method.


By substitution and addition, one obtains

$$D = [1 + (R_T R_B) + (R_T R_B)^2 + \cdots]\,T_T = [1 + S]\,T_T \tag{42}$$

$$U = R_B D \tag{43}$$

$$R = R_T + T_T R_B D \tag{44}$$

$$T = T_B D \tag{45}$$

The products in equations (41) through (45) stand for double integrals
over the intermediate angles. For example, $U = R_B D$ stands for

$$U(\mu,\phi;\mu_0,\phi_0) = \frac{1}{\pi}\int_0^1\!\!\int_0^{2\pi} R_B(\mu,\phi;\mu',\phi')\,D(\mu',\phi';\mu_0,\phi_0)\,\mu'\,d\mu'\,d\phi' \tag{46}$$

All other products are defined in a similar way.

Separating the directly transmitted and diffuse parts of $T_T$, $T_B$,
$D$, and $T$, one obtains

$$D_{\text{diff.}} = T_{T,\text{diff.}} + S\,e^{-\tau_T/\mu_0} + S\,T_{T,\text{diff.}} \tag{47}$$

$$U = R_B\,e^{-\tau_T/\mu_0} + R_B D_{\text{diff.}} \tag{48}$$

$$R = R_T + e^{-\tau_T/\mu}\,U + T_{T,\text{diff.}}\,U \tag{49}$$

$$T_{\text{diff.}} = e^{-\tau_B/\mu}\,D_{\text{diff.}} + T_{B,\text{diff.}}\,e^{-\tau_T/\mu_0} + T_{B,\text{diff.}}\,D_{\text{diff.}} \tag{50}$$

where the subscript diff. indicates the diffuse part of the
corresponding matrix.

Since the bottom and top layers are homogeneous, the solutions are even
functions of $\phi_0 - \phi$ and can be expanded in the form

$$R_B(\mu,\phi;\mu_0,\phi_0) = \sum_{m=0}^{N_B} R_B^{(m)}(\mu,\mu_0)\cos m(\phi_0 - \phi) \tag{51}$$

$$T_{B,\text{diff.}}(\mu,\phi;\mu_0,\phi_0) = \sum_{m=0}^{N_B} T_{B,\text{diff.}}^{(m)}(\mu,\mu_0)\cos m(\phi_0 - \phi) \tag{52}$$

with identical series for $R_T$ and $T_{T,\text{diff.}}$, except that
everywhere the subscripts are $T$ instead of $B$. Most methods of
solving the multiple scattering problem for a homogeneous layer,
including the doubling method used in this paper, give solutions in
this form. The number of components $N_B + 1$ in equations (51) and
(52) is the number of components in the cosine expansion of the
scattering diagram for the layer. (See equation (64) and the discussion
following it.) Substituting equations (51) and (52) and the
corresponding series for the top layer into equations (47) through
(50), one obtains similar series for $R$ and $T_{\text{diff.}}$
describing the two layers taken together.

$$R(\mu,\phi;\mu_0,\phi_0) = \sum_{m=0}^{N} R^{(m)}(\mu,\mu_0)\cos m(\phi_0 - \phi) \tag{53}$$

$$T_{\text{diff.}}(\mu,\phi;\mu_0,\phi_0) = \sum_{m=0}^{N} T_{\text{diff.}}^{(m)}(\mu,\mu_0)\cos m(\phi_0 - \phi) \tag{54}$$

where $N$ is the larger of $N_T$ and $N_B$. The coefficients
$R^{(m)}(\mu,\mu_0)$ and $T_{\text{diff.}}^{(m)}(\mu,\mu_0)$ are given
by the following equations.

$$Q_1^{(m)}(u,v) = (1 + \delta_{0,m})\int_0^1 R_T^{(m)}(u,z)\,R_B^{(m)}(z,v)\,z\,dz \tag{55}$$

$$Q_n^{(m)}(u,v) = (1 + \delta_{0,m})\int_0^1 Q_1^{(m)}(u,w)\,Q_{n-1}^{(m)}(w,v)\,w\,dw \tag{56}$$

$$S^{(m)}(u,v) = \sum_{n=1}^{\infty} Q_n^{(m)}(u,v) \tag{57}$$

$$D_{\text{diff.}}^{(m)}(u,\mu_0) = T_{T,\text{diff.}}^{(m)}(u,\mu_0) + S^{(m)}(u,\mu_0)\,e^{-\tau_T/\mu_0} + (1 + \delta_{0,m})\int_0^1 S^{(m)}(u,v)\,T_{T,\text{diff.}}^{(m)}(v,\mu_0)\,v\,dv \tag{58}$$

$$U^{(m)}(u,\mu_0) = R_B^{(m)}(u,\mu_0)\,e^{-\tau_T/\mu_0} + (1 + \delta_{0,m})\int_0^1 R_B^{(m)}(u,v)\,D_{\text{diff.}}^{(m)}(v,\mu_0)\,v\,dv \tag{59}$$

$$R^{(m)}(\mu,\mu_0) = R_T^{(m)}(\mu,\mu_0) + e^{-\tau_T/\mu}\,U^{(m)}(\mu,\mu_0) + (1 + \delta_{0,m})\int_0^1 T_{T,\text{diff.}}^{(m)}(\mu,z)\,U^{(m)}(z,\mu_0)\,z\,dz \tag{60}$$

$$T_{\text{diff.}}^{(m)}(\mu,\mu_0) = e^{-\tau_B/\mu}\,D_{\text{diff.}}^{(m)}(\mu,\mu_0) + T_{B,\text{diff.}}^{(m)}(\mu,\mu_0)\,e^{-\tau_T/\mu_0} + (1 + \delta_{0,m})\int_0^1 T_{B,\text{diff.}}^{(m)}(\mu,z)\,D_{\text{diff.}}^{(m)}(z,\mu_0)\,z\,dz \tag{61}$$

In equations (58) through (61), $\tau_T$ and $\tau_B$ are the optical
depths of the top and bottom layers, respectively. Only the first few
$Q_n^{(m)}(u,v)$ need to be calculated. As $n$ increases, the series
for $S^{(m)}(u,v)$ becomes a geometric series and the remaining terms
can be approximated by a remainder term.

It will be assumed that the Landsat MSS is pointed vertically downward;
i.e., that the look angle is 0.0°. This assumption greatly simplifies
the multiple scattering calculations and seems justified since the
maximum look angle is about 7°. With this assumption, the radiance at
the sensor is independent of $\phi$ and $\phi_0$ so only the $m = 0$
component in equations (51) through (61) needs to be computed.
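
The operator relations of equations (41) through (45) translate almost directly into code once the angular integrals are replaced by matrix products. The sketch below is illustrative only. It assumes the quadrature weights of the angular integration have already been folded into the matrices, so that a plain matrix product stands for the double integral; it works with total (direct plus diffuse) transmission rather than the separated form of equations (47) through (50); and it sums the interreflection series term by term instead of using a remainder term.

```python
def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matadd(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def add_layers(r_top, t_top, r_bot, t_bot, tol=1e-12, max_terms=500):
    """Combine two layers; returns (R, T) of the two-layer system."""
    d_n = t_top                          # D_1 = T_T
    d = [row[:] for row in t_top]        # running sum D = D_1 + D_2 + ...
    for _ in range(max_terms):
        u_n = matmul(r_bot, d_n)         # U_n = R_B D_n
        d_n = matmul(r_top, u_n)         # D_{n+1} = R_T U_n
        d = matadd(d, d_n)
        if max(abs(x) for row in d_n for x in row) < tol:
            break
    u = matmul(r_bot, d)                 # U = R_B D
    r = matadd(r_top, matmul(t_top, u))  # R = R_T + T_T U
    t = matmul(t_bot, d)                 # T = T_B D
    return r, t
```

For 1-by-1 matrices the result reduces to the familiar closed form $R = R_T + T_T R_B T_T/(1 - R_T R_B)$ and $T = T_B T_T/(1 - R_T R_B)$, which gives a convenient check on the series summation.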


THE DOUBLING METHOD

The doubling method is simply the adding method when the top and bottom
layers are the same. By repeated doubling, one can obtain the solution
for a thick homogeneous layer if one has the solution for a thin
homogeneous layer. One begins with a layer of optical depth $\tau_1$
and "adds" it to itself using the adding method to obtain solutions for
a layer of depth $2\tau_1$. By repeating the procedure, one
successively obtains solutions for depths $4\tau_1$, $8\tau_1$,
$16\tau_1$, ..., $2^n\tau_1$ after $n$ doublings.
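
A scalar analogue makes the bookkeeping of repeated doubling easy to follow. The sketch below replaces the $R$ and $T$ matrices by single numbers (in effect one upward and one downward stream), and starts from a thin layer with $R = \beta\tau_1$ and $T = 1 - \beta\tau_1$. Both the scalar reduction and the backscatter fraction $\beta$ are assumptions made for illustration; they are not the first-order scattering solutions used in the paper.

```python
def double_layer(r, t):
    """Add a layer to an identical copy of itself (scalar adding relations)."""
    d = t / (1.0 - r * r)        # D = [1 + S] T, with S summed as a geometric series
    return r + t * r * d, t * d  # R = R + T R D,  T = T D

def build_by_doubling(tau1, backscatter, n_doublings):
    """Return [(tau, R, T), ...] for depths tau1, 2*tau1, ..., 2**n * tau1."""
    r = backscatter * tau1       # hypothetical thin-layer starting values
    t = 1.0 - backscatter * tau1
    tau = tau1
    layers = [(tau, r, t)]
    for _ in range(n_doublings):
        r, t = double_layer(r, t)
        tau *= 2.0
        layers.append((tau, r, t))
    return layers

layers = build_by_doubling(2.0 ** -25, 0.5, 36)   # depths 2**-25 up to 2**11
```

Because no absorption is included in this toy model, $R + T$ should remain equal to 1 at every depth, apart from accumulated rounding error.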

Hansen (ref. 23) has shown that a good method for obtaining the initial
layer of depth $\tau_1$ is to take $\tau_1$ small enough that only
first-order scattering is important. One then has the first-order
scattering solutions for $R_B$ and $T_{B,\text{diff.}}$ (equations (62)
and (63)), in which $P(\mu,\phi;-\mu_0,\phi_0)$ is the scattering
diagram describing scattering from the direction characterized by
$-\mu_0$, $\phi_0$ to that characterized by $\mu$, $\phi$. The
convention regarding minus signs in front of $\mu_0$ and $\mu$ was
discussed previously (following eq. (39)). There are identical
expressions for $R_T$ and $T_{T,\text{diff.}}$. In general,
$\tau_1 = 2^{-25}$ is small enough for these solutions to be
sufficiently accurate. This is the method used in the doubling
calculations described in this paper.

To perform the numerical calculations, one separates the azimuthal
dependence by expanding the scattering diagram in a cosine series

$$P(\mu,\phi;-\mu_0,\phi_0) = \sum_{m=0}^{N} P^{(m)}(\mu,-\mu_0)\cos m(\phi_0 - \phi) \tag{64}$$

where the coefficients $P^{(m)}(\mu,-\mu_0)$ are as given in reference
16, page 150, equation (87). One then has a similar expansion for $R_B$
and $T_{B,\text{diff.}}$

$$R_B(\mu,\phi;\mu_0,\phi_0) = \sum_{m=0}^{N} R_B^{(m)}(\mu,\mu_0)\cos m(\phi_0 - \phi) \tag{65}$$

$$T_{B,\text{diff.}}(\mu,\phi;\mu_0,\phi_0) = \sum_{m=0}^{N} T_{B,\text{diff.}}^{(m)}(\mu,\mu_0)\cos m(\phi_0 - \phi) \tag{66}$$

Then, $R_T^{(m)}(\mu,\mu_0) = R_B^{(m)}(\mu,\mu_0)$ and
$T_{T,\text{diff.}}^{(m)}(\mu,\mu_0) = T_{B,\text{diff.}}^{(m)}(\mu,\mu_0)$
are substituted into equations (55) through (61) to begin the doubling
process. In the calculations described in this paper, only the $m = 0$
component was calculated, for the reasons given previously.


REFLECTION  AND  TRANSMISSION 
MATRICES  FOR  THE  ATMOSPHERE 

The reflection and transmission matrices that describe the total
atmosphere are denoted $R_A(\mu,\mu_0,\lambda)$ and
$T_A(\mu,\mu_0,\lambda)$, where the subscript $A$ stands for "all."
Similarly, $R_R(\mu,\mu_0,\lambda)$ and $T_R(\mu,\mu_0,\lambda)$
describe the upper Rayleigh scattering layer, and
$R_H(\mu,\mu_0,\lambda)$ and $T_H(\mu,\mu_0,\lambda)$ describe the
lower haze scattering layer. Here the superscript $m$ has been omitted,
but it is understood that all of these matrices correspond to $m = 0$.
The parameter $\lambda$ has been added to indicate the wavelength. The
calculations are performed for 25 values of $\mu$ and $\mu_0$ (namely,
24 Gauss points plus the value 1.0) and for every value of $\lambda$
from 0.4 to 1.1 micrometers in steps of 0.01 micrometer. They are also
performed for three values of $\tau_H$; namely, 0.0, 0.424, and 0.848.
For a given value of $\tau_H$, the calculations are made for all values
of $\lambda$. Under the assumptions made previously, the only
difference between the calculations for different values of $\lambda$
is that the Rayleigh and haze optical depths are different, as shown in
figure 2.

The simplest case is for $\tau_H = 0$. Then,
$R_A(\mu,\mu_0,\lambda) = R_R(\mu,\mu_0,\lambda)$ and
$T_A(\mu,\mu_0,\lambda) = T_R(\mu,\mu_0,\lambda)$. The doubling method
was first used to obtain the $R$ and $T$ matrices corresponding to
Rayleigh scattering layers of optical depth $2^{-24}$, $2^{-23}$, ...,
$2^{+11}$. This was done by starting with a Rayleigh scattering layer
of optical depth $2^{-25}$ and doubling 36 times. The larger values of
optical depth were not required for this paper but are routinely
calculated by the doubling program. The calculation of
$R_R(\mu,\mu_0,\lambda)$ and $T_R(\mu,\mu_0,\lambda)$ was begun with
the largest value of $\lambda$; namely, $\lambda = 1.1$ micrometers.
The optical depth for the corresponding layer $\tau_R(1.1)$ was
obtained (fig. 2) and the matrices $R_R(\mu,\mu_0,1.1)$ and
$T_R(\mu,\mu_0,1.1)$, describing a Rayleigh scattering layer of this
optical depth, were built up using the adding method to "add" certain
previously calculated layers that were selected so that the sum of
their optical depths was equal to $\tau_R(1.1)$. Next,
$R_R(\mu,\mu_0,1.09)$ and $T_R(\mu,\mu_0,1.09)$ were calculated. Since
$\tau_R(1.09)$ is larger than $\tau_R(1.1)$, this involved "adding"
more Rayleigh scattering layers to the layer used to compute
$R_R(\mu,\mu_0,1.1)$ and $T_R(\mu,\mu_0,1.1)$. This was done as before
by using the adding method. This procedure was continued until the
calculations had been made for all the selected values of $\lambda$.
For the cases $\tau_H = 0.424$ and $\tau_H = 0.848$, the procedure was
the same except that for each value of $\lambda$, the matrices
$R_H(\mu,\mu_0,\lambda)$ and $T_H(\mu,\mu_0,\lambda)$ were built up (in
the same way as $R_R(\mu,\mu_0,\lambda)$ and $T_R(\mu,\mu_0,\lambda)$)
and the adding method was used to calculate $R_A(\mu,\mu_0,\lambda)$
and $T_A(\mu,\mu_0,\lambda)$ by "adding" the Rayleigh scattering layer
on top of the haze layer.
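
The text does not say how the previously calculated layers were selected. Since the doubling program supplies depths that are successive powers of two, one natural choice is a binary decomposition of the target optical depth, sketched below. The function name and the greedy selection rule are assumptions, not the documented procedure.

```python
def select_layers(tau, min_exp=-24, max_exp=11):
    """Pick exponents k so that the sum of 2**k approximates tau from below.

    Returns (exponents, remainder); the remainder is the part of tau that is
    thinner than the thinnest available layer, 2**min_exp.
    """
    chosen = []
    remaining = tau
    for k in range(max_exp, min_exp - 1, -1):
        depth = 2.0 ** k
        if depth <= remaining:
            chosen.append(k)
            remaining -= depth
    return chosen, remaining

# Example: an optical depth of 0.09375 is exactly 2**-4 + 2**-5.
exponents, remainder = select_layers(0.09375)
```

The selected layers would then be combined pairwise with the adding method. Moving to a shorter wavelength, where the Rayleigh optical depth is larger, amounts to decomposing the increase in optical depth in the same way and adding those layers to the result already in hand.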


RADIANCE  AT  THE  MSS 

This section describes the calculation of the $R$ function associated
with the Earth/atmosphere system from which the radiance at the MSS can
be obtained using equation (36). It was assumed that, for a given
wavelength, the Earth's surface is a Lambert reflector with a
reflectance $\rho(\lambda)$ for the pixel in the field of view at a
particular instant and an average reflectance $\bar{\rho}(\lambda)$ for
the background (i.e., the area around that pixel). If the whole surface
of the Earth, including the pixel in the field of view, had a uniform
reflectance $\bar{\rho}(\lambda)$, the desired $R$ matrix could be
obtained from equation (60). In this case, the top layer would be
described by $R_T(\mu,\mu_0,\lambda)$ and $T_T(\mu,\mu_0,\lambda)$ and
the bottom layer would be described by
$R_B(\mu,\mu_0,\lambda) = \bar{\rho}(\lambda)$ and
$T_B(\mu,\mu_0,\lambda) = 0$. Also, $\mu = 1$ since it was assumed that
the MSS was pointed vertically downward. Under these conditions, the
$m = 0$ component of equation (60) can be written

$$R(1,\mu_0,\lambda) = R_T(1,\mu_0,\lambda) + e^{-\tau_T(\lambda)}\,\bar{\rho}(\lambda)\,u(\mu_0,\lambda) + 2\int_0^1 T_{T,\text{diff.}}(1,z,\lambda)\,U(z,\mu_0,\lambda)\,z\,dz \tag{67}$$

where

$$u(\mu_0,\lambda) = \frac{U(1,\mu_0,\lambda)}{\bar{\rho}(\lambda)} = e^{-\tau_T(\lambda)/\mu_0} + 2\int_0^1 D_{\text{diff.}}(z,\mu_0,\lambda)\,z\,dz \tag{68}$$

Equation  (67)  is  not  exact  because  the  top  layer  was 
assumed  to  be  homogeneous  in  the  derivation  of 
equation  (60)  and  the  top  layer  is  not  homogeneous 
in  equation  (67).  However,  this  difference  should 
cause  only  a small  error,  which  will  be  neglected  in 
what  follows.  It  is  easy  to  reformulate  the  theory  to 
treat  an  inhomogeneous  top  layer  exactly;  however, 
to  date,  no  numerical  results  for  this  case  have  been 
obtained. 

If  one  considers  the  three  terms  on  the  right-hand 
side  of  equation  (67),  it  is  clear  that  the  second  term 
represents  the  radiance  that  is  directly  transmitted  to 
the  MSS  from  the  target  in  the  field  of  view  and  the 
other  two  terms  represent  the  path  radiance.  The 
first  term  represents  a contribution  from  the  at- 
mosphere alone,  and  the  third  term  represents  a con- 
tribution to  the  path  radiance  from  light  that  has 
been  scattered  by  the  Earth's  surface.  Thus,  if  the 
reflectance  of  the  pixel  in  the  field  of  view  were 
changed  from  p(k)  to  p(X),  the  main  effect  would  be 
to  change  p(k)  to  p(X)  in  the  second  term  on  the 
right-hand  side  of  equation  (67).  The  effect  on  the 
other  terms  and  on  fX(/i0.X)  should  be  negligible. 
Thus,  if  the  reflectance  of  the  pixel  in  the  field  of 
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view  is  p(A)  and  the  background  reflectance  is  p( A), 

the  corresponding  reflection  matrix  is  given  approx* 

• • * 

iiiunvtj 

R(p,P,P0,A)  = /?  (l  ,#i0,x) 

+ e rr(XV(p0,A)[p(A)  - p(A)l 
* a(p,P0,A)p(A)  + b(p,n0, A)  (69) 


where 

«(p,/V*)  3 e~Tr%'(p0,A) 

= e~TfiX)u(  l,p0,X)/p(A)  (70) 


*(p,P0A)  =/?(l,P0,Aj  ~ tf(p.M0,A)/I(A)  (71) 


In $R$, $a$, and $b$, the value 1 for $\mu$ has been dropped
from the list of variables to simplify the notation. It
should be noted that the function
$R(\rho,\bar{\rho},\mu_0,\lambda)$ has properties that are
different from those associated with reflection functions as
they are usually defined. However, for the present
calculation, the important point is that the radiance at the
sensor is given by

$$N(\rho,\bar{\rho},\mu_0,\lambda) = \mu_0 F(\lambda)\,R(\rho,\bar{\rho},\mu_0,\lambda) = \mu_0 F(\lambda)\,[a(\bar{\rho},\mu_0,\lambda)\,\rho(\lambda) + b(\bar{\rho},\mu_0,\lambda)] \tag{72}$$

where $F(\lambda)$ is the solar irradiance at the top of the
atmosphere at wavelength $\lambda$.
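The linear form of equation (72) can be illustrated with a short sketch. It evaluates the radiance at the sensor for given coefficients and, as an added step not stated in the text, inverts the same relation to recover the target reflectance. All numerical values and the function names are hypothetical and serve only as an illustration.

```python
def radiance_at_sensor(rho, a, b, mu0, solar_irradiance):
    """Equation (72): radiance = mu0 * F * (a * rho + b)."""
    return mu0 * solar_irradiance * (a * rho + b)


def target_reflectance(radiance, a, b, mu0, solar_irradiance):
    """Invert equation (72) for the target reflectance (requires a != 0)."""
    return (radiance / (mu0 * solar_irradiance) - b) / a


# Hypothetical coefficients for one wavelength, Sun angle, and haze level.
a, b = 0.80, 0.05      # assumed values of a and b
mu0, F = 0.75, 150.0   # assumed cosine of solar zenith angle and irradiance

n = radiance_at_sensor(0.20, a, b, mu0, F)
rho = target_reflectance(n, a, b, mu0, F)
print(n, rho)
```

Because the relation is linear in the target reflectance, the inversion needs only the tabulated pair of coefficients for the wavelength, Sun angle, and background reflectance in question.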

For  each  of  the  3 values  of  tw,  the  coefficients 
$a(\bar{\rho},\mu_0,\lambda)$ and $b(\bar{\rho},\mu_0,\lambda)$
were computed for the 71 values of $\lambda$, the 25 values of
$\mu_0$, and 50 values of $\bar{\rho}$ ranging from 0.0 to 0.5
in units of 0.01. This was done by using the adding program in
the usual way, with the top layer described by
$R_T(\mu,\mu_0,\lambda)$ and $T_T(\mu,\mu_0,\lambda)$ and the
bottom layer described by a reflection function equal to
$\bar{\rho}(\lambda)$ and a transmission function equal to 0.
This produced the matrices $R(\mu,\mu_0,\lambda)$ and
$U(\mu,\mu_0,\lambda)$, and the values of these for $\mu = 1$
were used to compute $a(\bar{\rho},\mu_0,\lambda)$

Development of Partitioning as an Aid to Spectral
Signature Extension*

R. W. Thomas, C. M. Hay, and J. C. Claydon


INTRODUCTION 

The  act  of  sorting  sample  segments  into  sets  hav- 
ing similar  spectral  reflectance  characteristics  has 
come  to  be  known  as  “spectral  partitioning"  in  the 
LACIE  context.  As  part  of  the  LACIE  supporting 
research  effort,  this  technique  has  been  investigated 
as  a means  of  maximizing  the  efficiency  of  Landsat 
classification  using  signature  extension  procedures. 
Lambeck  and  Rice  (ref.  1)  suggested  that  perform- 
ance of  affine  signature  extension  algorithms  (incor- 
porating corrections  for  both  multiplicative  and  addi- 
tive spectral  differences)  would  be  enhanced  by  at- 
tempting extensions  between  sample  segments  fall- 
ing in  the  same  spectral  partitions.  That  is,  within- 
group  spectral  variation  should  be  more  limited  than 
the  variation  in  the  population  as  a whole,  enabling 
more precise estimates of parameters used in signature
extension algorithms (refs. 2 and 3). More recent
results with multisegment classification (ref. 4)
also  suggest  that  partitioning  may  serve  an  important 
role  in  simultaneously  classifying  several  segments 
using  nonlocal  signatures. 

Lists  of  spectrally  similar  LACIE  sample  seg- 
ments forming  spectral  partitions  may  be  con- 
structed with  the  use  of  three  basic  information 
types:  static,  seasonal,  and  pass-specific  variables.  All 
are  directly  or  indirectly  related  to  spectral  signature 
behavior  and  can  be  used  to  define  spatial  domains 
over  which  crop-specific  signatures  should  be 
extendable.  For  example,  relatively  slowly  changing 
climatic  and  soil  characteristics  can  be  used  to  de- 
scribe, for  any  area,  growth  potentials  for  specific 
crops.  As  a consequence,  stratification  of  the  land- 
scape into  static  domains  or  strata  within  which  crop 
development, and therefore spectral development,
should be similar is possible using these “static”
variables. The static strata may in turn be subdivided
or combined by use of seasonal variable information.
For instance, departures from their long-term
averages of accumulated growing-season precipitation,
heat input, and other variables specific to the
growing year can be employed to adjust stratum
boundaries. Finally, Landsat pass-specific scanner,
atmospheric, and spectral information can be used to
further subdivide or refine spectral stratum boundaries.

*Work supported under NASA Contract NAS9-14565.

Combination  of  static  or  seasonal  spectral  strata 
with  pass-specific  information  produces  a dynamic 
partitioning  of  the  landscape  and  therefore  of  the 
LACIE sample segment population. The relative
importance of static, seasonal, and real-time variables in
defining useful spectral partitions has been the
subject of significant debate during LACIE. This paper
will  address  that  question  using  a static  spectral 
stratification  as  the  initial  partitioning  device.  After 
procedures  for  producing  the  static  strata  have  been 
defined,  the  relative  capability  of  those  strata  to  ac- 
count for  variability  in  wheat  signatures  will  be 
evaluated and contrasted with spectral variance
explained by seasonal and Landsat pass-specific
information. Much of this work has been reported more
extensively by Hay and Thomas (refs. 3 and 5) and
by Hay et al. (refs. 6 and 7).


STATIC  SPECTRAL  STRATIFICATION 
PROCEDURE 


Overview 

Viable  static  spectral  stratification  procedures  de- 
pend on  the  use  of  parameters  that  influence  long- 
term patterns  of  spectral  signature  and  are  easily 
measurable  as  well.  These  parameters  are  of  two 
basic types. The first set relates to physical site
(field) conditions. Soil type, physiographic position,
slope, and aspect represent such site characteristics.
The second set may be described as crop growth
drivers. These include (1) climatic variables and (2)
long-term cultural practices, such as irrigation,
fertilization, and mulching, which affect the amount and
timing of available water and nutrients.

Hay  and  Thomas  (ref.  3)  describe  a static  spectral 
stratification  technique  employing  subsets  of  both 
site and growth driver variables. Developed in
support of the LACIE Supporting Research and
Technology (SR&T) Program, their procedure uses a
combination of broad climatic strata within which a
finer mosaic of soil association and land use strata is
set. Sample segments belonging to a given
climatic/soil/land use stratum (or combination of these
strata) are expected to have similar wheat signatures
on the “average.” On any given Landsat pass date,
lists  of  matched  training  (signature  source)  and 
recognition  (targets  of  signature  extension)  segments 
within  strata  must  be  adjusted  for  seasonal  abnor- 
malities or  atmospheric  effects. 


Selection of Stratification Variables

The set of specific variables chosen by Hay and
Thomas to produce static signature extension strata
included (1) general soil type (soil association), (2)
land use, (3) average long-term growing-season
degree-days, and (4) average long-term growing-season
precipitation. The rationale for selection of
each of these variables was as follows.

Soil  type  and  land  use— Excluding  atmospheric 
effects,  the  spectral  signature  of  a cropped  field  (i.e., 
wheat)  is  a composite  signature  made  up  of  two 
general  components.  The  first  is  the  spectral  reflec- 
tance of  the  soil  background  and  the  second  is  the 
spectral  reflectance  of  the  crop  (vegetation)  canopy. 
The  amount  that  each  component  contributes  to  the 
composite  signature  is  dependent  on  the  percentage 
of  canopy  cover.  The  relative  contribution  of  soil 
background  to  the  composite  signature  on  any  given 
date  is  inversely  related  to  the  percentage  of  crop 
canopy  present  on  that  date. 
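The composite-signature idea can be sketched with a simple linear mixture, in which the soil contribution shrinks as canopy cover grows. The linear weighting is an assumption made here for illustration only; the text states just that the soil contribution is inversely related to canopy cover. The band values and the function name are hypothetical.

```python
def composite_signature(canopy, soil, canopy_fraction):
    """Mix canopy and soil reflectance band by band.

    Assumes a linear mixture weighted by fractional canopy cover,
    which is an illustrative simplification.
    """
    if not 0.0 <= canopy_fraction <= 1.0:
        raise ValueError("canopy_fraction must lie between 0 and 1")
    return [canopy_fraction * c + (1.0 - canopy_fraction) * s
            for c, s in zip(canopy, soil)]


# Hypothetical two-band reflectances (percent) for canopy and bare soil.
canopy = [10.0, 20.0]
soil = [30.0, 40.0]
early_season = composite_signature(canopy, soil, 0.25)
print(early_season)
```

At low canopy cover the composite stays close to the soil values, which is why soil background is treated as a stratification variable for early-season dates.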

The  spectral  reflectance  of  soils  has  been  shown  to 
be  dependent  primarily  on  surface  moisture  content, 
organic  matter  content,  and  particle  size  (refs.  8 and 
9). The surface moisture content of a soil is a
function of the moisture input (i.e., precipitation or
irrigation water), the period of time since the last input,
the texture of the soil (particle size distribution), the
organic  content  of  the  soil,  and  the  quality  of  the  site 
drainage.  The  general  overall  soil  background  tones 
within  areas  delineated  on  Landsat  color  composites 
in  this  study  were  positively  correlated  with  available 
moisture  capacity  indices  for  the  major  soil  series 
within  the  associations.  The  darker  toned  soils  had 
higher  available  water  capacity  indices  than  the 
lighter  toned  soils,  indicating  that,  on  a given  date, 
surface  moisture  content  accounted  for  a significant 
amount  of  the  variability  in  spectral  reflectance  of 
the  soil  background.  Thus,  soil  association  (general 
soil  type)  grouped  by  available  water  capacity  indices 
and  precipitation  (moisture  input)  was  chosen  as  a 
static  signature  extension  stratification  variable. 

Within  certain  image  biophases,  there  exist  crops 
whose  spectral  signatures  can  be  confused  with  those 
of  wheat.  The  presence  of  a confusion  crop  within  an 
area is dependent on the general land use/crop type
distribution patterns within an area. In addition, crop
canopy density and development can be affected by
cropping practices such as irrigation. Irrigation
availability also tends to diversify an agricultural
environment, thereby increasing the probability of
confusion crops. For this reason, land use was chosen as
a static signature extension stratification variable.

Growing-season  degree-days  and  precipitation.— 
Spectral  reflectance  from  the  vegetative  canopy  is  a 
function  of  the  crop  canopy  density  and  the 
phenological  stage  of  crop  development.  Within  a 
given  region,  crop  phenological  development,  and 
therefore  crop  canopy,  is  greatly  dependent  on  the 
climatic  variables  of  temperature  and  precipitation. 
A wide review of the literature (including important
references 10 to 13) indicates that the following
climatic variables could be used for stratification.

1. Average growing-season degree-day1 sums

2.  Average  growing-season  precipitation 

3.  Average  last  date  of  spring  frost 

4.  Average  temperature  and/or  average  minimum 
temperature  for  the  coldest  month  of  the  year 


1Degree-day is a measure of daily accumulated temperature
above a specified biological threshold temperature. The
degree-day value for a given month S is defined (ref. 13) as
the number of days in a given month n times the difference
between the average temperature in that given month T and a
growth threshold temperature (commonly 40° F) below which
wheat has been found not to accumulate significant biomass.
Thus, the growing-season day-degree sum can be expressed as
$S = \sum_j n_j (T_j - 40^\circ\,\mathrm{F})$, where j is the
month index.
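As a worked illustration of the degree-day definition, the sketch below sums the monthly values (days in the month times the excess of mean temperature over the threshold) across a growing season. The monthly temperatures and the function names are hypothetical. The formula is applied exactly as stated, so a month with a mean below the threshold would contribute a negative value; whether such months were clipped to zero in practice is not stated in the text.

```python
def monthly_degree_days(days_in_month, mean_temp_f, threshold_f=40.0):
    """Degree-day value for one month: n * (T - threshold)."""
    return days_in_month * (mean_temp_f - threshold_f)


def growing_season_degree_days(months, threshold_f=40.0):
    """Sum monthly degree-days over (days_in_month, mean_temp_f) pairs."""
    return sum(monthly_degree_days(n, t, threshold_f) for n, t in months)


# Hypothetical three-month season: (days in month, mean temperature in deg F).
season = [(30, 55.0), (31, 62.0), (30, 70.0)]
total = growing_season_degree_days(season)
print(total)
```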
Generally, the isolines of the temperature variables
are strongly positively correlated with one another
so that only one or two need be considered for use in
stratification.  Thus,  average  (long-term)  growing- 
season  precipitation  and  average  growing-season 
day-degree  sums  were  included  as  static  stratification 
variables. 


Stratification  Procedure 

Land use/soils.— The stratification procedure was
developed over two LACIE test areas in the U.S.
Great  Plains:  Kansas  (winter  wheat)  and  North 
Dakota  (spring  wheat).  Basically,  the  stratification 
technique  involved  the  following  stepwise  sequence. 
First,  a single  time  of  year  was  identified  when  soils 
and  land  use  or  cropping  practice  patterns  of  interest 
were  most  separable  on  Landsat  color-infrared  9-  by 
9-inch  format  transparencies.  One  or  two  supple- 
mentary Landsat  dates  were  selected  to  allow  better 
identification  of  crop  or  soil  moisture  factors  indicat- 
ing different  soil  associations.  Then,  land  use  and  soil 
association  information  was  delineated  on  clear  ace- 
tate overlaid  on  the  full-frame  9-  by  9-inch  color-in- 
frared Landsat  transparencies.  Soil  association  lines 
and  land  use  lines  were  located  according  to  in- 
terpretation of  this  imagery  with  reference  to  land 
use  data  published  by  the  U.S.  Department  of 
Agriculture  (USDA)  Economics,  Statistics,  and 
Cooperatives  Service  (ESCS)  and  the  USDA  Soil 
Conservation  Service  (SCS).  This  soil  association 
stratification was also referenced to cross-correlated
county SCS soil data and regional soil association
maps.

The  land  use  classification  system  employed  for 
stratification is given in the appendix. This system
merges desirable features from the U.S. Geological
Survey  (USGS)  Circular  671  system  (ref.  14)  and 
suggestions  from  the  USDA  system.  The  soil  associ- 
ation classification  was  based  on  a cross-correlation 
of  SCS  county  soil  survey  data,  Aandahl's  Great 
Plains  soil  distribution  data,2  and  Landsat  data. 

Working  initially  within  the  counties  for  which 
detailed  soil  association  map  dal*  were  available,  the 
map  data  were  correlated  to  features  observable  on 
the Landsat transparency. Any needed boundary
adjustments were made to bring the soil association
map data into conformity with the more detailed
spatial data inherent within the imagery.
Correlations of soil associations across the crop reporting
district (CRD) were based on

1. Commonality of soil association name

2. Image feature continuity across county boundaries

3. Similarity of descriptions of association and
soil series within associations (The names of some
soil series and associations changed when they
crossed county boundaries.)

2A. R. Aandahl. Soils of the Great Plains. Map, 1:2 500 000.
Lincoln, Nebr., 1972.

Correlated  associations  were  given  an  alphanu- 
meric code  which  contained  the  general  soil  group 
number  from  the  legend  code  of  Aandahl's  map  and 
a letter  assigned  to  each  subgroup  determined  from 
the  detailed  county  soil  survey  data. 

It  was  necessary  to  redefine  some  county  soil  as- 
sociations in  order  to  better  serve  the  needs  of  a sig- 
nature extension  stratification.  For  example,  some 
associations as originally defined within the county
soil  report  contained  a significant  amount  of 
variability  in  soil  type,  available  moisture  capacity, 
and  land  use  as  a result  of  intermingling  of  signifi- 
cantly different  soil  types.  Within  the  constraints  of 
the  minimum  mapping  area  of  10  to  15  square  miles, 
it  was  possible  to  redefine  some  of  the  highly  varia- 
ble associations  into  more  uniform  associations  with 
respect  to  general  soil  type,  moisture  capacity,  and 
land  use  pattern.  Care  was  taken,  however,  that  any 
redefinition  of  soil  associations  did  not  violate  the 
definitional  concept  of  a soil  association. 

In areas for which detailed county soil association
map  data  were  not  available,  subdivision  of  Aan- 
dahl's  general  soil  groups  was  accomplished  through 
interpretation  of  the  Landsat  imagery.  (This  would 
be  the  technique  employed  over  regions  of  the  world 
where  only  very  general  soil  maps  may  be  available.) 
This  procedure  made  use  of  such  similarities  as 

1.  Landform  association  (topographic  site  rela- 
tionships) 

2.  Land  use  patterns 

3.  Soil  background  tone  on  given  dates  (correlata- 
ble  with  available  moisture  capacity  and  general  soil 
type) 

4.  Proportional  relationships  among  the  preced- 
ing characteristics 

Where possible, soil associations interpreted from
Landsat imagery were correlated with soil
associations from the county soil surveys. This was
accomplished by analyses of the Landsat imagery for
landform, land use, soil tone, and patterns similar to
county-based descriptions and by the use of geological
map data to assess parent material similarities.

In a small number of cases, soil association areas
contained within their boundaries significantly
different land use classes (excluding urban; i.e.,
intensive cropland versus rangeland). The question
was then raised as to the validity of the soil
association as found mapped within the county soil report.
Analysis of the detailed soil series maps (more
detailed than the soil association map) more often
than not indicated that, in these cases, the imagery
was probably more reliable for the placement of soil
association boundaries than the county soil
association map. Thus, land use delineation served as an
iterative check on the soil association delineations.

Once the land use and soil associations had been
delineated, they were combined and registered to
1:1 000 000-scale USGS base maps. Importantly, the
land use and soil association classes, and
consequently the combined land use/soil association
classes, were defined with regard to the interrelated
effects of several environmental factors (e.g.,
microclimate, soil characteristics, cropping practices)
on wheat growth behavior and thus on spectral
signature response. The assumption was that the
spectral signature of wheat would tend to be similar
within each such mapping unit type throughout the
year, subject to the constraints of the other
stratification variables (e.g., growing-season precipitation).

Combination  of  climatic  strata  with  land  use/soil 
strata. — Long-term  (30-year)  average  growing-season 
degree-day  and  precipitation  values  were  computed 
for  all  ground  meteorological  stations  having  com- 
plete temperature  and  rainfall  data  in  the  western 
two-thirds of Kansas and the entire state of North
Dakota. These data, together with longitude and
latitude coordinates for each station, were used by a
computer software data handling package (MAPIT
package available on the University of California at
Berkeley (UCB) CDC-7600 computer) to generate
isolines of degree-days and precipitation over each
state. Isoline values were set to allow the production
of meaningful climatic patterns while retaining as
much resolution as possible. The resulting isolines
were then smoothed by hand to remove meaningless
boundary aberrations introduced by the
interpolation algorithm.
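The step from scattered station values to isoline intervals can be sketched as follows. The interpolation method used by the MAPIT package is not described in the text, so inverse-distance weighting is substituted here purely as an assumption. Distances are taken in degrees of longitude and latitude, which is a further simplification, and the station values, grid, and interval breaks are all hypothetical.

```python
import math


def idw_interpolate(stations, lon, lat, power=2.0):
    """Inverse-distance-weighted estimate at (lon, lat).

    stations is a list of (lon, lat, value) tuples.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for s_lon, s_lat, value in stations:
        distance = math.hypot(lon - s_lon, lat - s_lat)
        if distance == 0.0:
            return value  # exactly on a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def interpolate_grid(stations, lons, lats, power=2.0):
    """Return one row of interpolated values per latitude."""
    return [[idw_interpolate(stations, lon, lat, power) for lon in lons]
            for lat in lats]


def classify_interval(value, breaks):
    """Index of the isoline interval containing value (breaks ascending)."""
    return sum(value >= b for b in breaks)


# Hypothetical stations: (longitude, latitude, degree-day sum).
stations = [(-100.0, 38.0, 4000.0), (-98.0, 38.0, 5000.0)]
grid = interpolate_grid(stations, [-100.0, -99.0, -98.0], [38.0])
classes = [classify_interval(v, [4200.0, 4800.0]) for v in grid[0]]
print(grid, classes)
```

Grid cells sharing an interval index fall between the same pair of isolines. Any manual smoothing of the resulting boundaries, as described in the text, lies outside this sketch.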

Next, the degree-day and precipitation isolines
were registered to each other to form climatic strata.
Multidate Landsat full-frame imagery was then
inspected in each state to determine gross patterns (soil
moisture and crop development stage) of wheat
spectral response. Degree-day and/or precipitation
intervals were then combined to form larger climatic
strata if the gross pattern apparent on Landsat
imagery suggested that larger regions were giving rise to
similar wheat spectral response “on the average.”

The resulting climatic strata were registered to the
1:1 000 000-scale map sheets on which the land
use/soil association strata were overlaid.

Summary  of  Steps  Used  In  Producing 
the  Static  Spectral  Stratification 

Step 1. A base date of Landsat imagery is selected
from a period when soils and land use or cropping
practices are most contrasted and most easily
delineable.

Step 2. Soil associations are delineated on the base
date color-infrared transparency, using available
published soil data and interpretation of the Landsat
imagery. The associations are then correlated across
the CRD and ultimately across the entire area of
interest.

Step 3. Land use or cropping practices are
delineated on the base date color-infrared transparency,
referencing the soil association delineations
previously completed.

Step 4. The delineations from steps 2 and 3 are
combined to produce one land-use/general-soil-type
delineation.

Step 5. All remaining CRD's are processed in a
similar manner. The resulting land-use/general-soil-type
strata from each CRD are transferred to a
1:1 000 000-scale USGS base map and any boundary
inconsistencies between CRD's are eliminated.

Step 6. Growing-season degree-day sums are
calculated and plotted on the base coordinate system
by the reporting meteorological station. Isolines are
then determined by automatic interpolation and
manual smoothing of the data.

Step 7. Growing-season precipitation is calculated
and plotted on the base coordinate system by the
reporting meteorological station. Isolines are then
determined as in the case of degree-days.

Step 8. Climatic strata bounding isolines are
selected by referencing the land use/soil association
strata and several dates of Landsat imagery for
consistent correlations of soil color-tone (soil moisture)
and crop development stages with certain isolines.

Step 9. Climatic strata are registered to the
1:1 000 000-scale maps.

For a more complete description of the procedural
steps, see references 3 and 5.

Figure 1 shows the climatic strata generated for
the six western crop reporting districts in Kansas.
Figure 2 shows both the climatic and the land
use/soil association strata for North Dakota.


EVALUATION  OF  STATIC  PARTITIONS 
RELATIVE  TO  VARIANCE  CONTROL 
OBJECTIVES 


Spectral  Homogeneity 

The  static  stratification  developed  as  described  in 
the  preceding  section  was  subsequently  evaluated 
(ref.  7)  in  relation  to  its  capability  to  group  spectrally 
similar  areas. 

Approach. — The experimental procedure was
composed of five basic parts. The first, preprocessing,
standardized sample segments to a common Sun
elevation and haze condition. This was accomplished
by implementation of XSTAR haze correction
procedures (ref. 15) developed at the Environmental
Research Institute of Michigan (ERIM).
Preprocessing in this case provided a more stable measurement
frame (Landsat or “Tasseled Cap” space) and
thereby increased the ease with which real spectral
differences could be identified and evaluated.

Each  sample  segment  was  partitioned  according 
to  land  use/soil  association  strata  as  defined  by  the 
static  stratification.  Each  segment  partition  was  then 
individually  clustered  in  a single-date  mode  by 
ISOCLAS  (adapted  from  NASA  Johnson  Space 
Center  (JSC)).  The  clustering  process  was  limited  to 
10  iterations,  a maximum  band  standard  deviation  of 
3.2  Landsat  counts  within  a cluster,  and  a distance 
between  clusters  of  3.2. 

FIGURE 1.— Climatic strata for the state of Kansas used in the
analysis of stratum spectral homogeneity. The ranges for
long-term average growing-season degree-days and precipitation
(reported in inches) are recorded in each stratum as numerator
and denominator, respectively. When a range is not given, a
range is assumed in the obvious plus or minus direction.

FIGURE 2.— Climatic and land use/soil association strata for the
state of North Dakota used in the analysis of stratum spectral
homogeneity. Long, gently curving lines delineate climatic
strata. The finer resolution mosaic represents the land
use/soil association stratification. Each land use/soil stratum
is labeled by a fractional code: numerator for land use,
denominator for soil type.

Resulting clusters by segment by UCB stratum
were then stratified or grouped according to the
percentage of wheat within the clusters. This was
accomplished by comparing the cluster map with
corresponding blind site ground data maps. To
minimize the time required in this cluster grouping
process, clusters were ordered (highest to lowest) by
their (2×) band 7 to (1×) band 5 ratios of the
cluster means on the Landsat pass date in question.
This ratio was used as an indicator of vegetation and,
depending on the date and state, of wheat versus
other crop types. Using an interactive color
television (TV) monitor system, clusters having the
higher band 7 to 5 ratios were displayed and analyzed
first, followed by clusters having lower ratios down
to “a dry, stubble vegetation or soil line” (1.10). In
this way, the multiple clusters occurring within fields
could be “reconstructed” into field patterns and
strongly correlated crop-type patterns on the initially
blacked-out TV screen. The proportion of wheat in a
given cluster could then be readily judged according
to that cluster's distribution among fields. Four basic
wheat percentage cluster groups were established: 75
to 100 percent, 50 to less than 75 percent, 25 to less
than 50 percent, and 0 to less than 25 percent.
Information was also recorded regarding the cover-type
makeup of the nonwheat portion of each cluster
group.
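The cluster ordering and the four wheat-percentage groups described above can be sketched as follows. The ratio is read here as twice the band 7 cluster mean divided by the band 5 cluster mean, and clusters are kept down to the 1.10 soil line inclusive; both readings are assumptions. Groups 1 and 2 follow the numbering used in the text for the two highest wheat classes, while the numbers 3 and 4, the cluster means, and the function names are hypothetical.

```python
SOIL_LINE = 1.10  # ratio quoted in the text for dry stubble or soil


def band_ratio(band5_mean, band7_mean):
    """(2x) band 7 to (1x) band 5 ratio of the cluster means."""
    return 2.0 * band7_mean / band5_mean


def order_clusters(cluster_means):
    """Order cluster ids from highest to lowest ratio, down to the soil line.

    cluster_means maps a cluster id to (band5_mean, band7_mean).
    """
    ratios = {cid: band_ratio(b5, b7) for cid, (b5, b7) in cluster_means.items()}
    ordered = sorted(ratios, key=ratios.get, reverse=True)
    return [cid for cid in ordered if ratios[cid] >= SOIL_LINE]


def wheat_group(percent_wheat):
    """Assign one of the four wheat-percentage cluster groups."""
    if not 0.0 <= percent_wheat <= 100.0:
        raise ValueError("percent_wheat must lie between 0 and 100")
    if percent_wheat >= 75.0:
        return 1
    if percent_wheat >= 50.0:
        return 2
    if percent_wheat >= 25.0:
        return 3
    return 4


# Hypothetical cluster means in Landsat counts: id -> (band 5, band 7).
clusters = {"A": (20.0, 30.0), "B": (30.0, 15.0), "C": (25.0, 20.0)}
display_order = order_clusters(clusters)
print(display_order, wheat_group(80.0), wheat_group(60.0))
```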

A random sample of pixels was labeled from the
cluster groups comprising 75- to 100-percent and 50-
to less than 75-percent wheat in each stratum of each
segment on each date. A random number generator,
operated through the interactive color display
system, minimized the time required for pixel
selection. Ten to fifteen pixels in each of the two cluster
groups were labeled as to crop type using the blind
site ground data maps. This labeled pixel sample
served three purposes: (1) it served as a check on the
visual estimate of wheat percentage for each cluster
group; (2) it provided the data employed in a
Hotelling's T² test of wheat spectral difference between all
possible pairs of land use and soil strata/climatic
strata sampled; and (3) it provided the wheat pixel
data used later in a spectral sensitivity analysis.

The fifth, and final, step in the analysis was to
perform pairwise spectral comparisons of wheat
signatures between all possible combinations of the
land use/soil strata and climatic strata sampled.
These comparisons were made by applying
Hotelling's T² test to the four-channel Landsat wheat
signatures obtained from each pair of strata. Three
sources of wheat signature data were evaluated
separately: (1) the sample of pixels from the 75- to
100-percent wheat cluster group, (2) the sample of pixels
from the 50- to less than 75-percent wheat cluster
group, and (3) the combined sample of pixels from
the 75- to 100-percent and 50- to less than 75-percent
wheat cluster groups, obtained in each stratum of
each segment. Comparisons were limited to the same
state and the same biostage.3

The  result  of  the  Hotelling  test  was  a statistical  sig- 
nificance or  alpha  value  which  gave  the  probability 
that  the  observed  wheat  signatures  came  from  the 
same  population.  Alpha  values  of  0.05  (5  times  in 
100)  or  less  were  interpreted  to  mean  that  the  null 
hypothesis  (wheat  signatures  for  the  given  pair  of 
strata  are  similar)  was  to  be  rejected  for  the  given 
pair  of  strata  in  question.  By  noting  which  pairs  of 
strata  did  not  cause  rejection  of  the  null  hypothesis, 
sets  of  strata  having  statistically  similar  wheat  sig- 
natures could  be  defined.  Furthermore,  it  was 
assumed  that  nonrejection  of  the  null  hypothesis  of 
spectral similarity implied a high probability of
acceptable wheat classification performance. That is, if
wheat spectral models (training statistics: mean
vector and covariance matrix) obtained from one portion of
a set of spectrally similar strata were used to classify
(using quadratic or linear discriminant functions)
the remaining portion of that stratum set, an overall
acceptable level of classification performance would
be  obtained.  Acceptable  as  used  here  is  defined  in 
relation  to  the  classification  accuracy  obtained  by 
classifying  on  the  basis  of  local  stratum  training 
statistics. 
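The pairwise test can be sketched with the standard two-sample form of Hotelling's T² test using a pooled covariance matrix. The text does not state which variant was applied, so this form is an assumption, as are the pixel values and all function names. The significance value comes from the F distribution through the regularized incomplete beta function.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x


def scatter(sample, mean):
    """Sum of outer products of deviations from the mean vector."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in sample:
        dev = [v - m for v, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += dev[i] * dev[j]
    return s


def hotelling_t2(sample1, sample2):
    """Return (T2, F statistic, significance) for two samples of pixel vectors."""
    n1, n2 = len(sample1), len(sample2)
    m1 = [sum(col) / n1 for col in zip(*sample1)]
    m2 = [sum(col) / n2 for col in zip(*sample2)]
    p = len(m1)
    dof = n1 + n2 - p - 1
    if dof <= 0:
        raise ValueError("too few pixels for the number of channels")
    s1, s2 = scatter(sample1, m1), scatter(sample2, m2)
    pooled = [[(s1[i][j] + s2[i][j]) / (n1 + n2 - 2) for j in range(p)]
              for i in range(p)]
    diff = [u - v for u, v in zip(m1, m2)]
    t2 = n1 * n2 / (n1 + n2) * sum(d * w for d, w in zip(diff, solve(pooled, diff)))
    f_stat = dof / (p * (n1 + n2 - 2)) * t2
    return t2, f_stat, reg_inc_beta(dof / 2.0, p / 2.0, dof / (dof + p * f_stat))


# Hypothetical four-channel wheat pixels from two strata.
stratum_a = [[20.0, 18.0, 40.0, 22.0], [22.0, 17.0, 43.0, 20.0],
             [21.0, 20.0, 41.0, 24.0], [19.0, 19.0, 44.0, 21.0],
             [23.0, 21.0, 42.0, 23.0], [20.0, 16.0, 45.0, 25.0]]
stratum_b = [[r[0] + 8.0, r[1] + 8.0, r[2] - 10.0, r[3] - 6.0] for r in stratum_a]
t2, f_stat, alpha = hotelling_t2(stratum_a, stratum_b)
print(t2, f_stat, alpha, "reject" if alpha <= 0.05 else "do not reject")
```

With the ten to fifteen labeled pixels per cluster group mentioned in the text, a four-channel comparison leaves enough degrees of freedom for the F approximation, although the estimate of the pooled covariance remains coarse.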

Data  set. — In  Kansas  and  North  Dakota,  two 
biophase  periods  were  selected  in  which  to  apply  the 
grouping  and  sensitivity  analysis  procedures  just  de- 
scribed. The  first  date  in  both  states  represented  a 
wheat  emergence  condition.  The  second  date  corre- 
sponded approximately  to  a jointing  or  advanced 
jointing  condition  for  the  wheat  crop.  These  time 
periods  were  selected  on  the  basis  of  sensitivity 
analysis  results  reported  in  Hay  et  al.  (ref.  6),  which 
suggested that these stages were most difficult to
characterize by static stratification variables. This
analysis was therefore considered conservative in
relation to the performance of the static
stratification. Available sample segments were limited to
those 1976 LACIE blind sites having ground data so
as to minimize incorrect interpretation of results.

3For purposes of this analysis, a given biostage was considered
to be extended over the several days (5-day period maximum)
included in the data set.

Results. — Within  each  state,  all  possible  pairs  of 
strata  were  tested  for  spectral  similarity.  For  each 
stratum pair, Hotelling's T² test was applied
separately to a sample of pixels from the 75- to
100-percent wheat cluster group (if this group was
represented in both strata) and similarly to a sample of
pixels  from  the  50-  to  less  than  75-percent  wheat 
cluster  group.  Pixel  data  from  both  cluster  groups  in 
each  stratum  were  also  pooled  and  tested  against  cor- 
responding pooled  data  in  other  strata. 

Results  presented  in  table  1 for  tests  based  on 
pooling  cluster  groups  1 and  2 (75-  to  100-percent 
wheat  and  50-  to  less  than  75-percent  wheat,  respec- 
tively) show  that  within  a given  climatic  stratum,  the 
null  hypothesis  of  spectral  similarity  between  land 
use/soil  strata  was  accepted  32  to  75  percent  of  the 
time. Significance levels used for rejection were
α < 0.05 and α < 0.01. The acceptance rate between
adjacent climatic strata (i.e., strata differing by one class
of either long-term growing-season degree-days or
precipitation (not both)) ran between 0 and 43
percent. Results for tests across climatic strata
diagonally  adjacent  (differing  by  one  class  in  both 
degree-days  and  precipitation)  available  from  North 
Dakota  gave  acceptance  rates  of  50  percent  (date  1) 
and  67  percent  (date  2)  for  either  significance  level. 
In  general,  low  rates  of  acceptance  prevailed  for  land 
use/soil  stratum  pairs  separated  by  more  than  one 
adjacent  climatic  stratum. 

Based on the results, three basic patterns were evident
for the two states and two dates involved.

1. The wheat signature population generally overlapped
within a given climatic stratum. This pattern
was more pronounced in the later as opposed to the
earlier date.

2. Wheat signature overlap also occurred between
horizontally, vertically, or diagonally adjacent
climatic strata. The frequency of overlap was
generally at a lower rate than within a given climatic
stratum. It should be noted that no diagonally adjacent
climatic stratum signature comparisons were
available for Kansas. Given the somewhat larger
areal extent of the climatic strata in Kansas relative
to that in North Dakota (owing to the wider class
values for degree-days and precipitation used in Kansas),
the signature overlap rate between diagonally
adjacent climatic strata in Kansas is expected to be
lower than that obtained in North Dakota.

3. Wheat signatures rarely overlapped beyond an
adjacent climatic stratum.


Table I. Hotelling's T² Test Results of Stratum Grouping Analysis

                          Frequency of null hypothesis acceptance, percent
                         ---------------------------------------------------
                          Within same      Between adjacent climatic strata
                          climatic        ----------------------------------
                          stratum          Vertically or      Diagonally
                                           horizontally
Cluster set               α<5%   α<1%      α<5%   α<1%        α<5%   α<1%
----------------------------------------------------------------------------
Kansas date 1 (a)          32     42        19     43         -- (b)  --
Kansas date 2 (c)          50     75       0 (d)  0 (d)       --      --
North Dakota date 1 (e)    33     50        74     24         50      50
North Dakota date 2 (f)    63     75        --     --         67      67

(a) [number illegible] land use/soil strata distributed over 8 segments
    and 5 climatic strata.
(b) Dash indicates no stratum pairs available for test.
(c) [number illegible] land use/soil strata distributed over 6 segments
    and 5 climatic strata.
(d) Based on only 3 possible stratum matches available for test.
(e) 13 land use/soil strata distributed over 8 segments and 6 climatic
    strata.
(f) [number illegible] land use/soil strata distributed over 5 segments
    and 5 climatic strata.


ERIM  Evaluation  of  Static 
Spectral  Stratification 

Personnel of the ERIM also evaluated the UCB
stratification as part of their work in the LACIE
SR&T effort (ref. 16). Their approach was to perform
all possible pairwise signature extensions
among 23 segments (a total of 506 extensions) distributed
across Kansas. Extensions were based on
multitemporal Landsat data for biowindow 1 (Julian
dates 291 to 90) and biowindow 2 (Julian dates 90 to
138), which had been Sun and haze corrected using
the ERIM XSTAR algorithm. Resulting field mean
classification accuracies (average percentage correct
based on classification of mean spectral vectors by
field) were then determined for each extension.
These results in turn were summarized into within-stratum
and between-stratum extensions and then
evaluated using an analysis of variance.
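
The count of 506 extensions follows from treating each ordered
(training segment, recognition segment) pair as a separate extension:
23 x 22 = 506. A minimal sketch of that enumeration and of the
within- versus between-stratum summary is given below. The function
names and the accuracy dictionary are hypothetical, and the analysis of
variance itself is not reproduced.

```python
from itertools import permutations


def pairwise_extensions(segments):
    """All ordered (training, recognition) pairs of distinct segments."""
    return list(permutations(segments, 2))


def summarize(accuracy, stratum_of):
    """Count and mean accuracy of within- and between-stratum extensions.

    accuracy:   {(training, recognition): percent correct}
    stratum_of: {segment: stratum label}
    """
    groups = {"within": [], "between": []}
    for (train, recog), acc in accuracy.items():
        same = stratum_of[train] == stratum_of[recog]
        groups["within" if same else "between"].append(acc)
    return {key: (len(vals), sum(vals) / len(vals) if vals else None)
            for key, vals in groups.items()}


print(len(pairwise_extensions(range(23))))
```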


Table II presents the outcome of the ERIM
analysis of variance. The table shows that there was
no significant difference in classification accuracy
for within- versus between-stratum extensions in the
case of land use strata. A similar comparison for soil
association strata could not be made because those
strata divided the 23 segments into 23 different partitions.
Significantly higher within- versus between-stratum
classification performance occurred,
however, in the cases of both degree-day and precipitation
strata when evaluated separately. Finally,
the best within- versus between-stratum classification
performance (86.5 percent versus 66.6 percent)
was obtained using the combined degree-day and
precipitation climatic strata.

The ERIM analysis was based on an initial UCB
stratification in which the individual climatic
stratum areas were somewhat larger than those
shown in figure 1. Nevertheless, the following interpretation
of the results presented in table II remains
valid. Namely, the climatic strata appear to be
isolating general differences in crop development,
that is, differences related to the crop-development-driving
nature of the climatic variables themselves. Wheat
spectral differences did not appear to be highly correlated
“on the average” with a relatively high-resolution
land use static stratification.


Table II. XSTAR Field Mean Classification Results From ERIM Evaluation
of UCB Static Spectral Stratification for Wheat in Kansas

                      Within strata           Across strata        Significance
                  ----------------------  ----------------------   of difference,
Stratification    No. of      Av XSTAR,   No. of      Av XSTAR,    α value
                  extensions  percent     extensions  percent
                              correct                 correct
---------------------------------------------------------------------------------
Land use              12        67.2        157         70.4         0.53
Degree-days           74        72.8         95         63.7          .01
Precipitation         41        82.4        128         66.2          .001
Degree-days plus      26        86.5        143         66.6          .001
  precipitation


ERIM Multisegment Classification Results

Recent work reported by Kauth and Richardson
(ref. 4) also suggests the scale of spectral partitions
should be on the order of that represented by
climatic strata. They are in the process of developing
a multitemporal, multisegment approach to crop proportion
estimation in the LACIE context. The basic
procedure is to (1) Sun and haze correct all Landsat
passes for a given population of sample segments;
(2) simultaneously cluster all segment data on the
basis of both spectral (Tasseled Cap) and locational
features; (3) cluster the spectral mean vector of each
spatial “blob” (primarily field centers) that resulted
from the initial clustering and assign each blob to one
of the spectral strata that results from this second
clustering; (4) inspect the population of blobs in each
segment and select a minimum set of segments
together containing adequate representation of all
spectral strata in the entire population of segments;
and (5) sample each spectral stratum to determine its
crop composition and then expand these proportion
estimates over all segments to obtain final crop proportion
estimates and variances.
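
Step (5) of the procedure above can be illustrated with a simple
stratified expansion. The sketch below is an assumption about how such
an expansion might be computed, not the ERIM implementation: it weights
each spectral stratum's sampled wheat proportion by that stratum's
share of a segment's pixels, and the variance formula assumes
independent binomial sampling within each stratum with no finite
population correction. All names and numbers are hypothetical.

```python
def stratum_proportions(labeled_samples):
    """Estimate the crop proportion in each spectral stratum.

    labeled_samples: {stratum: [True if sampled blob is wheat, ...]}
    Returns {stratum: (estimated proportion, sample size)}.
    """
    return {k: (sum(v) / len(v), len(v)) for k, v in labeled_samples.items()}


def expand_to_segment(pixel_counts, strata_est):
    """Expand stratum proportions to one segment.

    pixel_counts: {stratum: number of the segment's pixels in that stratum}
    Returns (proportion estimate, variance of the estimate).
    """
    total = sum(pixel_counts.values())
    estimate = 0.0
    variance = 0.0
    for stratum, count in pixel_counts.items():
        p_hat, m = strata_est[stratum]
        weight = count / total
        estimate += weight * p_hat
        variance += weight * weight * p_hat * (1.0 - p_hat) / m
    return estimate, variance


# Hypothetical example: two spectral strata, four labeled blobs in each.
strata = stratum_proportions({
    "A": [True, True, True, False],
    "B": [False, False, True, False],
})
print(expand_to_segment({"A": 600, "B": 400}, strata))
```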

Application of this procedure to 17 LACIE Phase
II segments spread across Kansas resulted in segment
proportion estimates close to truth in a region
covering the width of 3 to 4 climatic strata as shown
in figure 1. Proportion estimates were significantly
less accurate for most segments falling outside this
region. This area of successful multisegment proportion
estimation corresponds approximately to an
area encompassed by a given climatic stratum plus
the immediately adjoining climatic strata.


EVALUATION OF THE RELATIVE
IMPORTANCE OF STATIC VERSUS
SEASONAL AND PASS-SPECIFIC
PARTITIONING VARIABLES IN
SPECTRAL VARIABILITY

To gain insight into the underlying factors responsible
for the results seen in the Hotelling's T²
analysis and to obtain a measure of the relative importance
of static and nonstatic spectral stratification
variables, a spectral sensitivity analysis was performed
on the data set described in the first portion
of the preceding section.


Approach 

The basic approach was to develop regression relationships
relating spectral reflectance (dependent
variable) to a set of static stratification, seasonal, and
date-specific predictor variables. Matched spectral
response and predictor variable data were obtained
for all pixels sampled in the analysis of stratum
homogeneity.

The relative importance of each signature predictor
variable listed in table III was expressed two
ways. The first consisted of the percentage of total
spectral variance (by band) explained by the addition
of a given predictor variable to the regression equation.
Variables were added in the same order as listed
in table III, using a stepwise regression technique.
The order (static, seasonal, date-specific) was
chosen to most effectively identify the percentage of
spectral variance accounted for by the static
stratification variables before application of a signature
extension algorithm. The R² (multiple correlation
coefficient squared) increments, representing
the percentage of variance added by each variable,
were highly dependent on this ordering.
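
The dependence of the R² increments on the order of entry can be seen
in a small sketch. The code below is an illustration under stated
assumptions, not the software used in the study: it obtains each
increment by removing the effect of the previously entered predictors
from the new predictor (Gram-Schmidt orthogonalization) and then taking
the share of the total sum of squares explained by what remains. The
function names and the data are hypothetical.

```python
def center(values):
    """Subtract the mean from each value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ordered_r2_increments(y, predictors, order):
    """R-squared added by each predictor when entered in a fixed order.

    y:          list of spectral values for one band
    predictors: {name: list of predictor values}
    order:      list of predictor names in order of entry
    """
    yc = center(y)
    total_ss = dot(yc, yc)
    basis = []          # orthogonalized predictors already entered
    increments = []
    for name in order:
        x = center(predictors[name])
        for q in basis:  # remove the part explained by earlier variables
            coef = dot(x, q) / dot(q, q)
            x = [xi - coef * qi for xi, qi in zip(x, q)]
        norm = dot(x, x)
        if norm <= 1e-12:  # predictor is collinear with earlier ones
            increments.append((name, 0.0))
            continue
        increments.append((name, dot(yc, x) ** 2 / (norm * total_ss)))
        basis.append(x)
    return increments


# Hypothetical data: two correlated predictors, y = x1 + x2 exactly.
data = {"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 6]}
y = [3, 3, 7, 7, 11]
print(ordered_r2_increments(y, data, ["x1", "x2"]))
print(ordered_r2_increments(y, data, ["x2", "x1"]))
```

Both orderings reach the same total R², but the share credited to each
variable changes with its position, which is why the static variables
were deliberately entered first.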

The second measure of signature predictor variable
importance did not employ a prespecified order
of entry into the regression. A forward selection
regression procedure (as implemented by the Statistical
Package for the Social Sciences) was used to order
variables and tabulate the R² increments. Using this
technique, the predictor variable having the highest
simple correlation with the spectral band in question
was entered into the regression first. The next variable
entered was the one having the highest partial
correlation with the spectral band after the effect of
the first variable entered was removed from both the
dependent and the independent variables. The third
variable entered had the next highest partial correlation
with the spectral response variable among all remaining
predictor variables with the effects of the
first two variables removed, and so on. Order of entry
for a given variable among all bands for a given
date provided the second measure of performance.
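
The forward selection rule described above can be sketched as follows.
This is a minimal illustration of the rule (enter the variable with the
highest partial correlation, then remove its effect from the dependent
variable and from every remaining predictor); it does not reproduce the
Statistical Package for the Social Sciences procedure or its entry
thresholds. The function names, tolerance, and data are assumptions.

```python
def center(values):
    """Subtract the mean from each value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def forward_selection(y, predictors, tol=1e-12):
    """Return [(name, R-squared increment)] in order of entry."""
    resid_y = center(y)
    total_ss = dot(resid_y, resid_y)
    remaining = {name: center(x) for name, x in predictors.items()}
    entered = []

    def partial_r2(name):
        # Squared correlation between the residuals of y and of x.
        x = remaining[name]
        sxx, syy = dot(x, x), dot(resid_y, resid_y)
        if sxx <= tol or syy <= tol:
            return 0.0
        return dot(x, resid_y) ** 2 / (sxx * syy)

    while remaining:
        best = max(remaining, key=partial_r2)
        if partial_r2(best) <= tol:
            break  # nothing left to explain
        q = remaining.pop(best)
        coef = dot(resid_y, q) / dot(q, q)
        entered.append((best, coef * dot(resid_y, q) / total_ss))
        # Remove the entered variable's effect from y and from the
        # predictors still waiting to enter.
        resid_y = [yi - coef * qi for yi, qi in zip(resid_y, q)]
        for name, x in remaining.items():
            c = dot(x, q) / dot(q, q)
            remaining[name] = [xi - c * qi for xi, qi in zip(x, q)]
    return entered


# Hypothetical data: x2 has the higher simple correlation with y.
data = {"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 6]}
print(forward_selection([3, 3, 7, 7, 11], data))
```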


Results 

Pixel data from both the 75- to 100-percent wheat
and 50- to less than 75-percent wheat classes were
pooled and regressed on corresponding static,
seasonal, and Landsat pass-specific signature prediction
variable data. Results for individual regressions
on each Landsat band are presented in tables IV to
VII. The tables are arranged first by state, then by
date 1 or 2. Each table is then subdivided into results
for ordered regression (part (a)) and regression without
prior ordering (part (b)).
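
Several of the predictor variables defined in table III are simple
arithmetic combinations of other quantities. The sketch below shows
three of them as small functions. The constants (24 inches of soil and
a mean satellite altitude of 494 nautical miles) come from table III;
the function and parameter names are illustrative assumptions, and the
units of the inputs are taken to be those used in the table.

```python
SOIL_DEPTH_INCHES = 24.0        # top 2 feet of soil
SATELLITE_ALTITUDE_NMI = 494.0  # mean Landsat altitude, nautical miles


def scan_angle_variable(departure_nmi):
    """100 x tangent of the scan angle (SCANANG).

    departure_nmi is the distance along the scan line from the base line
    through the full-frame center point: positive to the east, negative
    to the west.
    """
    return 100.0 * departure_nmi / SATELLITE_ALTITUDE_NMI


def potential_available_water(awc, precipitation):
    """(24 x AWC) x precipitation, with LTGSP or SUMGSP as precipitation."""
    return (SOIL_DEPTH_INCHES * awc) * precipitation


def available_soil_moisture(awc, sumgsp, sumgset):
    """((24 x AWC) x SUMGSP) - SUMGSET: potential available soil water
    minus evapotranspiration loss."""
    return potential_available_water(awc, sumgsp) - sumgset


print(scan_angle_variable(49.4))
```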

Table III. Signature Predictor Variables Used in the Kansas and North
Dakota Wheat Spectral Sensitivity Analysis

Each predictor variable is followed by the measurement technique used
for each field sampled.

I. Static stratification variables (obtained from static strata map)

   A. Cultivated area percentage (CULTPCT)
      Midpoint of cultivated area percentage range for the land use
      class covering the wheatfield.

   B. Available soil water-holding capacity (AWC)
      Average inches of water held per inch of soil at field capacity
      in the top 24 inches for the static strata soil association
      covering the wheatfield. These values are obtained from
      information available in county soil survey publications.

   C. Long-term average growing-season degree-days (LTGSDD)
      Midpoint of growing-season degree-day class covering the
      wheatfield. Degree-day classes obtained from 30-year average data
      by automatic and manual interpolation of ground meteorological
      station data for the period April through June in Kansas and June
      through August in North Dakota.

   D. Long-term average growing-season precipitation (LTGSP)
      Midpoint of growing-season precipitation class covering the
      wheatfield. Precipitation classes obtained from 30-year average
      data by automatic and manual interpolation of ground
      meteorological data for the period April through June in Kansas
      and June through August in North Dakota.

   E. Long-term potential average available water in top 2 feet of soil
      ((24 x AWC) x LTGSP)
      Multiply previously obtained values of AWC and LTGSP.

   F. Long-term growing-season evapotranspiration (LTGSET)
      Substitute 5-year average values for pan evaporation from nearest
      ground meteorological station making this measurement.
      Alternatively, empirical models using temperature and solar
      radiation may give satisfactory evapotranspiration estimates.
      Currently, only pan data are used here.

   G. Long-term evapotranspiration stress on soil moisture reserve
      ((24 x AWC) x LTGSET)
      Multiply previously obtained values of AWC and LTGSET.

II. Seasonal variables (specific to 1975-76 growing season)

   A. Robertson biostage or bionumber: a numerical measure of crop
      development based on daily maximum and minimum temperature at
      selected meteorological stations in LACIE countries
      Data obtained from Robertson biostage isoline maps reported for
      the Great Plains in the Weekly Meteorological Summaries produced
      in LACIE. The Robertson system divides the biological stages of
      wheat into seven development phases: (1) planting, (2) emergence,
      (3) jointing, (4) heading, (5) soft dough (turning greenish
      yellow to yellow), (6) hard dough, and (7) harvest. A Robertson
      number of 4.0 would mean that 50 percent of the crop is headed.
      Robertson numbers used in the sensitivity analysis were recorded
      to the nearest 0.1 of a development phase.

   B. Growing-season degree-days accumulated to Landsat pass-date
      (SUMGSDD)
      Calculated from temperature data supplied from nearest ground
      meteorological station having a physical/climatic setting most
      closely approximating the segment in which the wheatfield falls.
      Growing-season period: April through June (Kansas), May through
      August (North Dakota).

   C. Growing-season precipitation accumulated to Landsat pass-date
      (SUMGSP)
      Determined as in II.B. relative to precipitation data.

   D. Growing-season potential available soil water in top 2 feet of
      soil column ((24 x AWC) x SUMGSP)
      Multiply previously obtained values for AWC and SUMGSP.

   E. Growing-season evapotranspiration accumulated to Landsat
      pass-date (SUMGSET)
      Substitute growing-season sum of pan evapotranspiration data from
      nearest station making this measurement.

   F. Growing-season measure of available soil moisture in top 2 feet
      of soil (((24 x AWC) x SUMGSP) - SUMGSET = potential available
      soil water minus evapotranspiration loss)
      Use values for AWC, SUMGSP, and SUMGSET obtained previously. Note
      that ground water table (a water source) is assumed not to be
      near the soil surface.

   G. Average January 1976 temperature (JANTEMPT)
      Determined from nearest meteorological station as in II.B.

   H. Planting-season degree-days accumulated to Landsat pass-date
      (SUMPSDD)
      Determined as in II.B. but for the period September through
      November (Kansas) and April (North Dakota).

   I. Planting-season precipitation accumulated to Landsat pass-date
      (SUMPSP)
      Determined as in II.B. relative to precipitation data in the
      period August through November (Kansas) and April (North Dakota).

III. Landsat date-specific variables

   A. Precipitation in the 4 days preceding Landsat pass-date (PPT4DA)
      Determined as in II.B. relative to precipitation data.

   B. 100 x tangent of Landsat scan angle (SCANANG)
      Departure measured along scan line of segment relative to an
      imaginary base line perpendicular to the scan direction and
      passing through the Landsat full-frame center point. Measurement
      based on full-frame center-point longitude and latitude
      coordinates given in Landsat Cumulative U.S. Standard Catalog and
      on sample segment coordinates supplied by JSC. The departure,
      reported in nautical miles, is defined as zero on the base line
      and increases positively to the east and negatively to the west.
      Then
         tan (scan angle) = departure (n. mi.) /
                            mean satellite altitude (494 n. mi.)

   C. Landsat band 7 to band 5 ratio (R ASF): this ratio is one
      real-time indicator of biostage
      Obtain (2 x) band 7 to (1 x) band 5 ratio for the pixel.


The most striking feature of the tables showing
results for Landsat bands with ordered regression is
the significant importance of long-term growing-season
degree-days and/or precipitation in accounting
for the variation in spectral response. In this case,
degree-days was the strongest for both Kansas dates
and the second North Dakota date. Long-term growing-season
precipitation accounted for the larger
share of variance on the first North Dakota date. The
other variable accounting for a substantial amount of
spectral variance was cultivated area percentage.
This variable, obtained from the static stratification
land use code, was significant in Landsat bands 6 and
7 on date 2 in both states and in all bands on date 1 in
North Dakota. An evaluation of the cross variable
correlation matrix suggests that the importance of
the cultivated percentage was largely an artifact of
the sample distribution in North Dakota. One other
variable, available soil water-holding capacity
(AWC), was expected to be significant in North
Dakota. Unfortunately, AWC values could not be
calculated for every land use/soil stratum; consequently,
this variable (as well as composite variables
using AWC) was omitted from the sensitivity
analysis.


Table IV. Spectral Sensitivity Analysis of 454 Kansas Pixels Sampled
January 18 to 20, 1976

(a) Ordered regression

                                           ΔR² value for Landsat band
Variable                                     4        5        6        7
---------------------------------------------------------------------------
 1. Cultivated percentage (CULTPCT)         0.01     --       0.02     --
 2. Water-holding capacity (AWC)             .01     0.01      .01     0.01
 3. Long-term growing-season
    degree-days (LTGSDD)                     .17      .19      .43      .35
 4. Long-term growing-season
    precipitation (LTGSP)                    .04      .09      .04      .06
 5. (24 x AWC) x LTGSP                       .02      .03      .02      .04
 6. Long-term growing-season
    evapotranspiration (LTGSET)             .01, .01  [band columns illegible]
 7. (24 x AWC) x LTGSET                      .03      .03      .03      .04
 8. Average January temperature
    (JANTEMPT)                               .04      .06      .07      .05
 9. Planting-season degree-days (PSDD)      [one row of values, .02, .01, .01,
10. Planting-season precipitation (PSP)      .01, survives for rows 9 and 10]
11. Scan angle                               .02      .01      --       --
12. Band 7/band 5 ratio                      .10      .18      .02      .17
---------------------------------------------------------------------------
Total R²                                     .46      .61      .64      .73
Square root of mean square error            3.3      4.5      6.3      2.4
Total sum of squares (x 10³)                8.4     22.9     49.5      9.0

Table IV. Concluded

(b) Regression without prior ordering

                              Order of entry for Landsat band (a)
Variable                        4          5          6          7
--------------------------------------------------------------------
 1. CULTPCT                     4          4          8          7
 2. AWC                                    3
 3. LTGSDD                      5          5          6          6
 4. LTGSP                                 11          5          4
 5. (24 x AWC) x LTGSP          3         10          9
 6. LTGSET                      8          7          3          2
 7. (24 x AWC) x LTGSET         6          6         10          8
 8. JANTEMPT                    7          8          4          3
 9. PSDD                     1 (0.24)   1 (0.28)   1 (0.51)   1 (0.43)
10. PSP                         2          2          2          5
11. Scan angle                  9          9          7          9

(a) Numbers in parentheses are ΔR² values.


Although the reader is cautioned against putting
much weight on the exact order of entry, the following
observations were deemed significant in relation
to the results of regression without prior ordering. In
Kansas, variables entered first on date 1 were fall
1975 planting-season precipitation and degree-days.
This was expected for a January 1976 pass date.
Long-term growing-season degree-days, cultivated
percentage, and scan angle were the first variables
entered (i.e., having the highest correlation or partial
correlation with the Landsat band values) into the
regressions on the second date in Kansas. Precipitation
in the 4 days preceding Landsat pass (4-day ppt.)
and scan angle were entered consistently as the first
and second variables for both dates in North Dakota.
If present, 4-day precipitation can have an important
impact on spectral signatures by wetting the soil or
canopy surfaces. Other variables entered subsequently
included either long-term or seasonal degree-day
or precipitation variables.


Table V. Spectral Sensitivity Analysis of 162 Kansas Pixels Sampled
May 4 to [day illegible], 1976

(a) Ordered regression

                                   ΔR² value for Landsat band
Variable                             4        5        6        7
-------------------------------------------------------------------
 1. CULTPCT                          --       --      0.21     0.36
 2. AWC                             0.04     0.05      .01      .03
 3. LTGSDD                           .80      .78      .53      .30
 4. LTGSP                            .02      --       .01      .04
 5. (24 x AWC) x LTGSP               .01      .02      .01      .01
-------------------------------------------------------------------
Total R²                             .86      .84      .76      .74
Square root of mean square error    5.8[?]   8.19     9.05     4.20
Total sum of squares (x 10³)       38.8     67.0     54.2     10.7

Table V. Concluded

(b) Regression without prior ordering

                              Order of entry for Landsat band (a)
Variable                        4          5          6          7
--------------------------------------------------------------------
 1. CULTPCT                  2 (0.31)   2 (0.78)      5          6
 2. AWC                      5  [band illegible]
 3. LTGSDD                   1 (0.43)   1 (0.51)   1 (0.71)   1 (0.65)
 4. LTGSP                    6  [band illegible]
 5. (24 x AWC) x LTGSP       4, 6, 4  [bands illegible]
 6. LTGSET                   5  [band illegible]
 7. Robertson bionumber      6, 2  [bands illegible]
 8. JANTEMPT                 4, 4  [bands illegible]
 9. Growing-season
    degree-days (GSDD)       5  [other entries and bands illegible]
10. Scan angle                  3          3          3          3

(a) Numbers in parentheses are ΔR² values.


VIEWING THE EVIDENCE AS A WHOLE:
A CURRENT PERSPECTIVE ON
PARTITIONING


Nature of the Spectral Surface

Using the results of the spectral sensitivity
analysis as an aid, the following interpretation of
results gained in the analysis of spectral homogeneity
within and between strata is offered.


1. The multivariate spectral surface for wheat appears
to be relatively smooth, gradually changing
over space. The spectral overlap encountered within
and between climatic strata supports this notion.

2. Furthermore, the results of the sensitivity
analysis indicate that this surface is strongly tied to
degree-day and precipitation crop development
variables. The spectral influences of long-term growing-season
degree-days and, at times, long-term
growing-season precipitation were found to be particularly
significant. These, of course, were also the two
variables used to define the climatic strata.

3. The sensitivity analysis also suggests that exceptions
to interpretation 1 may be due largely to
pass-specific precipitation differences, their interaction
with soil type reflectances, and scan angle
differences. Land use may also have an impact in
situations where it is strongly correlated with soil
type or with particular agricultural practices affecting
plant canopy reflectance (e.g., irrigation, field size
and shape).

4. Examination of the spectral surface on individual
dates suggests that its average gradient
changes throughout the crop year for wheat. That is,
the region of spectral overlap giving adequate
classification or proportion estimates will vary in size
if analysis is performed on a single-date basis. In relative
terms, this region may be of moderate size early
in the crop year (soil background reflectance important),
largest just before heading (accumulated
weather/climatic influences dominant), and smallest
during heading and ripening (local soil
moisture/depth, crop practice influences apparent).
A limited sensitivity analysis on several dates in
Kansas for the 1975-76 crop year (ref. 6) suggested
this pattern.

5. The ERIM multisegment results suggest that
the effect of these changes in the shape of the
spectral surface on classification performance may
be controlled to some extent using multidate
classification. Further analysis is required.


Table VI. Spectral Sensitivity Analysis of 192 North Dakota Pixels
Sampled May 24 to 28, 1976

(a) Ordered regression

                                   ΔR² value for Landsat band
Variable                             4        5        6        7
-------------------------------------------------------------------
 1. CULTPCT                         0.27     0.29     0.23     0.17
 2. LTGSDD                           .02      .02      .01      .04
 3. LTGSP                            .31      .35      .13      .06
 4. LTGSET                           .02      .01      .03      .02
 5. Robertson bionumber              .01      .01      .01      .01
 6. GSDD                             .01      .01      --
 7. Growing-season
    precipitation (GSP)              .02      .01      .02      .03
 8. Sum growing-season
    evapotranspiration (SUMGSET)                       .03      .04
 9. JANTEMPT                         --       --       --       --
-------------------------------------------------------------------
Total R²                             .61      .70      .46      .37
Square root of mean square error    3.7      5.0      7.5      3.7
Total sum of squares (x 10³)        6.3     15.5     18.7      3.8


Relative Role of Static Versus Real-Time
Partitioning Variables in the Context of a
Multitemporal, Multisegment Classification
Approach to Signature Extension

At present, it appears that a multisegment clustering
and classification approach similar to that described
earlier (ref. 4) provides the most workable
solution to the signature extension problem. The
following observations are addressed to the application
of spectral partitioning in that context.

1. Multisegment clustering and classification
should be possible using climatic strata, or, more
generally, distance on a climate-related spectral surface,
as a guide to segment grouping. In other words,
it should be possible to use spectral training data
(cover-type-specific Landsat or Tasseled Cap band
means, variances, and covariances) obtained from a
specially selected sample of LACIE segments to
classify with acceptable accuracy the entire set of
LACIE segments falling within climatic partitions. A
cost savings over training and classifying each segment
separately should result.

Spatial distribution of sample segments need not
be dictated by spectral strata boundaries, but sample
segments can and should be allocated among area,
yield, or production strata to control sampling error
in standard fashion. Training gains (i.e., number of
segments used to develop spectral models for
classification versus total number of segments
classified) should be largest with higher sampling intensities
(number of sample segments classified per
unit area). In this regard, it may be cost-effective to
increase sample sizes somewhat over those required
for normal estimate precision control in order to take
advantage of a larger training gain. This action would
ensure a higher probability of achieving assigned
regional precision objectives and also enable the production
of local crop proportion estimates of higher
precision.

2. The center of any given spectral partition can
be conveniently chosen to aline with the center of a
specific population of sample segments occupying a
given area used for estimate summary (e.g., some reporting
or aggregation unit). A method for defining
the distance from the spectral partition center to the
partition “boundary” remains a significant research
question. Such a distance metric will undoubtedly be
a function of crop development and in turn the
growth-driving environmental variables. For the
present, spectral partition size must be roughly
defined in terms of climatic strata or combinations of
climatic strata based (a) on degree-day and precipitation
differences as they potentially affect growth and
(b) on actual multisegment classification and proportion
estimation performance.


3. To achieve the successful multisegment
classification described in observation 1 will, in all
probability, require Sun angle and haze correction to
a common standard. Although the XSTAR
algorithm suffices for this purpose in the case of
classification of Tasseled Cap bands (ERIM results),
the authors have found that some question remains
as to the proper algorithm to apply when classification
is based on Landsat band combinations.

4. Although pass-specific precipitation, soil
reflectance, and scan angle may generate spectral
outliers, these should not generally pose significant
problems to multisegment clustering within climatic
strata. This is not to say, however, that recognition
segments (segments into which signature is extended
from others) having no adequate spectral analogs
will not occur. Undoubtedly, they will. But within
many biostages or combinations of biostages,
multisegment classification as described in observation
1 should be possible with at least some portion
of the population of sample segments at hand.
Further technical developments in scan angle correction
and flagging of soil type conditions, etc., in
which outliers will occur should serve to maximize
successful use of the multisegment approach to signature
extension within climatic strata.


Table VII. Spectral Sensitivity Analysis of 157 North Dakota Pixels
Sampled June 30 to July 2, 1976

(a) Ordered regression

                                   ΔR² value for Landsat band
Variable                             4        5        6        7
-------------------------------------------------------------------
 1. CULTPCT                         0.04     0.03     0.10     0.10
 2. LTGSDD                           .02      --       .22      .16
 3. LTGSP                            .02      .01      --       .01
 4. LTGSET                           .07      .07      .03      .06
 5. Robertson bionumber              .01      .03      .05      .07
-------------------------------------------------------------------
Total R²                             .15      .13      .40      .40
Square root of mean square error    2.2      2.8      6.7      3.9
Total sum of squares (x 10³)        0.8      1.4     11.1      3.9

Table VII. Concluded

(b) Regression without prior ordering

                              Order of entry for Landsat band (a)
Variable                        4          5          6          7
--------------------------------------------------------------------
 1. CULTPCT                     4          4          4          4
 2. LTGSDD                   5  [band illegible]
 3. LTGSP                    3, 5; (0.11)  [bands illegible]
 4. LTGSET                   5  [band illegible]
 5. GSDD                     3 (0.13)  [band illegible]
 6. GSP                      3 (0.13), 3 (0.09)  [bands illegible]
 7. 4-day precipitation         1          1          1          1
                             (0.13), (0.07)  [bands illegible]
 8. Scan angle                  2          2          2          2
                             (0.12), (0.12)  [bands illegible]

(a) Numbers in parentheses are ΔR² values.


The general question of crop and environment interaction
with spectral reflectance remains an area of
major research concern. A complete, robust solution
to  the  spectral  partitioning  problem  must  await  the 
results  of  further  work  on  signature  prediction  and 
modeling. 
Appendix

Legend  Code  for  Signature  Extension 
Land  Use/Soil  Association  Strata 


The land use/soil association strata are annotated
with a fractional code. The numerator is the land-use/crop-diversity
designation and the denominator
is the soil-group/soil-association designation.


Example code: 211-1 / 88-A

Numerator 211-1: land use code (211) and crop diversity code (1)
Denominator 88-A: soil group code (88) and soil association code (A)


LAND  USE  CLASSIFICATION  CODE 

100  Urban  and  built-up  land 

110 Residential, commercial, industrial, institutional,
transportational, mixed, open, and other
120  Strip  and  clustered  settlements 
130  Resorts 

200  Agricultural  land  (more  than  15  percent  of 
area  is  cultivated) 

211  Cropland  and  intensive  pasture  (more 
than  75  percent  of  the  area  is  cultivated) 

212 Cropland and intensive pasture (more
than 50 percent but less than 75 percent of the area is
cultivated)

213 Orchard and vineyards
220 Extensive agriculture (less than 50 percent
of the area is cultivated)

300  Rangeland  (less  than  15  percent  of  the  area  is 
cultivated) 

310  Grassland  range 
320  Woodland  range 
330  Chaparral  range 
340  Desert  shrub  range 
400  Forest  land 
500  Water 

600  Nonforested  wetland 
700  Barren  land 
800  Tundra 

900  Permanent  snow  and  icefields 


CROP  DIVERSITY  CODE 

1 Relatively  high  crop  diversity 

2 Medium  crop  diversity 

3 Low  crop  diversity 
Methods of Extending Crop Signatures From One Area to Another


T.  C.  Mintera 


INTRODUCTION 

Background 

The  Large  Area  Crop  Inventory  Experiment 
(LACIE)  is  an  attempt  to  establish  the  feasibility  of 
inventorying  the  production  of  wheat  on  a world- 
wide basis  by  using  Landsat  data.  A basic  5-  by  6- 
nautical-mile  sampling  unit,  called  a segment,  is 
employed.  A wheat  area  estimate  for  a region  is 
made  by  totaling  estimates  for  subregions,  where  the 
wheat  area  estimate  for  a subregion  is  based  on  an 
estimate  for  each  of  the  segments  in  the  subregion. 
Wheat  area  estimates  are  made  by  extracting  from  a 
segment  training  data  for  wheat  and  nonwheat  and 
then  using  the  statistics  for  these  training  data  to 
classify the segment pixel by pixel. This local training
and classification procedure requires that training
data in each segment be labeled by an analyst interpreter
(AI).

Signature  extension  is  an  attempt  to  reduce  the 
total  amount  of  analyst  work  required  in  making  a 
wheat  area  estimate  for  a region.  The  approach  is  to 
extract  training  statistics  from  one  segment  and  use 
these  statistics  or  signatures  to  classify  several  other 
segments;  hence,  the  term  “signature  extension." 

This paper summarizes much of the work accomplished
by LACIE on signature extension in
1975 and 1976. Several significant advances in haze
correction procedures are documented in this paper.
All the work on signature extension described in this
paper was accomplished by the Earth Observations
Division of the NASA Johnson Space Center,
Lockheed Electronics Company, IBM, and the supporting
research institutions. All material presented
in this paper has been extracted from documents
published by these groups.


aLockheed Electronics Company, Houston, Texas.

It  will  be  obvious  from  results  presented  in  this 
paper  that  signature  extension  is  a very  difficult 
problem.  The  approach  to  signature  extension  de- 
scribed herein  represents  LACIE’s  understanding  of 
the  problem  and  its  possible  solution  in  1975  and 
1976.  The  lack  of  success  of  this  signature  extension 
approach  led  to  the  development  of  the  multiseg- 
ment training  approach  to  signature  extension  de- 
scribed in  the  paper  by  Kauth  and  Richardson  en- 
titled “Signature  Extension  Methods  in  Crop  Area 
Estimation.” 


Objective  of  Signature  Extension 

The  objective  of  signature  extension  is  to  increase 
the  spatial-temporal  range  over  which  a set  of  train- 
ing statistics  can  be  used  to  classify  Landsat  data 
without  significant  loss  of  recognition  accuracy. 
Because  of  variations  in  measurement  conditions 
when  Landsat  data  are  collected,  the  computer  must 
be  retrained  on  a regular  basis.  The  crop  signatures 
observed  by  Landsat  are  not  constant  in  either  time 
or  space.  The  need  to  retrain  the  computer  requires 
labeling of new examples of wheat and nonwheat, a
process  that  is  both  costly  and  time  consuming.  A 
viable  signature  extension  technology  for  LACIE 
would  provide  more  timely  and  cost-effective 
classification  over  extensive  land  areas. 


APPROACH  USED  IN  SIGNATURE 
EXTENSION 

The  proposed  approach  to  signature  extension 
was  to  use  the  training  samples  developed  by  an 
analyst in one segment (the training segment, or
TSEG)  to  structure  a classifier,  which  could  then  be 
used  to  identify  wheat  and  estimate  wheat  propor- 
tions in  the  TSEG  and  in  several  nearby  segments 
(called  recognition  segments,  or  RSEG's).  It  was 
acknowledged  that  the  TSEG  wheat  and  nonwheat 
signatures  might  not  be  representative  of  signatures 
in  the  RSEG.  Any  difference  between  TSEG  and 
RSEG  crop  spectral  signatures  would  result  in  poor 
classifier  performance  in  the  RSEG's,  as  measured  in 
terms  of  probability  of  correct  classification  (PCC) 
and a higher variance of the wheat proportion estimate.
The existence of a number of factors that
might  cause  variations  in  the  TSEG  and  RSEG  crop 
signatures  was  postulated.  These  factors  were 
divided  into  two  main  categories:  dynamic  factors 
and  static  sources  of  differences.  Some  of  these  are 
listed  in  table  I. 

To  facilitate  a successful  extension  of  signatures 
from  the  TSEG  to  an  RSEG,  it  was  proposed  that 
these  sources  of  signature  variation  be  accounted  for 
and  removed  from  the  TSEG  signatures  before 
classifying  the  RSEG.  Static  sources  of  variation 
were  to  be  removed  by  partitioning  segments;  i.e.,  by 
grouping  together  those  segments  from  a large  area 
which  had  similar  characteristics  of  the  sort  listed  as 
type  B in  table  I.  For  more  on  partitioning,  see  the 
paper  by  Thomas  et  al.  entitled  "Development  of 
Partitioning  as  an  Aid  to  Spectral  Signature  Exten- 
sion." 

Differences  in  signatures  from  the  TSEG  and  an 
RSEG  which  were  caused  by  dynamic  factors  (i.e., 
atmospheric  haze  and  Sun  angle  changes)  were  to  be 
removed  by  mathematically  modeling  these  effects 
and  correcting  the  TSEG  signatures  accordingly  on  a 
pairwise  basis.  It  is  well  known  (see  the  paper  by 
Lambeck  and  Potter  entitled  “Compensation  for  At- 
mospheric Effects  in  Landsat  Data"  and  refs.  1 and 
2) that signature changes caused by differences in
atmospheric haze level and Sun angle can be
mathematically modeled by an affine transformation of the
form

$$y_k = a_k x_k + b_k \qquad (1)$$

where
$x_k$ = a multispectral scanner measurement in the kth spectral band from the TSEG
$y_k$ = the transformed equivalent of $x_k$ in the RSEG
$a_k$ = a multiplicative factor for the kth spectral band which is a function of the differences between TSEG and RSEG haze levels and Sun angles
$b_k$ = an additive term for the kth spectral band which is a function of the differences between TSEG and RSEG haze levels and Sun angles
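
As an illustration of equation (1), the following minimal Python sketch applies a per-band affine correction to a Gaussian training signature (mean vector and covariance matrix). The function name `extend_signature` and the use of NumPy are assumptions made for illustration and are not part of the LACIE software. Because equation (1) acts band by band, the mean maps to $a_k \mu_k + b_k$ and the covariance element for bands k and l is scaled by $a_k a_l$.

```python
import numpy as np

def extend_signature(mean, cov, a, b):
    """Apply y_k = a_k * x_k + b_k (eq. (1)) to a Gaussian signature.

    mean : (p,) training-segment signature mean
    cov  : (p, p) training-segment signature covariance
    a, b : (p,) multiplicative and additive coefficients, one per band
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    new_mean = a * np.asarray(mean, dtype=float) + b
    # Equivalent to diag(a) @ cov @ diag(a): element (k, l) is scaled by a_k * a_l.
    new_cov = np.outer(a, a) * np.asarray(cov, dtype=float)
    return new_mean, new_cov
```

The additive term shifts only the mean; the covariance depends on the multiplicative factors alone.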

A number  of  algorithms  were  developed  for 
estimating the coefficients of this affine transformation
(eq. (1)) for Landsat multispectral scanner data.
An  important  exception  was  the  University  of 
Houston  Maximum  Likelihood  Estimation  algo- 
rithm, which  estimated  the  RSEG  statistics  directly. 
More  will  be  said  later  about  this  approach.  An  im- 
portant constraint  imposed  on  these  algorithms  was 
that  ak  and  bk  had  to  be  estimated  without  the  aid  of 
any  training  data  in  the  recognition  segment.  If  train- 
ing data  were  made  available  in  the  RSEG,  the  sig- 
natures developed  from  these  data  would  be  prefer- 
able to  corrected  TSEG  signatures  in  classifying  the 
RSEG.  But  having  an  analyst  develop  these  training 
data in the RSEG would defeat the purpose of signature
extension.

Three  approaches  were  taken  in  estimating  the 
corrections  to  be  applied  to  the  TSEG  signatures. 
These  were 

1.  Cluster-matching  algorithms 

2.  Distribution-matching  algorithms 

3.  Atmospheric  models 

The  cluster-matching  algorithms  used  a clustering 
algorithm  to  identify  the  inherent  spectral  classes  in 
the  TSEG  and  RSEG  data.  Clusters  corresponding  to 
the  same  crop  in  the  TSEG  and  the  RSEG  were  then 
matched  using  various  procedures.  The  coefficients 
ak and bk of the affine transformation (eq. (1)) could
then  be  readily  obtained.  The  algorithms  using  this 
approach  were  the  Rank  Order  Optimal  Signature 
Transformation  Estimation  Routine  (ROOSTER) 
and  the  Optimal  Signature  Correction  Algorithmic 
Routine  (OSCAR).  These  algorithms  are  discussed 
in greater detail in a succeeding section.

The  distribution-matching  algorithms  used  max- 
imum-likelihood estimation  procedures  to  correct 
for  differences  between  TSEG  and  RSEG  probability 
density  functions  (pdfs).  The  University  of  Houston 
Maximum  Likelihood  Estimation  (UHMLE) 
algorithm  attempted  to  correct  for  these  pdf 
differences  without  any  assumptions  as  to  the  form 
the  correction  might  take.  The  Maximum  Likelihood 
Estimation  of  Signature  Transformation  (MLEST) 
algorithm  assumed  that  differences  between  TSEG 
and RSEG pdfs could be accounted for by an affine
transformation  of  the  form  shown  in  equation  (1). 
MLEST  used  a maximum-likelihood  iteration  ap- 
proach to  select  a set  of  coefficients  ak,  bk  that 
matched  the  TSEG  and  RSEG  pdfs  as  closely  as 
possible.  MLEST  and  UHMLE  are  described  in 
detail  in  succeeding  sections. 

The  Atmospheric  Correction  (ATCOR)  program 
employs  an  atmospheric  model  to  predict  the  effect 
on  Landsat  data  of  changes  in  haze  and  Sun  angle.  In- 
dicators of  the  haze  level  are  derived  from  the  data 
and  processed  through  an  atmospheric  model  to  esti- 
mate the  coefficients  ak,  bk  of  the  affine  transform 
(eq.  (1)).  This  algorithm  is  discussed  in  another 
paper  (Lambeck  and  Potter’s)  and  will  not  be  de- 
scribed in  detail  here. 

Several  experiments  were  conducted  to  evaluate 
the  approaches  listed  previously.  These  experiments 
and  their  results  are  described  in  the  fourth  and  fifth 
sections.  Conclusions  drawn  from  these  experiments 
are  presented  in  the  final  section. 


A DESCRIPTION  OF  THE  SIGNATURE 
CORRECTION  PROCEDURES 


In  this  section,  several  of  the  signature  extension 
algorithms  tested  are  described  in  detail.  These  in- 
clude ROOSTER  and  its  modification,  OSCAR  and 
its modification, MLEST, and UHMLE. ATCOR is
described  in  the  paper  by  Lambeck  and  Potter  and 
will  not  be  discussed  here. 

The following notation is used in the mathematical
description of the algorithms discussed in this
paper.

$\{x\}$ = set of samples from the training segment
$\{y\}$ = set of samples from the recognition segment
$M_T$ = number of subclasses in the training segment
$M_R$ = number of subclasses in the recognition segment
$p$ = dimensionality of samples
$\mu_i$, $i = 1,2,\ldots,M_T$ = subclass means in the training segment
$\Sigma_i$, $i = 1,2,\ldots,M_T$ = subclass covariance matrices in the training segment
$q_i$, $i = 1,2,\ldots,M_T$ = a priori probabilities of the training segment subclasses
$\mu'_j$, $j = 1,2,\ldots,M_R$ = subclass means in the recognition segment
$\Sigma'_j$, $j = 1,2,\ldots,M_R$ = subclass covariance matrices in the recognition segment
$q'_j$, $j = 1,2,\ldots,M_R$ = a priori probabilities of the recognition segment subclasses


Cluster-Matching  Algorithms 

Introduction. — In  this  section,  two  cluster-match- 
ing algorithms  and  their  modifications  are  discussed. 
The  basic  theory  is  presented,  then  the  algorithms 
are  described  in  detail.  The  algorithms  described  are 
ROOSTER  and  OSCAR. 

The  theory  of  cluster-matching  algorithms. — Given 
that  an  affine  signature  transformation  is  to  be  used 
to  compensate  for  multiplicative  and  additive 
differences  between  two  scenes,  the  values  of  the 
coefficients  ak  and  bk  for  equation  (1)  must  be  esti- 
mated. For  this  purpose,  one  needs  some  effective 
way  of  comparing  the  data  from  the  two  scenes.  One 
method  for  accomplishing  this  comparison  is  to 
compare  cluster  statistics  for  the  scenes. 

Consider  two  scenes  where  the  same  ground 
classes  are  present  in  the  same  proportions  but  the 
data  values  in  scene  2 differ  from  those  in  scene  1 by 
a transformation  of  the  form  shown  in  equation  (1); 
i.e.,

$$y_k = a_k x_k + b_k \qquad (1)$$

In  this  case,  the  probability  density  function  of  the 
data  in  each  scene  should  look  the  same;  i.e.,  the 
same  number  of  modes  should  be  present  in  each 
scene,  each  with  the  same  frequency,  but  the  location 
of  the  modes  will  differ  by  the  scale  factor  ak  and  the 
displacement bk in the kth Landsat band. Each scene
is  clustered  separately  to  find  these  modes.  Since  the 
same  classes  are  present  in  both  scenes,  the  corre- 
sponding modes  can  be  paired  up.  The  parameters  ak 
and  bk  can  be  estimated  from  the  locations  of  these 
paired  modes.  In  the  algorithms  reported  on  here, 
each  mode  is  described  by  the  mean  of  a cluster  and 
ak  and  bk  are  estimated  by  the  method  of  least 
squares. In practice, several difficulties arise: first,
the  ground  class  labels  for  the  modes  of  the  data  in 
the  RSEG  are  unknown;  second,  the  frequencies  of 
the  same  ground  class  in  the  two  scenes  will  usually 
differ;  and  third,  some  ground  classes  present  in  one 
scene  may  not  be  present  in  the  other. 

The  first  basic  cluster-matching  algorithm,  called 
MASC  (for  Multiplicative  and  Additive  Signature 
Correction)  (ref.  2),  was  developed  at  the  Environ- 
mental Research  Institute  of  Michigan  (ERIM)  to 
test  the  cluster  regression  approach  to  determining 
the  ak  and  bk  coefficients.  Although  this  algorithm 
achieved some occasional successes at signature extension,
it did not include an adequate means for
selecting  only  valid  cluster  pairs  from  the  many  po- 
tential cluster  pairs. 

The  difficulty  involved  in  identifying  valid  cluster 
matches  between  a pair  of  scenes  may  perhaps  be 
partly  appreciated  by  considering  the  problem  of 
matching  a set  of  10  training  scene  clusters  with  a set 
of 10 recognition scene clusters. If one tries to examine
all possible sets of 10 cluster pairs to find
which is best, one finds that there are 10! (3 628 800)
sets of pairs to be considered, assuming that there are
no multiple pairings with the same cluster. If one
happens to guess that only 8 valid pairs are possible,
then the number of sets of pairs to be considered increases
by a factor of 45/2, to more than 80 million;
i.e.,

$$\binom{10}{8}\binom{10}{8}\,8! = 45 \times 45 \times 40\,320 = 81\,648\,000$$

Obviously,  there  are  two  basic  difficulties  to  be 
dealt  with  in  finding  the  valid  cluster  pairs  from 
which  to  derive  the  required  signature  transforma- 
tion. The  first  is  to  reduce  to  a practical  number  the 
sets  of  cluster  pairs  to  be  examined,  and  the  second  is 
to  determine  which  among  the  remaining  candidate 
sets  of  cluster  pairs  are  most  likely  to  be  valid. 
ROOSTER  and  OSCAR  take  varying  approaches  to 
the  solution  of  these  two  problems. 
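
The pairing counts quoted in the preceding discussion can be checked with a few lines of Python. This is only an arithmetic illustration; the function name `count_pairings` is hypothetical. It counts the ways to choose which clusters take part in each scene and then the one-to-one assignments between the chosen clusters.

```python
from math import comb, factorial

def count_pairings(n_train, n_recog, n_pairs):
    """Number of distinct sets of n_pairs one-to-one cluster pairs.

    Choose n_pairs clusters in each scene, then match them in
    n_pairs! different ways (no cluster is used twice).
    """
    return comb(n_train, n_pairs) * comb(n_recog, n_pairs) * factorial(n_pairs)

all_ten = count_pairings(10, 10, 10)   # 10! = 3 628 800
only_eight = count_pairings(10, 10, 8) # 45 * 45 * 8! = 81 648 000
print(all_ten, only_eight, only_eight / all_ten)
```

The ratio of the two counts is 45/2, which is why exhaustive search is impractical even for modest numbers of clusters.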

ROOSTER.—The Rank Order Optimal Signature Transformation
Estimation Routine (refs. 3 and 4) selects
pairs based on channel ranks. Channel ranks have
the important property of being invariant with
respect to the affine haze/Sun angle correction (eq.
(1)). Specifically, if

$$\mu_{ik} > \mu_{jk} \qquad (2)$$

then

$$a_k \mu_{ik} + b_k > a_k \mu_{jk} + b_k \qquad (3)$$

provided $a_k > 0$. Hence, corresponding RSEG
clusters will manifest the same order relationship.
Using  this  basic  idea,  the  steps  in  the  algorithm  are  as 
follows. 

Step 1.—Cluster each segment. Let $\mu_{ik}$ be the
mean of the ith cluster in the kth channel from the
TSEG. Let $\mu'_{jk}$ be the mean of the jth cluster in the
kth channel in the RSEG. Let $M_T$ be the number of
clusters in the TSEG and $M_R$ be the number of
clusters in the RSEG. The clusters are unlabeled;
therefore, one does not know which, if any, of the
$\mu_{ik}$ means corresponds to $\mu'_{jk}$.

Step 2.—For all clusters, $i = 1,2,\ldots,M_T$, compute
the pseudorank vector $u_{ik}$ for the kth channel of the
TSEG.

$$u_{ik} = \frac{1}{M_T - 1} \sum_{\substack{w=1 \\ w \neq i}}^{M_T} G(\mu_{ik} - \mu_{wk};\, t) \qquad (4)$$

where $G(\mu_{ik} - \mu_{wk};\, t)$ is defined as follows:

$$G(\mu_{ik} - \mu_{wk};\, t) = \begin{cases} 1, & \mu_{ik} - \mu_{wk} > t \\ 0, & \mu_{ik} - \mu_{wk} < -t \end{cases} \qquad (5)$$

and

$$G(\mu_{ik} - \mu_{wk};\, t) = (\mu_{ik} - \mu_{wk} + t)/2t \qquad (6)$$

if

$$|\mu_{ik} - \mu_{wk}| \le t$$

The parameter t, an adjustment factor for determining
pseudorank, is specified by the user and is intended
to be small but positive. If t = 0, then $u_{ik}$ is
the vector of ranks multiplied by a constant.
Step 3.—For all clusters, $j = 1,2,\ldots,M_R$, compute
the pseudorank vector $v_{jk}$ for the kth channel of
the RSEG.

$$v_{jk} = \frac{1}{M_R - 1} \sum_{\substack{w=1 \\ w \neq j}}^{M_R} G(\mu'_{jk} - \mu'_{wk};\, t) \qquad (7)$$

where $G(\mu'_{jk} - \mu'_{wk};\, t)$ is defined in the same manner
as $G(\mu_{ik} - \mu_{wk};\, t)$ in step 2.

Step 4.—Compute a measure of the similarity of
ranking $c_{ij}$ of the ith TSEG cluster to the jth RSEG
cluster over $k = 1,2,\ldots,p$ channels.

$$c_{ij} = \sum_{k=1}^{p} |u_{ik} - v_{jk}|^{Q} \qquad (8)$$

where Q is the power for the fit criterion and is a
user-specified parameter; currently, Q = 1.

Step 5.—Rank $c_{ij}$ in order of ascending values and
select s of these where $s \le \min(M_T, M_R)$. Pass over
any $c_{ij}$ for which i or j has been previously selected.
Relabel corresponding pairs of cluster means
$(\mu_1, \mu'_1), (\mu_2, \mu'_2), \ldots, (\mu_s, \mu'_s)$. The user-specified
parameter s is the maximum number of cluster pairs
to be used in the regression for the coefficients of the
affine transformation (eq. (1)).

Step 6.—Let $H_w$ be an r-element subset of the
set $\{1,2,\ldots,s\}$, where r is the minimum number of
pairs to be used in the regression for the affine
transformation. For each w and k, calculate $\alpha_{wk}$ and
$\beta_{wk}$ to minimize

$$e_{wk} = \sum_{i \in H_w} (\mu'_{ik} - \alpha_{wk} - \beta_{wk}\,\mu_{ik})^2 \qquad (9)$$

The quantities $\alpha_{wk}$ and $\beta_{wk}$ are the intercept and the
slope for a simple least squares regression.

Step 7.—Select the w so that

$$w = \arg\min_{w} \sum_{k=1}^{p} e_{wk}$$

The coefficients of the affine transformation (eq.
(1)) are then

$$a_k = \beta_{wk}, \quad b_k = \alpha_{wk} \qquad (10)$$
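
The seven steps can be sketched compactly in Python. This is a minimal illustration under stated assumptions, not the ROOSTER program itself: the function names, the default parameter values, the use of NumPy, and the exhaustive enumeration of r-element subsets are all choices made here for clarity. The sketch assumes t > 0 and r of at least 3, since a two-point fit is always exact and cannot discriminate between subsets.

```python
from itertools import combinations
import numpy as np

def pseudorank(means, t):
    """Pseudoranks for an (M, p) array of cluster means, assuming t > 0."""
    diff = means[:, None, :] - means[None, :, :]   # diff[i, w, k] = mu_ik - mu_wk
    g = np.clip((diff + t) / (2.0 * t), 0.0, 1.0)   # ramp function G of steps 2 and 3
    idx = np.arange(means.shape[0])
    g[idx, idx, :] = 0.0                            # exclude the w == i term
    return g.sum(axis=1) / (means.shape[0] - 1)

def rooster(train_means, recog_means, t=0.5, s=None, r=3, q=1.0):
    """Return per-channel (slope, intercept) estimates for eq. (1)."""
    u, v = pseudorank(train_means, t), pseudorank(recog_means, t)
    cost = (np.abs(u[:, None, :] - v[None, :, :]) ** q).sum(axis=2)  # c_ij, step 4
    s = s or min(len(u), len(v))
    pairs, used_i, used_j = [], set(), set()
    for flat in np.argsort(cost, axis=None):        # step 5: ascending, no reuse
        i, j = divmod(int(flat), cost.shape[1])
        if i in used_i or j in used_j:
            continue
        pairs.append((i, j))
        used_i.add(i)
        used_j.add(j)
        if len(pairs) == s:
            break
    best = None
    for subset in combinations(pairs, r):           # step 6: every r-element subset
        x = train_means[[i for i, _ in subset]]
        y = recog_means[[j for _, j in subset]]
        total, slopes, intercepts = 0.0, [], []
        for k in range(x.shape[1]):
            slope, intercept = np.polyfit(x[:, k], y[:, k], 1)
            resid = y[:, k] - (slope * x[:, k] + intercept)
            total += float(resid @ resid)
            slopes.append(slope)
            intercepts.append(intercept)
        if best is None or total < best[0]:         # step 7: smallest summed error
            best = (total, np.array(slopes), np.array(intercepts))
    return best[1], best[2]
```

Enumerating all subsets is affordable only for small s; it is used here because it mirrors the description most directly.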

In  analyzing  the  procedure,  one  can  see  that  if  all 
(or  nearly  all)  classes  are  present  in  both  segments, 
the pseudorank vectors $u_i$ and $v_j$ should be nearly the
same  for  both  segments,  and  the  proper  matches  can 
be  obtained.  Two  possible  difficulties  are  (1)  the 
cluster  means  are  subject  to  random  variation  that 
can  cause  rank  reversals  from  one  segment  to  the 
other  and  (2)  some  clusters  may  be  found  in  only 
one  segment. 

The  first  difficulty  is  met  by  using  the  G functions 
to  reduce  the  effect  of  small  random  variations  and 
by  forming  pairings  based  on  all  channel  ranks  and 
thereby  gaining  the  advantage  of  cumulative  evi- 
dence. The  problem  of  unmatched  clusters  is  miti- 
gated by  the  use  of  ranks.  One  unmatched  cluster 
will cause a difference of at most $1/(M_R - 1)$ in a
pseudorank  value.  Therefore,  even  in  the  presence  of 
an  unmatched  cluster,  the  correspondence  of 
matched  vectors  will  be  sufficient  to  produce  proper 
pairings.  Numerous  unmatched  clusters  may  present 
problems.  There  is,  however,  a reasonable  chance 
that  rank  errors  caused  by  unmatched  pairs  will 
average  out  to  such  a degree  that  mostly  valid 
matches  will  be  obtained.  Moreover,  the  algorithm 
will  deal  with  unmatched  clusters  more  effectively 
than  the  alternative  that  assumes  all  clusters  are 
matched. 

The  rationale  for  using  regression  to  estimate  ak 
and  bk  (eq.  (1))  is  obvious,  if  one  can  be  sure  that  all 
matches  are  genuine.  The  proposed  algorithm  can 
work  even  if  some  matches  are  spurious.  The  use  of 
spurious matches in selecting α and β will tend to
produce a poor fit to the regression line. Thus, the
proposed algorithm will tend to select valid pairs in
determining the actual α and β values used to estimate
ak and bk.

Modified  ROOSTER. — Kauth  and  Thomas 
defined  a set  of  four  orthogonal  physically  in- 
terpretable coordinate  axes  that  amount  to  a rotation 
of  Landsat  data  (ref.  5).  These  directions  correspond 
to  (1)  increasing  soil  brightness,  (2)  increasing 
vegetation  greenness,  (3)  increasing  vegetation 
yellowness, and (4) a direction called “non-such.” In
modified  ROOSTER  (ref.  6),  the  cluster  mean  vec- 
tors were  projected  onto  the  soil  brightness  axis 




before  applying  the  seven  steps  of  the  ROOSTER 
procedure  described  previously.  This  procedure  had 
the  effect  of  causing  the  clusters  to  be  ranked  on  the 
basis  of  their  brightness. 

OSCAR.—The Optimal Signature Correction
Algorithmic Routine (ref. 7) uses a “goodness of fit”
function to evaluate candidate transformations.
Ideally, an (ak, bk) transformation (eq. (1)) should
transform  most  training  segment  cluster  mean  vec- 
tors so  that  each  nearly  equals  one  of  the  recognition 
segment  cluster  mean  vectors.  Therefore,  a reasona- 
ble measure  of  goodness  is  how  close  each  recogni- 
tion segment  cluster  mean  vector  is  to  the  closest 
transformed  training  segment  mean  vector. 

In  the  algorithm  ROOSTER,  the  goodness  func- 
tion used  is  the  sum  of  squares  of  the  differences  for 
the  best  fitting  cluster  mean  vectors.  The  ROOSTER 
algorithm  deletes  bad  fits  from  consideration  since  it 
is  unreasonable  to  expect  all  the  signatures  in  one 
segment  to  have  matches  in  the  other  segment.  The 
deletion  rule  (i.e.,  the  selection  of  a value  for  s)  is 
somewhat  arbitrary.  The  OSCAR  algorithm 
sidesteps  this  deletion  problem  by  weighting  all  po- 
tential matches  using  a negative  exponential  of  their 
goodness  of  fit.  The  distance  used  to  measure  this 
goodness  of  fit  also  makes  some  use  of  the 
covariance  structure  of  the  clusters  being  matched,  as 
in  the  Bhattacharyya  distance.  The  rationale  is 
developed  as  follows.  First,  define  the  following: 

1. $\mu_i$, $\mu'_j$ = training and recognition segment
subclass/cluster mean vectors, respectively

2. $\Sigma_i$, $\Sigma'_j$ = training and recognition segment
cluster covariance matrices, respectively

An assumption is made that the recognition segment
picture elements (pixels) y are multivariate
normally distributed. Using the Mahalanobis distance
as a measure, the average distance from a point
in this distribution to the training segment cluster
mean vector is determined by

$$D(y, \mu_i, \Sigma_i) = (y - \mu_i)^T \Sigma_i^{-1} (y - \mu_i) \qquad (11)$$

where $D(y, \mu_i, \Sigma_i)$ is the Mahalanobis distance, and y
is a point from the jth recognition segment
subclass/cluster that is normally distributed with
mean $\mu'_j$ and covariance $\Sigma'_j$.

The average value of this distance is

$$E[D(y, \mu_i, \Sigma_i)] = \int D(y, \mu_i, \Sigma_i)\, N(y;\, \mu'_j, \Sigma'_j)\, dy \qquad (12)$$

where $N(\cdot)$ denotes a normal density. Now, there exists
a nonsingular matrix P such that

$$P^T \Sigma_i^{-1} P = H \qquad (13)$$

$$P^T \Sigma_j'^{-1} P = I \qquad (14)$$

where H is a positive definite diagonal matrix. Next,
a change of variables is made in equation (12) where
the new variable w is defined to be

$$w = P^{-1}(y - \mu'_j)$$

and then y is defined to be

$$y = \mu'_j + Pw \qquad (15)$$

The Jacobian of y with respect to w is $J(y) = |P|$.
From equation (14), it is evident that

$$|P| = (|\Sigma'_j|)^{1/2} \qquad (16)$$

By substituting equations (13) and (14) in equation
(12), one obtains

$$E[D(y, \mu_i, \Sigma_i)] = (2\pi)^{-p/2} \int (\mu'_j + Pw - \mu_i)^T \Sigma_i^{-1} (\mu'_j + Pw - \mu_i)\, e^{-\frac{1}{2} w^T w}\, dw \qquad (17)$$

$$E[D(y, \mu_i, \Sigma_i)] = (\mu'_j - \mu_i)^T \Sigma_i^{-1} (\mu'_j - \mu_i) + 0 + (2\pi)^{-p/2} \int w^T H w\, e^{-\frac{1}{2} w^T w}\, dw \qquad (18)$$

$$E[D(y, \mu_i, \Sigma_i)] = (\mu'_j - \mu_i)^T \Sigma_i^{-1} (\mu'_j - \mu_i) + \mathrm{trace}(H) \qquad (19)$$

From equation (14), it follows that

$$P P^T = \Sigma'_j \qquad (20)$$

Now,

$$\mathrm{trace}(H) = \mathrm{trace}(P^T \Sigma_i^{-1} P) = \mathrm{trace}(\Sigma_i^{-1} P P^T) = \mathrm{trace}(\Sigma_i^{-1} \Sigma'_j) \qquad (21)$$

Thus, by substituting, one obtains

$$D_{ij} = (\mu'_j - \mu_i)^T \Sigma_i^{-1} (\mu'_j - \mu_i) + \mathrm{trace}(\Sigma_i^{-1} \Sigma'_j) \qquad (22)$$
On the basis of this result, the quantity $D_{ij}$ is used as
the average distance between RSEG cluster j and its
corresponding TSEG cluster i with the usual estimates
for μ and Σ. For any candidate transformation,
one would like to have a good match for each
cluster. For a recognition segment cluster j, the
following is defined.

$$G_j = \min_i D_{ij} \qquad (23)$$

In computing the $D_{ij}$, the candidate transformation is
applied to each $\mu_i$. However, $\Sigma_i$ remains unchanged.

To minimize the influence of unmatched clusters,
it is useful to transform $G_j$ to $f_j$, say, where $f_j$ ranges
from 0 to 1 with 1 indicating a perfect fit; further, it is
desirable that $f_j$ diminish as the fit becomes poorer,
with $f_j$ approaching 0 rapidly as $G_j$ becomes fairly
large. Set

$$f_j = \exp(-G_j/\sigma) \qquad (24)$$

where σ is a user-supplied scaling parameter. The
overall goodness measure is the sum of the $f_j$.
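
A minimal Python sketch of the goodness measure follows, assuming the average distance is the squared Mahalanobis term plus the trace term derived above and that the candidate transformation is the per-band affine form of equation (1). The function name `oscar_goodness` and the argument layout are illustrative assumptions, not the OSCAR code.

```python
import numpy as np

def oscar_goodness(train_means, train_covs, recog_means, recog_covs, a, b, sigma):
    """Sum over RSEG clusters of exp(-G_j / sigma) for one candidate (a, b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    moved = a * np.asarray(train_means, dtype=float) + b  # transform each TSEG mean
    inv_covs = [np.linalg.inv(c) for c in train_covs]     # TSEG covariances unchanged
    total = 0.0
    for mu_r, cov_r in zip(np.asarray(recog_means, dtype=float), recog_covs):
        # Average distance from RSEG cluster j to every transformed TSEG cluster i.
        d = [(mu_r - m) @ inv @ (mu_r - m) + np.trace(inv @ cov_r)
             for m, inv in zip(moved, inv_covs)]
        total += np.exp(-min(d) / sigma)                   # f_j with G_j = min_i D_ij
    return float(total)
```

Because each term is bounded by 1, a recognition cluster with no counterpart in the training segment contributes almost nothing instead of dominating the score, which is the stated purpose of the exponential weighting.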

A detailed  description  of  OSCAR  appears  in  the 
appendix.  In  summary,  the  OSCAR  method  consists 
of four major steps.

Step  1.— Cluster  the  two  segments  and  determine 
rank  vectors  for  each  cluster/subclass.  The  rank  vec- 
tors are  formed  by  computing  the  rank  of  each  com- 
ponent cluster/subclass  within  its  segment. 

Step  2. — Compare  all  training  segment/ 
recognition  segment  pairs  of  rank  vectors.  Tag  all 
those  that  are  sufficiently  close  so  that  there  is  a fair 
probability  that  they  belong  to  the  same  class.  Such 
pairs  will  be  called  admissible. 

Step  3. — For  each  nonoverlapping  pair  of  admissi- 
ble pairs,  find  the  ak,bk  transformation  (eq.  (1))  that 
fits  the  pairs  of  matching  points.  Pairs  of  pairs  are 
nonoverlapping  if  all  four  clusters  are  unique.  Deter- 
mine the  goodness  measure  for  each  resulting 
transformation  that  yields  reasonable  component 
values  for  the  multiplicative  factor  ak. 

Step  4.— Rank  the  resulting  transformations  on 
the  goodness  measure  and  compute  a weighted 
average  of  the  best  candidates.  The  weighting  system 
is  based  on  the  ranks  and  goodness  measures  ob- 
tained for  the  candidate  transformations. 

Modified  OSCAR.— The  algorithm  OSCAR 
chooses  a pair  of  clusters  in  both  the  training  and  the 
recognition segments. Using these four clusters, a
channelwise  linear  transformation  is  computed  and 
evaluated.  The  “best"  transformation  is  chosen.  In- 
stead of  using  a pair  of  clusters  from  each  segment, 
the  modified  OSCAR  uses  a single  cluster  and  its 
projection  onto  the  soil  line.  The  steps  in  the  pro- 
cedure are  as  follows. 

Step  1.— Rotate  the  means  using  Kauth  rotation 
matrix R (ref. 5). (See the section on the modified
ROOSTER  procedure  for  an  explanation  of  the 
Kauth  transform.) 

Step  2.— Find  the  minimum  cluster-mean  values 
in  the  rotated  second  and  third  channels. 

Step  3.— Define  the  rotated  projected  vectors  by 
replacing  the  second  and  third  channels  with  the  sec- 
ond and  third  channel  minimums. 

Step  4. — Rotate  the  projected  vectors  back  into 
channel  space  using  RT. 

Step  5. — For  each  pair  of  mean  vectors  (one  in  the 
training  segment,  one  in  the  recognition  segment, 
and  both  of  which  are  greenness  vectors),  define  the 
transformation  that  maps  the  training  vector  onto 
the  projected  recognition  vector. 

Step  6. — For  each  transformation,  test  whether  it 
is  close  to  a constant  times  the  identity  matrix. 

Step  7. — For  each  transformation  that  passes  the 
test  in  step  6,  compute  the  OSCAR  function,  equa- 
tion (22). 

Step  8. — The  transformation  with  the  largest 
OSCAR  function  value  is  the  desired  transforma- 
tion. 
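
Steps 1 to 4 of the modified procedure (rotate, find the channel minimums, replace, rotate back) can be illustrated with a short Python sketch. The matrix `R_DEMO` below is only a stand-in orthogonal matrix chosen so the example runs; it is not the Kauth rotation matrix of reference 5, and the function name `project_to_soil_line` is hypothetical.

```python
import numpy as np

# Stand-in orthogonal 4 x 4 matrix (a scaled Hadamard matrix), used ONLY so the
# sketch is runnable. The real procedure would use the Kauth rotation matrix R.
R_DEMO = 0.5 * np.array([[1.0,  1.0,  1.0,  1.0],
                         [1.0, -1.0,  1.0, -1.0],
                         [1.0,  1.0, -1.0, -1.0],
                         [1.0, -1.0, -1.0,  1.0]])

def project_to_soil_line(means, rotation):
    """Project (M, 4) cluster means as described in steps 1 to 4."""
    rotated = means @ rotation.T            # step 1: rotate each mean vector
    projected = rotated.copy()
    projected[:, 1] = rotated[:, 1].min()   # steps 2-3: second rotated channel
    projected[:, 2] = rotated[:, 2].min()   # steps 2-3: third rotated channel
    return projected @ rotation             # step 4: rotate back using R transpose
```

The first and fourth rotated channels are left untouched, so each projected vector differs from its cluster mean only along the second and third rotated axes.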

A detailed  description  of  the  modified  OSCAR 
algorithm  is  given  in  reference  6. 


The  MLEST  Algorithm 

Introduction. — The  MLEST  algorithm  (ref.  8)  ob- 
tains maximum-likelihood  estimates  (MLE's)  of  the 
affine  transformation  that  is  assumed  to  relate  the 
statistics  of  the  training  segment  to  those  of  the 
recognition  segment.  The  MLEST  algorithm  is  based 
on  the  following  major  assumptions. 

1.  The  training  and  recognition  segment  samples 
are  drawn  from  probability  density  functions  that  are 
mixtures  of  normally  distributed  subclasses. 

2.  The  number  of  subclasses  in  the  training  seg- 
ment is  equal  to  the  number  of  subclasses  in  the 
recognition  segment.  Training  segment  subclasses 
that  do  not  exist  in  the  recognition  segment  may  be 
represented  in  the  model  by  a priori  probabilities 
of  0. 


3.  The  training  segment  subclass  statistics  (i.e., 
means  and  covariances)  are  related  to  the  recogni- 
tion segment subclass statistics by a positive definite
affine  transformation. 

Mathematical  development. — In  the  mathematical 
development that follows, it is assumed that the
number of training segment subclasses $M_T$ equals
the number of recognition segment subclasses $M_R$;
therefore, $M_T = M_R = M$.

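Let $p(x|i)$, $i = 1,2,\ldots,M$, represent the probability
density functions for the training segment
subclasses. Since the training segment subclasses are
assumed to be normally distributed,

$$p(x|i) = (2\pi)^{-p/2}\, |\Sigma_i|^{-1/2} \exp\left[-\tfrac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right] \qquad (25)$$

where $i = 1,2,\ldots,M$. The overall mixture density
function for the training segment is given by

$$p(x) = \sum_{i=1}^{M} q_i\, p(x|i) \qquad (26)$$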
By assumption 3, the training segment subclass
statistics are related to the recognition segment
subclass statistics by a positive definite affine
transformation. This transformation may be represented
by the (p x p) real positive definite matrix A
and the (p x 1) real vector B. It follows that the
recognition segment subclass statistics (means and
covariance matrices, denoted here by primes) are given by

$$\mu_i' = A\mu_i + B \tag{27}$$

and

$$\Sigma_i' = A\,\Sigma_i A^T, \quad i = 1, 2, \ldots, M \tag{28}$$
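Equations (27) and (28) can be illustrated with a short sketch. The function name, array layout, and the numerical values below are assumptions made for illustration only; they are not taken from the MLEST program.

```python
import numpy as np

def transform_statistics(means, covs, A, B):
    """Apply equations (27) and (28) to every training subclass.

    means: (M, p) training subclass means
    covs:  (M, p, p) training subclass covariance matrices
    A:     (p, p) matrix; B: (p,) vector
    """
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    new_means = means @ A.T + B      # A mu_i + B, one row per subclass
    new_covs = A @ covs @ A.T        # A Sigma_i A^T, broadcast over subclasses
    return new_means, new_covs

# Hypothetical two-channel example with a single subclass.
A = np.array([[2.0, 0.0], [0.0, 2.0]])
B = np.array([1.0, 1.0])
mu, sigma = transform_statistics([[1.0, 2.0]], [np.eye(2)], A, B)
```

Because the covariance is multiplied by A on both sides, a pure offset B shifts the subclass means but leaves the covariances unchanged.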




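From equations (26) to (28), it follows that the
mixture density function for samples from the
recognition segment is given by

$$p(y) = \sum_{i=1}^{M} q_i\,p(y|i) \tag{29}$$

where

$$p(y|i) = (2\pi)^{-p/2}\,|A\Sigma_i A^T|^{-1/2}\exp\left[-\tfrac{1}{2}(y - A\mu_i - B)^T (A\Sigma_i A^T)^{-1}(y - A\mu_i - B)\right] \tag{30}$$

and $i = 1, 2, \ldots, M$. Next, suppose that one picks N
statistically independent samples from the recognition
segment $y_1, y_2, \ldots, y_N$. Then, the likelihood
function is given by

$$l(y_1, y_2, \ldots, y_N) = \prod_{k=1}^{N} p(y_k) \tag{31}$$

The algebra is simplified considerably if one uses the
logarithm of the likelihood function

$$L = \log l = \sum_{k=1}^{N} \log p(y_k) \tag{32}$$

It may be shown that the partial derivatives of L with
respect to the matrix A, the vector B, and the a priori
probabilities $q_i$, $i = 1, 2, \ldots, M$, are given,
respectively, by

$$\frac{\partial L}{\partial A} = \sum_{k=1}^{N}\sum_{i=1}^{M} p(i|y_k)\left[(A\Sigma_i A^T)^{-1}(y_k - A\mu_i - B)(y_k - B)^T - I_p\right](A^T)^{-1} \tag{33}$$

$$\frac{\partial L}{\partial B} = \sum_{k=1}^{N}\sum_{i=1}^{M} p(i|y_k)\,(A\Sigma_i A^T)^{-1}(y_k - A\mu_i - B) \tag{34}$$

$$\frac{\partial L}{\partial q_i} = \sum_{k=1}^{N} \frac{p(y_k|i)}{p(y_k)}, \quad i = 1, 2, \ldots, M \tag{35}$$

subject to the constraints

$$q_i > 0; \quad i = 1, 2, \ldots, M \tag{36}$$

$$\sum_{i=1}^{M} q_i = 1 \tag{37}$$

where $I_p$ is the (p x p) identity matrix and

$$p(i|y_k) = \frac{q_i\,p(y_k|i)}{p(y_k)} \tag{38}$$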

The general MLEST algorithm obtains estimates
of the (p x p) matrix A, the (p x 1) vector B, and
the a priori probabilities $q_i$, $i = 1, 2, \ldots, M$,
that maximize the logarithmic likelihood function L.
Estimates obtained in this manner are called
maximum-likelihood estimates.

In practice, the optimization indicated previously
is performed by using the Davidon-Fletcher-Powell
(DFP) constrained optimization program (ref. 9).
The DFP program uses equation (32) for the likelihood
function and equations (33) through (35) for its
partial derivatives to modify A, B, and $q_i$,
$i = 1, 2, \ldots, M$, in a manner such that L is
maximized. In many cases, the likelihood function
(eq. (32)) is insensitive to $q_i$, and therefore $q_i$
can be set to a constant, $q_i = 1/M$,
$i = 1, 2, \ldots, M$, and not considered
in estimating A and B.
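As a minimal sketch of the quantity being maximized, the function below evaluates the log likelihood of equation (32) for a trial A and B. It only evaluates L; it does not reproduce the DFP program. The function name and the synthetic two-channel data are assumptions made for illustration.

```python
import numpy as np

def mlest_log_likelihood(Y, means, covs, q, A, B):
    """Log likelihood L of equation (32) for recognition samples Y (N, p)."""
    Y = np.asarray(Y, dtype=float)
    p = Y.shape[1]
    mixture = np.zeros(Y.shape[0])
    for mu_i, sigma_i, q_i in zip(means, covs, q):
        cov = A @ np.asarray(sigma_i, dtype=float) @ A.T     # equation (28)
        diff = Y - (A @ np.asarray(mu_i, dtype=float) + B)   # y_k - A mu_i - B
        maha = np.einsum("kj,kj->k", diff @ np.linalg.inv(cov), diff)
        norm = (2.0 * np.pi) ** (-p / 2.0) / np.sqrt(np.linalg.det(cov))
        mixture += q_i * norm * np.exp(-0.5 * maha)          # q_i p(y_k | i)
    return float(np.sum(np.log(mixture)))

# Hypothetical single-subclass data generated with a known transformation.
rng = np.random.default_rng(0)
train_mean, train_cov = np.array([10.0, 20.0]), np.eye(2)
A_true, B_true = np.diag([1.1, 0.9]), np.array([3.0, -2.0])
Y = rng.multivariate_normal(A_true @ train_mean + B_true,
                            A_true @ train_cov @ A_true.T, size=200)
L_true = mlest_log_likelihood(Y, [train_mean], [train_cov], [1.0], A_true, B_true)
L_identity = mlest_log_likelihood(Y, [train_mean], [train_cov], [1.0],
                                  np.eye(2), np.zeros(2))
```

In this synthetic case the likelihood at the generating transformation exceeds the likelihood at the identity transformation, which is the behavior an optimizer would exploit.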


The UHMLE Algorithm


Introduction. — The UHMLE procedure (refs. 10 to
17) obtains estimates of the recognition segment
subclass statistics (i.e., the means, the covariances,
and the a priori probabilities) by correcting the
training segment subclass statistics for small differences
between the training and recognition segment
subclass statistics using an iterative maximum-likelihood
correction procedure. The UHMLE algorithm
is based on the following major assumptions.

1.  The  training  and  recognition  segment  samples 
are  drawn  from  probability  densities  that  are  mix- 
tures of  normally  distributed  subclasses. 

2.  The  number  of  subclasses  in  the  training  seg- 
ment is  equal  to  the  number  of  subclasses  in  the 
recognition  segment.  Training  segment  subclasses 
that  do  not  exist  in  the  recognition  segment  may  be 
represented  in  the  model  by  a priori  probabilities 
of  0. 

3. Good initial estimates of the recognition segment
subclass statistics may be obtained from the
training segment.

4. The differences between training and recognition
segment statistics at the subclass level are caused
not only by haze and Sun angle differences but also
by random differences between the two scenes (i.e.,
differences in crop growth stages, soil color, soil
moisture, leaf area index, etc.) which cannot be
modeled by an affine transformation. The parameters
of each subclass require a correction that is independent
of the corrections applied to the parameters of
the other subclasses. This is considered to be an
important feature of the UHMLE procedure.

Mathematical development. — Let $\{y_k\}$,
$k = 1, \ldots, N$, be an unlabeled sample of
observations from the recognition segment. These
samples are assumed to be drawn from a mixture of
$M_R$ populations in the RSEG, where each population
is normally distributed. It is assumed that the number
of training segment subclasses $M_T$ equals the number
of RSEG subclasses $M_R$; that is, $M_T = M_R = M$.

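Let $p(y|i)$, $i = 1, 2, \ldots, M$, represent the
probability density functions for the RSEG subclasses.
Since the RSEG subclasses are assumed to be normally
distributed,

$$p(y|i) = (2\pi)^{-p/2}\,|\Sigma_i|^{-1/2}\exp\left[-\tfrac{1}{2}(y-\mu_i)^T\Sigma_i^{-1}(y-\mu_i)\right] \tag{39}$$

where $\mu_i$ and $\Sigma_i$ are the mean vector and
covariance matrix of RSEG subclass $i$. The overall
mixture density function for the RSEG is given by

$$p(y) = \sum_{i=1}^{M} q_i\,p(y|i) \tag{40}$$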


The RSEG subclass statistics $\{q_i, \mu_i, \Sigma_i\}$,
$i = 1, \ldots, M$, are unknown, but good initial
estimates of these statistics are assumed to be
available from the TSEG. Therefore, using unlabeled
independent sample observations from the RSEG,
$\{y_k\}$, $k = 1, \ldots, N$, maximum-likelihood
estimates of the subclass statistics
$\{\hat{q}_i, \hat{\mu}_i, \hat{\Sigma}_i\}$,
$i = 1, \ldots, M$, may be obtained which locally
maximize the log likelihood function

$$L = \sum_{k=1}^{N} \log_e p(y_k) \tag{41}$$

Clearly, L is a differentiable function of the
parameters to be estimated. Equating to 0 the partial
derivatives of L with respect to these parameters,
one obtains, after a straightforward calculation, the
following necessary conditions for a maximum-likelihood
estimate for subclasses $i = 1, \ldots, M$.

$$\hat{q}_i = \frac{1}{N}\sum_{k=1}^{N}\hat{p}(i|y_k) \tag{42}$$

$$\hat{\mu}_i = \frac{\sum_{k=1}^{N}\hat{p}(i|y_k)\,y_k}{\sum_{k=1}^{N}\hat{p}(i|y_k)} \tag{43}$$

$$\hat{\Sigma}_i = \frac{\sum_{k=1}^{N}\hat{p}(i|y_k)\,(y_k-\hat{\mu}_i)(y_k-\hat{\mu}_i)^T}{\sum_{k=1}^{N}\hat{p}(i|y_k)} \tag{44}$$

where

$$\hat{p}(i|y_k) = \frac{\hat{q}_i\,\hat{p}(y_k|i)}{\sum_{j=1}^{M}\hat{q}_j\,\hat{p}(y_k|j)} \tag{45}$$

These are known as the likelihood equations.

An alternative set of likelihood equations proposed
by Peters and Walker (ref. 10) for subclasses
$i = 1, \ldots, M$ is

$$\hat{q}_i = (1-\epsilon)\,\hat{q}_i + \frac{\epsilon}{N}\sum_{k=1}^{N}\hat{p}(i|y_k) \tag{46}$$

$$\hat{\mu}_i = (1-\epsilon)\,\hat{\mu}_i + \epsilon\,\frac{\sum_{k=1}^{N}\hat{p}(i|y_k)\,y_k}{\sum_{k=1}^{N}\hat{p}(i|y_k)} \tag{47}$$

$$\hat{\Sigma}_i = (1-\epsilon)\,\hat{\Sigma}_i + \epsilon\,\frac{\sum_{k=1}^{N}\hat{p}(i|y_k)\,(y_k-\hat{\mu}_i)(y_k-\hat{\mu}_i)^T}{\sum_{k=1}^{N}\hat{p}(i|y_k)} \tag{48}$$

As shown by Peters and Walker (refs. 10 to 12, 14,
15, and 18), given any sufficiently small neighborhood
of the true parameters and for $\epsilon \neq 0$, the
probability is 1 that if N is sufficiently large, there
is a unique solution of the likelihood equations in that
neighborhood, and this solution is a maximum-likelihood
estimate of the true parameters.

The likelihood equations, as written, suggest the
following iterative procedure for obtaining a solution.
Beginning with a set of starting values (obtained
from the TSEG subclasses), obtain successive
approximations to a solution by inserting the preceding
approximations in the expression on the right-hand
sides of equations (46) to (48). Category labels (i.e.,
wheat and nonwheat), attached to the TSEG subclass
used for starting values, are carried through the
successive approximations; i.e., if a starting mean vector
$\mu_i$ was computed using samples labeled wheat in the
TSEG, it will carry a wheat label through the
successive approximations. On occasion, this successive
approximation procedure may result in a TSEG
wheat (or nonwheat) subclass being associated with
a subclass of the opposite category in the RSEG. This
process is called label switching. In those cases, an
analyst may be required to correct for subclass
mislabeling in the RSEG.
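The successive-approximation procedure can be sketched as follows, assuming the standard normal-mixture form of the likelihood equations with a step size written here as `eps`. The function names, the fixed iteration count, and the synthetic two-subclass data are hypothetical; this is not the UHMLE program itself.

```python
import numpy as np

def gaussian_pdf(Y, mu, sigma):
    """Multivariate normal density evaluated at each row of Y."""
    p = Y.shape[1]
    diff = Y - mu
    maha = np.einsum("kj,kj->k", diff @ np.linalg.inv(sigma), diff)
    norm = (2.0 * np.pi) ** (-p / 2.0) / np.sqrt(np.linalg.det(sigma))
    return norm * np.exp(-0.5 * maha)

def uhmle_iterate(Y, q, means, covs, eps=1.0, n_iter=50):
    """Insert the preceding approximations into the right-hand sides repeatedly."""
    Y = np.asarray(Y, dtype=float)
    q = np.array(q, dtype=float)
    means = np.array(means, dtype=float)
    covs = np.array(covs, dtype=float)
    N, M = Y.shape[0], len(q)
    for _ in range(n_iter):
        # Posterior probability of subclass i given y_k, from current estimates.
        joint = np.stack([q[i] * gaussian_pdf(Y, means[i], covs[i]) for i in range(M)])
        post = joint / joint.sum(axis=0)          # shape (M, N)
        weight = post.sum(axis=1)                 # shape (M,)
        new_q = weight / N
        new_means = (post @ Y) / weight[:, None]
        new_covs = np.empty_like(covs)
        for i in range(M):
            diff = Y - means[i]                   # preceding mean approximation
            new_covs[i] = (post[i, :, None] * diff).T @ diff / weight[i]
        # Relaxed update with step size eps.
        q = (1.0 - eps) * q + eps * new_q
        means = (1.0 - eps) * means + eps * new_means
        covs = (1.0 - eps) * covs + eps * new_covs
    return q, means, covs

# Hypothetical RSEG sample: two subclasses; starting values play the TSEG role.
rng = np.random.default_rng(0)
Y = np.vstack([rng.normal(0.0, 1.0, size=(300, 2)),
               rng.normal(6.0, 1.0, size=(300, 2))])
q_est, mu_est, cov_est = uhmle_iterate(
    Y, [0.5, 0.5], [[1.0, 1.0], [5.0, 5.0]], [np.eye(2), np.eye(2)])
```

Subclass order is preserved from the starting values, which is how a category label attached to a starting subclass is carried through the iterations in this sketch.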


PERFORMANCE TESTS OF SIGNATURE
EXTENSION ALGORITHMS ON SIMULATED
DATA AND CONSECUTIVE-DAY DATA

Comparative tests were performed on seven signature
extension algorithms to evaluate their effectiveness
in correcting for changes in atmospheric
haze and Sun angle in a Landsat scene (ref. 19). The
evaluation criteria were classification accuracy and
proportion estimation accuracy. The algorithms
tested were the Maximum Likelihood Estimation of
Signature Transformation, the University of
Houston Maximum Likelihood Estimation, the Optimal
Signature Correction Algorithmic Routine,
modified OSCAR (MOD OSCAR), the Rank Order
Optimal Signature Transformation Estimation
Routine, modified ROOSTER (MOD R), and the
Atmospheric Correction program.


The Data Sets

Two  data  sets  were  used— one  consisting  of  simu- 
lated data  described  in  the  section  immediately 
following  and  the  other  a set  of  acquisitions  on  con- 
secutive days  described  in  the  succeeding  section. 
The  simulated  data  provided  for  a controlled  experi- 
ment in  which  the  transformations  were  known  and 
in  which  the  problems  of  nonnormal  distributions 
and  nonrepresentative  statistics  were  avoided.  The 
consecutive-day  data  set  provided  for  a test  of  the 
capability  of  the  algorithms  to  correct  for  at- 
mospheric effects  when  effects  caused  by  differences 
in  the  training  and  recognition  segment  ground 
scenes are eliminated. The algorithms ROOSTER,
UHMLE, and MLEST were tested on the simulated
data. All the algorithms were tested on the
consecutive-day data set.

Simulated data. — The 1975 data base of the Earth
Resources Interactive Processing System (ERIPS)
contains four passes of four-channel simulated data
for each of segments 429 and 432. Each segment has
117 lines and each line 196 pixels. The field coordinates
reside in the ERIPS field data base. Five classes
exist within each segment: wheat, barley, stubble,
grass, and fallowed ground. Each class is divided into
two subclasses.

The data were generated from means and
covariance matrices determined from training fields
in Hill County, Montana. An algorithm was used to
generate multivariate normal data with the same
statistics. This was done separately for the four
passes of segment 429. Each pass of segment 432 was
created from the distributions used in the corresponding
pass of segment 429 by transforming them
with an affine transformation so that the data
corresponded to a different Sun angle. Segment 429 was
chosen to be the TSEG and segment 432 the RSEG.
All classifications were made using four channels
corresponding to a particular pass. Each data set
corresponds to one of the four passes: SIM1, SIM2,
SIM3, and SIM4.

Consecutive-day data. — Seven sets of consecutive-day
passes of Landsat-1 data from intensive test sites
in Ellis, Finney, and Saline Counties, Kansas, were
used in another test. The first data set is denoted
F1709-8. (The F indicates Finney County; 1709-8
indicates the dates of the training and recognition
passes, respectively; i.e., the training pass was made
1709 days and the recognition pass over the same
segment occurred 1708 days after the launch of
Landsat-1.) In all, four data sets from the Finney,
two from the Saline, and one from the Ellis County
test site were used.

Ground  truth  was  available  for  all  fields  in  all  test 
sites.  A subset  was  selected  for  training  fields,  and 
fields  were  grouped  into  subclasses  with  the  aid  of 
cluster  maps.  In  general,  the  rectangular  ground-ob- 
served areas  were  not  oriented  so  that  their  sides 
were  parallel  to  the  scan  lines  in  the  Landsat-1  data. 
To facilitate the application of the various
algorithms, a "signature extension area" was defined
as the smallest rectangular area with sides parallel
and perpendicular to the Landsat scan lines that
included the ground-observed area in each case. For
Finney County, this included the entire 9- by
11-kilometer segment (117 lines, 196 pixels) containing
the ground-observed area. For Saline County, it
included lines 26 to 91 and pixels 27 to 146; for Ellis, it
included lines 24 to 109 and pixels 49 to 144.
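The definition of the signature extension area amounts to a bounding box in line and pixel coordinates. A minimal sketch, assuming the ground-observed area is described by its corner coordinates (the function name and the corner values are hypothetical, not taken from the test sites):

```python
def signature_extension_area(vertices):
    """Smallest scan-aligned rectangle containing the given (line, pixel) corners.

    Returns (first_line, last_line, first_pixel, last_pixel).
    """
    lines = [v[0] for v in vertices]
    pixels = [v[1] for v in vertices]
    return min(lines), max(lines), min(pixels), max(pixels)

# Hypothetical corners of a ground-observed rectangle tilted against the scan lines.
corners = [(10, 30), (25, 90), (70, 80), (55, 20)]
area = signature_extension_area(corners)
```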


Approach 

The overall approach was to make signature
extension runs using the algorithms being tested and to
compare the results with local classification results
and with ground truth. Local classification results include
the PCC results and the wheat proportion estimates
obtained from classification of the recognition
segment (or pass) with statistics generated from the
same segment (or pass). The algorithms were to
provide modified training statistics, which then were
used to classify the recognition area. The UHMLE
computes these modified statistics directly; all the
other algorithms compute an affine transformation,
which is then used to modify the training statistics.

The  algorithms  tested. — A short  description  of  the 
algorithms  and  how  they  were  operated  in  this  test  is 
provided  here.  Detailed  descriptions  of  the 
algorithms  are  provided  elsewhere  in  this  paper.  In 
the  case  of  the  consecutive-day  data,  the  algorithms 
were  usually  run  using  the  data  from  the  signature 
extension  area  already  defined.  Exceptions  will  be 
noted. 

MLEST: The MLEST technique uses an iterative
gradient optimization procedure (the
Davidon-Fletcher-Powell algorithm) to obtain
maximum-likelihood estimates for the affine transformation
assumed to relate the training and recognition
statistics. The training subclass a priori probabilities
and statistics are input to the program, which outputs
the maximum-likelihood estimate of the affine
transformation.

UHMLE: The UHMLE takes subclass statistics
from a TSEG and image data from an RSEG and
computes maximum-likelihood estimates of subclass
proportions and statistics for the RSEG. Two
versions of UHMLE were used. The first, UH all, uses
the ground-observed area as input data; when this
version is used to obtain maximum-likelihood
estimates of proportions generated internally by
UHMLE, it is referred to as UH all MLE. The
second, UH fields, uses only the training fields within
the RSEG; when this version is used to obtain
maximum-likelihood estimates of proportions generated
internally by UHMLE, it is referred to as UH fields
MLE. The second version was introduced to
eliminate the effect of insufficient training. The statistics
generated by UHMLE are used to classify the
ground-observed area in the RSEG.

OSCAR: The OSCAR considers every possible
transformation defined by four cluster means: two in
the TSEG and two in the RSEG. From these
transformations, the algorithm selects those that are
"best" able to match the training clusters with the
recognition clusters. The amount of computation is
kept to a manageable level (1) by rejecting pairings
judged to be unreasonable on the basis of rankings
and (2) by testing the remaining transformations,
using each to transform all the training clusters, and
calculating a goodness-of-fit measure based on the
distance of the transformed training clusters from
the recognition clusters. The five transformations
giving the "best" fit are then averaged.

Modified OSCAR: The MOD OSCAR, in effect,
defines a transformation for each pair of clusters,
one in the RSEG and one in the TSEG. Each cluster
is used with its projection onto the soil line¹ to define
a transformation. The transformations are evaluated
as in OSCAR, and the best transformation is output.

ROOSTER: To perform signature extension with
ROOSTER (ref. 3), one first obtains a set of class
means for the TSEG's and the RSEG's. These class
means are obtained by clustering or by deriving class
statistics from training fields.

The first step is to derive rank vectors corresponding
to each of the class means. These rank vectors are
obtained by computing for each channel the rank of
each mean relative to the others for that segment.
The rank vectors are used to match the classes (or
clusters) in the training area with those in the
recognition area. Then, a regression analysis is used to
determine the affine transformation that best
transforms the mean vectors from the training area
into the corresponding mean vectors from the
recognition area.
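The rank-vector matching and regression steps can be sketched as below. Two simplifications are assumptions of this sketch rather than statements about ROOSTER: classes are matched to the recognition class with the closest rank vector (sum of absolute rank differences), and the regression is done independently per channel, giving one gain and one offset per channel. The function names and class means are hypothetical.

```python
import numpy as np

def rank_vectors(means):
    """Rank of each class mean within its segment, computed channel by channel."""
    return np.argsort(np.argsort(means, axis=0), axis=0)

def rooster_transform(train_means, recog_means):
    """Match classes by rank vector, then fit a per-channel line y = a*x + b."""
    train_means = np.asarray(train_means, dtype=float)
    recog_means = np.asarray(recog_means, dtype=float)
    r_train = rank_vectors(train_means)
    r_recog = rank_vectors(recog_means)
    # Distance between every training and recognition rank vector.
    cost = np.abs(r_train[:, None, :] - r_recog[None, :, :]).sum(axis=2)
    match = cost.argmin(axis=1)            # recognition class chosen for each training class
    matched = recog_means[match]
    n_channels = train_means.shape[1]
    a = np.empty(n_channels)
    b = np.empty(n_channels)
    for k in range(n_channels):
        a[k], b[k] = np.polyfit(train_means[:, k], matched[:, k], 1)
    return a, b, match

# Hypothetical class means: 4 classes by 4 channels.
train = np.array([[20.0, 15.0, 40.0, 35.0],
                  [30.0, 25.0, 30.0, 20.0],
                  [25.0, 35.0, 50.0, 30.0],
                  [35.0, 20.0, 20.0, 45.0]])
recog = (1.2 * train + 3.0)[[2, 0, 3, 1]]   # same classes, different order
a, b, match = rooster_transform(train, recog)
```

Nearest-rank matching as written can assign two training classes to the same recognition class when rank vectors are ambiguous; a one-to-one assignment would need an additional step.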

In this study, the ROOSTER was used in three
different ways. The first, R(C), consisted of using
clusters to define the class means for both segments;
the second, R(S), used subclass means derived from
training fields for both segments (it is expected to
provide an estimate of how well ROOSTER would
do if an ideal clustering algorithm were available);
and the third, R(S/C), used subclass statistics for the
TSEG and clusters for the RSEG (this is an alternate
way of using ROOSTER operationally, since
subclass statistics are always available for the
training area).

¹The soil line is the "bottom of the tasseled cap" or that part of
channel space containing bare soil (ref. 5).

Modified ROOSTER: The MOD R is identical to
ROOSTER except that the regression line is
computed with the cluster means and the projections of
the cluster means onto the soil line.

ATCOR: The ATCOR program is designed to
correct for differences in haze level and Sun angle
between the training and recognition data sets. The
program processes each of these data sets separately.
In each case, the input is the Landsat-1 data and the
solar zenith angle. The ATCOR program determines
the haze level from the brightness of certain dark
targets in the scene and uses an atmospheric model
to calculate a set of coefficients relating the Landsat
data for that scene to the reflectance of the targets on
the ground. The coefficients obtained from the
training and recognition data sets are then used to
compute the affine transformation to be applied to the
training data to transform it to the observing
conditions of the recognition segment.

REGRES: Rather than a signature extension
algorithm, the REGRES program is a method for
finding the optimum affine transformation to be
applied to the statistics of the consecutive-day data. In
each channel, a scatter plot is made of the second-day
data versus the first-day data. A straight line is then
fitted to the data which minimizes, in the least
squares sense, the perpendicular distance from the
points to the line. In principle, this line represents
the best affine transformation for the training
statistics.
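A line that minimizes the squared perpendicular distances passes through the centroid of the scatter plot along its principal axis. The sketch below uses that property for one channel; the function name and the sample values are assumptions, and a vertical principal axis (zero first component) is not handled.

```python
import numpy as np

def perpendicular_fit(first_day, second_day):
    """Slope and intercept of the line minimizing squared perpendicular distances."""
    x = np.asarray(first_day, dtype=float)
    y = np.asarray(second_day, dtype=float)
    cov = np.cov(x, y)                         # 2 x 2 sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    direction = eigvecs[:, -1]                 # principal axis of the scatter
    slope = direction[1] / direction[0]
    intercept = y.mean() - slope * x.mean()    # line passes through the centroid
    return slope, intercept

# Hypothetical single-channel values lying exactly on y = 2x + 1.
slope, intercept = perpendicular_fit([10.0, 20.0, 30.0, 40.0],
                                     [21.0, 41.0, 61.0, 81.0])
```

Unlike ordinary least squares, this fit treats the first-day and second-day values symmetrically, which suits data where both days carry measurement noise.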

Classification and evaluation. — After obtaining the
modified statistics, the standard LACIE classification
procedure was implemented on a Univac 1108
computer as part of the program EOD-LARSYS to
classify the RSEG's. A two-class classifier was used
with equal a priori probabilities for wheat and
nonwheat. Within each class, the subclasses had equal a
priori probabilities. A 1-percent chi-squared
threshold was used at the subclass level to reject
outliers. For the simulated data, entire areas were
classified; for the consecutive-day data, the
ground-observed areas were classified.

Classification accuracy: The classification
accuracy was determined for wheat and nonwheat by
using the training fields previously defined as test
fields. From these, the overall accuracy was
computed.

$$\text{Overall accuracy} = q_w\,p(w/w) + q_\phi\,p(\phi/\phi) \tag{49}$$

where $p(w/w)$ is the wheat accuracy, $p(\phi/\phi)$ is the
nonwheat accuracy, $q_w$ is the wheat proportion in
ground-observed area, and $q_\phi$ is the nonwheat
proportion in ground-observed area. The proportions $q_w$
and $q_\phi$ were known from ground truth. The wheat,
nonwheat, and overall accuracies were compared
with the results obtained from local classification.
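Equation (49) is a proportion-weighted average of the two class accuracies. A small sketch, assuming the nonwheat proportion is one minus the wheat proportion (the function name and the numbers are hypothetical):

```python
def overall_accuracy(q_wheat, acc_wheat, acc_nonwheat):
    """Proportion-weighted accuracy for a two-class (wheat/nonwheat) problem."""
    q_nonwheat = 1.0 - q_wheat          # assumption: the two proportions sum to 1
    return q_wheat * acc_wheat + q_nonwheat * acc_nonwheat

# Hypothetical values: 30 percent wheat, 90 percent wheat accuracy,
# 80 percent nonwheat accuracy.
accuracy = overall_accuracy(0.3, 0.9, 0.8)
```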

Wheat  proportions:  The  classification  results 
yielded  wheat  proportions  for  the  ground-observed 
areas.  In  addition,  the  UHMLE  program  yielded  a 
maximum-likelihood  estimate  of  the  wheat  propor- 
tions. These  results  were  compared  with  the  ground- 
observed  proportions  and  the  results  obtained  from 
local  classification. 


Results 

The results of this processing are given in tables II
through XII. Table II gives the $a_k$ and $b_k$,
$k = 1, \ldots, 4$, coefficients determined for the
consecutive-day data by those algorithms that produce
an affine transformation. The algorithms are listed in
the order in which they performed in the accuracy test.

On the basis of numerical calculations using an
atmospheric model (ref. 1), certain constraints are
expected to apply to the $a_k$ and $b_k$ coefficients
corresponding to a change in the haze level. These should
apply to the consecutive-day data if the haze levels
present are uniform. Among these constraints,
which apply to all channels, are the following.

1. If there is no difference in haze level between
the TSEG and the RSEG, $a_k = 1.0$ and $b_k = 0.0$,
$k = 1, \ldots, 4$.

2. If the TSEG has more haze than the RSEG,
$a_k > 1.0$ and $b_k < 0.0$, $k = 1, \ldots, 4$.

3. If the TSEG has less haze than the RSEG,
$a_k < 1.0$ and $b_k > 0.0$, $k = 1, \ldots, 4$.

In  many  cases,  the  data  in  table  II  do  not  obey 
these  rules.  Examples  can  be  found  in  the  following 
anomalies. 

1. $a_k > 1.0$ for some channels and $a_k < 1.0$ for
others; e.g., R(S) for F1655-4.

2. $a_k > 1.0$ and $b_k > 0.0$; e.g., MLEST for
F1673-2.

3. $a_k < 1.0$ and $b_k < 0.0$; e.g., R(C) for F1726-7,
channel 2.

These  failures  to  obey  the  constraints  may  be  due  in 
part  to  nonuniform  haze  levels  in  the  data  and  to 
changes  in  the  look  angle  across  the  scene. 
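The three constraints and the anomalies can be expressed as a simple check on a set of per-channel coefficients. The function name, the returned labels, and the coefficient values below are hypothetical and are not entries from table II.

```python
def haze_rule(a, b):
    """Report which haze constraint the per-channel gains a and offsets b obey."""
    pairs = list(zip(a, b))
    if all(ak == 1.0 and bk == 0.0 for ak, bk in pairs):
        return "no haze difference"
    if all(ak > 1.0 and bk < 0.0 for ak, bk in pairs):
        return "TSEG hazier"
    if all(ak < 1.0 and bk > 0.0 for ak, bk in pairs):
        return "RSEG hazier"
    return "inconsistent"

# Hypothetical four-channel coefficient sets.
consistent = haze_rule([1.10, 1.20, 1.05, 1.30], [-2.0, -1.0, -0.5, -3.0])
anomalous = haze_rule([1.10, 0.90, 1.00, 1.20], [1.0, -1.0, 0.0, 2.0])
```

In practice an exact comparison with 1.0 and 0.0 would rarely hold for estimated coefficients, so a tolerance would be a natural extension of this check.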

Tables III to VI give the accuracy results for wheat
and nonwheat using both data sets. The accuracy
obtained with signature extension is expressed as a
percentage difference from the local result; i.e.,

$$\text{Percentage difference} = \frac{\text{signature extension accuracy} - \text{local accuracy}}{\text{local accuracy}} \times 100 \tag{50}$$

Tables  VII  and  VIII  give  similar  results  for  overall 
accuracy. 

Tables  IX  through  XII  give  the  differences  for 
both  data  sets  (1)  between  results  obtained  using  sig- 
nature extension  and  local  classification  and  (2)  be- 
tween results  obtained  using  signature  extension  and 
ground  truth.  The  means  and  standard  deviations 
were  obtained  using  the  absolute  values  of  the  num- 
bers in  the  tables. 


Analysis 

This  section  reports  a statistical  analysis  of  the 
data  in  tables  VII  and  XI.  Data  for  the  UHMLE 
algorithm  were  not  included  because  of  their  large 
variances. 

First,  an  analysis  of  variance  was  performed  on 
the  data  in  table  VII.  The  purpose  of  an  analysis  of 
variance  is  to  separate  a response  variable  into  com- 
ponent parts.  In  this  way,  the  test  for  a particular  fac- 
tor will  become  more  sensitive  because  variations 
due  to  other  causes  have  been  removed.  In  this  ex- 
periment, two  factors  were  present:  signature  exten- 
sion algorithms  and  the  seven  consecutive-day  ac- 
quisitions. The  variation  over  the  seven  data  sets 
could  have  been  allocated  to  any  of  several  different 
causes:  ground  scene  variations  from  day  to  day, 
variations  from  one  geographic  location  to  another, 
or  changes  in  the  haze  level  from  day  to  day. 

The last alternative was chosen. Each pass was
classified as either clear or hazy by visually inspecting
the images of the data produced by ERIPS. The
results are shown in table XIII. Three TSEG-RSEG
combinations occurred; namely, haze-clear,
clear-haze, and clear-clear. It was assumed that each
combination would produce different results
(classification accuracy, etc.), thus the need for this
factor in the analysis.

An  interaction  between  the  algorithms  and  haze 
combinations  (A  x H)  was  also  expected  to  be  pres- 
ent; that  is,  one  algorithm  might  have  performed 
well  for  the  clear-haze  consecutive-day  acquisitions 
and  poorly  for  the  haze-clear  days,  whereas  the  op- 
posite results  might  have  occurred  for  another 
algorithm. 

The model for the experiment was

$$y_{ijk} = \mu + a_i + h_j + (ah)_{ij} + e_{ijk} \tag{51}$$

where $y_{ijk}$ is the response variable, $\mu$ is the overall
mean, $a_i$ is the contribution of the ith algorithm, $h_j$ is
the contribution of the jth haze level, $(ah)_{ij}$ is the
contribution of algorithm i and haze level j to the
interaction, and $e_{ijk}$ is the error term for the kth
observation for the ith algorithm and the jth haze level. In the
analysis of variance for overall accuracy, $y_{ijk}$ is the
percentage accuracy difference; that is, the quantity
given in table VII. The results of this analysis of
variance are given in table XIV, where significant
differences between the algorithms and between the
haze conditions are apparent.
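The terms of the model can be estimated from cell means when every algorithm and haze combination has the same number of observations. That balanced layout is an assumption of this sketch (the experiment itself need not have been balanced), and the function name and data are hypothetical.

```python
import numpy as np

def two_way_effects(y):
    """Estimate mean, main effects, interaction, and residuals.

    y has shape (I algorithms, J haze conditions, K observations per cell).
    """
    y = np.asarray(y, dtype=float)
    cell = y.mean(axis=2)                       # cell means
    mu = cell.mean()                            # overall mean
    a = cell.mean(axis=1) - mu                  # algorithm effects
    h = cell.mean(axis=0) - mu                  # haze effects
    ah = cell - mu - a[:, None] - h[None, :]    # interaction effects
    resid = y - cell[:, :, None]                # error terms
    return mu, a, h, ah, resid

# Hypothetical noise-free data built from known effects, no interaction.
true_a = np.array([1.0, -1.0])
true_h = np.array([2.0, 0.0, -2.0])
data = -5.0 + true_a[:, None, None] + true_h[None, :, None] + np.zeros((2, 3, 2))
mu_hat, a_hat, h_hat, ah_hat, resid = two_way_effects(data)
```

An analysis of variance then compares the sums of squares of these estimated effects against the residual sum of squares.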

Table  XV  gives  the  average  accuracy  difference 
over  the  algorithms  for  each  set  of  haze  conditions. 
Because  the  analysis  of  variance  indicated  significant 
differences  between  haze  conditions,  one  can  infer 
from  table  XV  that  the  presence  of  haze  over  the 
TSEG  is  significantly  different  from  the  other  two 
conditions. 

The  results  for  the  different  haze  conditions  were 
plotted  as  a function  of  the  algorithms  (fig.  1).  The 
condition  with  haze  over  the  TSEG  showed  consis- 
tently better  results  than  the  other  two  conditions.  A 
similar  analysis  was  performed  for  the  wheat  propor- 
tion errors  reported  in  table  XI;  R(S/C)  was  not  in- 
cluded because  of  its  large  variance.  The  results 
showed  a significant  difference  between  the  haze 
conditions  but  not  between  the  algorithms  (table 
XVI).  Table  XVII  gives  the  average  proportion 
difference  over  the  algorithms  for  each  haze  condi- 
tion, and  figure  2 shows  the  performance  of  each 
algorithm  for  each  of  the  three  haze  conditions.  Here 
again,  the  haze-clear  condition  seemed  to  give  the 
best  results. 


Conclusions

The results of these tests are summarized in table
XVIII. The first two columns list the algorithms in
the order in which they performed on the accuracy
test for the simulated and consecutive-day data. The
numbers given are the mean percentage differences
(see tables VII and VIII). The minus signs indicate
that the algorithm was less accurate than local
classification. A statistical analysis was performed
on the accuracy results for the consecutive-day data
with the exception of data for the three versions of
UHMLE (which were omitted because of large
variances). The analysis indicated (1) that there were
no significant differences among the algorithms and
(2) that the results obtained when the TSEG
appeared hazier than the RSEG were better than in the
other two conditions observed; i.e., when both were
clear or the RSEG was hazier.

The comparison of wheat proportion differences
(between ground truth and proportions provided by
algorithms and between local results and results
from algorithms) in the last four columns of table
XVIII shows the performance order of the
algorithms to be nearly the same for simulated data
but quite different for consecutive-day data. This was
because local proportion estimates were quite
different from ground-observed proportions for the
consecutive-day data. These four columns of
numbers are the means of the absolute values of the
differences as given in tables IX to XII. A statistical
analysis was performed on the consecutive-day data
for wheat proportion differences from local results.
Data from R(S/C) and the three versions of
UHMLE were not used because of large variances.
The results given in table XVI indicate no significant
differences among the algorithms tested. Here again,
the best results were obtained when the TSEG
appeared hazier than the RSEG.

Finally,  it  must  be  mentioned  that,  because  of 
time  limitations,  this  test  was  performed  using  the 
currently  available  algorithms.  Subsequently,  it  has 
been  discovered  that  some  of  the  algorithms  show 
better  performance  when  later  versions  are  used.  For 
example,  the  program  UHMLE  has  a later  version 
that  uses  a transformation  of  the  form  (x  + b)  to  get 
from  the  TSEG  statistics  an  initial  guess  for  the 
RSEG  statistics.  However,  the  results  presented  in 
this  paper  provide  some  indication  of  how  the 
algorithms  tested  can  be  expected  to  work  when  they 
are  applied  to  the  signature  extension  problem. 




GEOGRAPHICAL EXTENSION TESTS OF THE
ATCOR, OSCAR, AND MLEST SIGNATURE
EXTENSION ALGORITHMS

Geographical tests were performed on the ATCOR,
OSCAR, and MLEST signature extension
algorithms. The objectives of the tests were to
evaluate these algorithms in a more realistic environment
using LACIE image data and training field
definitions provided by analyst interpreters. This is a more
realistic test than the consecutive-day test and the
simulated data test described in the preceding
section. All three algorithms were tested on the same
data set, but ATCOR and OSCAR were tested
separately from MLEST.


Geographical Extension Data Set

The  1975-76  crop  year  operational  data  taken  over 
Kansas,  with  associated  labeled  fields,  were  chosen 
for  use  in  these  tests.  This  set  consisted  of  28  LACIE 
training  and  recognition  segment  pairs,  with  approx- 
imately 7 segment  pairs  in  each  of  the  4 wheat 
biowindows.  Recognition  segments  and  training  seg- 
ments were  paired  so  as  to  minimize  differences  be- 
tween the  segments  due  to  static  factors  in  the  scenes 
such  as  soil  types,  climate,  and  topography. 


The Geographical Signature Extension
Test of ATCOR and OSCAR

The objectives of the ATCOR and OSCAR test^2,3
were as follows.

1. To compare the performance of the signature
extension algorithms on the recognition segment to
that attained using (a) local training signatures from
the recognition segment and (b) untransformed signatures
from the training segment

2. To relate performance to the biowindow in
which the recognition and training segments were
acquired

3. To evaluate the effects of clustering on the performance
of the OSCAR algorithm

4. To evaluate the quality measures used in the
algorithms as predictors of signature extension performance

^2 S. G. Wheeler, "Signature Extension Experiment," IBM
Memorandum IBM-RES-23-9, Contract NAS 9-14350, June 3,
1976.

^3 S. G. Wheeler, "Results of the Signature Extension Experiment,"
IBM Memorandum IBM-RES-23-11, Contract NAS
9-14350, July 29, 1976.

Description of the experiment.— Five classification
runs were performed on each TSEG and RSEG pair,
as follows.

1.  Local— using  statistics  from  the  recognition 
segment’s  training  subclasses 

2.  Untransformed— using  statistics  from  the 
training  segment’s  training  subclasses 

3. OSCAR with clusters— using training segment
statistics after applying an affine transformation
derived by the OSCAR algorithm operating on
cluster statistics in the training and recognition segments

4.  ATCOR— using  training  segment  statistics 
after  applying  an  affine  transformation  derived  by 
the  ATCOR  algorithm 

5.  OSCAR  with  subclasses— using  training  seg- 
ment statistics  after  applying  an  affine  transforma- 
tion derived  by  the  OSCAR  algorithm  operating  on 
statistics  from  the  AI-defined  subclasses  in  the  recog- 
nition and  training  segments 
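
As an illustration of what runs 3 to 5 do with a transformation once it has been derived, the following Python sketch applies a per-channel affine correction to a Gaussian signature (a mean vector and a covariance matrix). It assumes a purely diagonal multiplicative term, so each channel k is mapped as x' = a_k x + b_k; the function and argument names are hypothetical and are not taken from the LACIE software.

```python
def transform_signature(mean, cov, a, b):
    """Apply x'_k = a_k * x_k + b_k to a class mean vector and covariance matrix.

    mean : list of per-channel means
    cov  : square covariance matrix as a list of rows
    a, b : per-channel multiplicative and additive terms (assumed diagonal model)
    """
    n = len(mean)
    # The mean is scaled and shifted channel by channel.
    new_mean = [a[k] * mean[k] + b[k] for k in range(n)]
    # The covariance is scaled only; an additive shift does not change it.
    new_cov = [[a[r] * cov[r][c] * a[c] for c in range(n)] for r in range(n)]
    return new_mean, new_cov


if __name__ == "__main__":
    # Invented two-channel numbers, for illustration only.
    m, s = transform_signature([10.0, 20.0],
                               [[4.0, 1.0], [1.0, 9.0]],
                               a=[2.0, 0.5], b=[1.0, -3.0])
    print(m)
    print(s)
```

The untransformed run corresponds to a_k = 1 and b_k = 0 for every channel, which leaves the training signatures unchanged.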

Response  variables  for  the  experiment  were 

1. Estimated percentage of wheat in the recognition
segments at 0-percent and 1-percent thresholding

2.  Observed  classification  accuracies  in  the  test 
fields  defined  by  the  AI's  in  the  recognition  segment 

3. Percentage of pixels thresholded in the recognition
segment and in the test fields at the 1-percent
thresholding level

The  percentage  of  wheat  in  the  segment  was 
calculated  by  counting  the  number  of  pixels 
classified  as  wheat  in  the  segment,  subtracting  the 
number  of  pixels  classified  as  wheat  within  desig- 
nated “other"  fields,  and  then  dividing  by  22  932  (the 
total  number  of  pixels  in  a LACIE  segment). 
Thresholding  in  the  segment  was  calculated  in  a 
similar  manner.  The  OSCAR  algorithm  produces  a 
quality  factor  that  measures  the  closeness  with 
which  the  transformed  signatures  match  the  recogni- 
tion segment's  cluster-based  (or  subclass-based)  sig- 
natures. Also,  the  ATCOR  algorithm  produces  an 
estimate  of  the  haze  levels  in  the  recognition  and 
training  segments. 
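
The wheat percentage calculation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the per-pixel labels and the mask marking designated "other" fields are hypothetical inputs, and only the divisor of 22 932 pixels comes from the text.

```python
SEGMENT_PIXELS = 22932  # total number of pixels in a LACIE segment (from the text)


def wheat_percentage(labels, in_other_field, total_pixels=SEGMENT_PIXELS):
    """Percentage of wheat in a segment.

    labels         : per-pixel class labels, e.g. "wheat" or "nonwheat"
    in_other_field : per-pixel booleans, True inside designated "other" fields
    """
    wheat = sum(1 for lab in labels if lab == "wheat")
    # Wheat calls that fall inside designated "other" fields are subtracted.
    wheat_in_other = sum(
        1 for lab, other in zip(labels, in_other_field)
        if other and lab == "wheat"
    )
    return 100.0 * (wheat - wheat_in_other) / total_pixels


if __name__ == "__main__":
    # Toy ten-pixel "segment", for illustration only.
    labels = ["wheat"] * 5 + ["nonwheat"] * 5
    mask = [True] + [False] * 9
    print(wheat_percentage(labels, mask, total_pixels=10))
```

The thresholding percentage would follow the same pattern, with a count of thresholded pixels in place of the count of wheat pixels.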

ATCOR and OSCAR geographical signature extension
results.— Results of the ATCOR and OSCAR
geographical signature extension tests are presented
in the following paragraphs.

Percentage of wheat in the segment: The averages
of the estimated values of the percentage of wheat in
the segments at 0-percent and 1-percent thresholding
are given in table XIX. The same quantities are
given, by biowindow, in tables XX to XXII.
(Differences are caused by the fact that the category
OSCAR with subclasses could not be run on one segment
because of the small number of subclasses in
the segment. Values in tables XX to XXII do not include
that segment.) The standard errors (i.e., standard
deviations of the average values) can be used to
place confidence limits on the mean values but are
not appropriate for testing for differences among the
mean values. Significance tests, using Fisher's F-factor,
were performed, and appropriate standard errors
that reflect the structure of the experiment were
calculated.

Test  statistics  indicated  that  there  were  no  signifi- 
cant differences  among  the  estimates  of  the  percen- 
tage of  wheat  in  the  segment  by  local  classification  or 
the  four  signature  extension  techniques,  no  signifi- 
cant differences  among  the  estimated  percentage  of 
wheat  in  the  different  biowindows,  and  no  signifi- 
cant interaction  between  the  classification  tech- 
niques and  the  biowindows.  The  lack  of  interaction 
suggests  that,  within  the  limits  of  experimental  error, 
all  the  techniques  work  equally  well,  or  poorly,  in  all 
biowindows. 

The  average  values  and  standard  deviations  of  the 
percentages  of  the  pixels  thresholded  while  classify- 
ing the  recognition  segment  using  statistics  derived 
by  the  different  techniques  are  given  in  table  XXII. 
These  tabulated  values  exhibit  large  differences 
among  the  techniques.  However,  an  analysis  of 
variance  test  indicated  there  were  no  significant 
differences  among  the  classification  techniques. 

The previously described tests showed little
difference between local classification in the recognition
segment and the signature extension techniques
except for classification of the RSEG with
untransformed TSEG statistics, a technique that
caused undue thresholding in the segment. Figures 3
through 6 are scatter plots of signature extension versus
local estimates of percentage of wheat in the segments.
(The solid line in these plots is X = Y and the
dashed line is the regression line Y = f(X) given in
the legend.) These plots show a large segment-to-segment
variability between local and signature extension
estimates and, consequently, a small (=0.4) correlation
between the estimates. Figure 7, a plot of
ATCOR versus OSCAR estimates of percentage of
wheat, shows a surprisingly high correlation between
these estimates. This strong relationship implies
that, despite their very different approaches toward
solving the signature extension problem, the techniques
produce basically the same result and a large
experiment would be required in order to choose one
over the other.

Algorithm  haze  and  quality  factors:  Table  XXIII 
gives  simple  correlation  coefficients  relating  the 
algorithm  quality  factors  and  ATCOR  estimates  of 
the  differences  between  the  haze  levels  in  the  recog- 
nition and  training  segments  to  the  percentage  of 
wheat  estimates.  As  would  be  hoped,  the  threshold- 
ing produced  by  using  untransformed  training  sig- 
natures is  highly  correlated  with  the  change  in  haze 
measurements  predicted  by  ATCOR.  Figures  8 
through  12  show  plots  of  the  significant  factors  in 
table  XXIII  versus  the  difference  in  haze  levels 
between  the  recognition  and  training  segments  as 
predicted  by  ATCOR. 

Since  the  performance  of  the  cluster-based 
OSCAR  algorithm  does  not  correlate  with  any  of  the 
quality  factors,  these  would  not  appear  to  be  useful 
predictors  of  algorithm  performance.  The  quality 
factor produced by using OSCAR with subclasses
does  correlate  with  some  of  the  thresholding  rates 
but  these  correlations  could  not  generally  be  used  in 
a signature  extension  situation. 

Training field accuracies: Tables XXIV and XXV
contain average training field accuracies for, respectively,
the wheat and nonwheat training fields when
classified without thresholding. F-tests in the
analysis  of  variance  with  subsequent  multiple  com- 
parison tests  show  significant  differences  between 
local  classification  and  the  signature  extension  tech- 
niques but  no  differences  among  the  techniques. 
Similar  results  hold  for  the  training  field  accuracies  at 
1 -percent  thresholding  as  shown  in  tables  XXVI  and 
XXVII.  Each  of  these  analysis  of  variance  tests  sug- 
gests that  training  field  accuracies  may  be  dependent 
upon  the  biowindow,  but,  as  in  the  percentage  of 
wheat  estimates,  there  are  no  interactions  between 
biowindows  and  algorithms. 

Thresholding  rates  in  the  training  fields,  tables 
XXVIII  and  XXIX,  show  large  differences  between 
local  classification  and  the  signature  extension  tech- 
niques, particularly  the  untransformed  technique. 
Both  the  average  and  the  variability  in  the  threshold- 
ing rates  are  much  larger  for  the  signature  extension 
techniques  than  for  local  classification.  There  are, 
however,  no  significant  differences  among  the 
results  from  ATCOR  and  the  two  OSCAR 
algorithms. 

Conclusions from the ATCOR and OSCAR
geographical signature extension tests: In terms of
average estimates of the percentage of wheat in the
segment, there is little evidence for choosing local
classification or one of the four signature extension
techniques over the others, except for the
untransformed approach, which shows a tendency to
produce far too many thresholded pixels.

Local  classification  produces  better  training  field 
accuracies  than  any  of  the  signature  extension 
algorithms,  particularly  untransformed.  Differences 
between  local  classification  and  the  algorithms  may 
not  be  real  since  the  observed  differences  probably 
can  be  explained  by  the  known  bias  in  local 
classification  due  to  reclassifying  training  data.  The 
untransformed  approach,  however,  appears  to  do 
more  poorly  than  the  others. 

There  is  little  evidence  in  any  of  the  data  to  sug- 
gest any  difference  between  the  performance  of 
OSCAR  with  clusters  and  OSCAR  with  subclasses.  It 
thus  appears  that  clustering  neither  helps  nor 
degrades  OSCAR's  performance  appreciably. 

Probably  the  most  important  and  startling  result 
of  this  analysis  is  the  demonstrated  high  correlation 
between  results  from  ATCOR  and  OSCAR.  This 
shows  that  these,  despite  being  completely  different 
in  concept,  produce  very  similar  results  and  only  an 
exceptionally  large  experiment  could  establish  a 
choice  between  them. 


The  Geographical  Signature  Extension 
Test  of  MLEST 

The  objectives  of  the  MLEST  test  (refs.  20  and 
21)  were  as  follows. 

1.  To  compare  the  performance  of  the  MLEST 
signature  extension  algorithm  on  the  recognition  seg- 
ment to  that  attained  using  (a)  local  training  sig- 
natures from  the  recognition  segment  and  (b) 
untransformed  signatures  from  the  training  segment 

2.  To  relate  performance  to  the  biowindow  in 
which  the  recognition  and  training  segments  were 
acquired 

Description  of  the  experiment. — The  experiment 
was  performed  using  the  same  Kansas  data  set  used 
in  the  OSCAR  and  ATCOR  test.  One  of  the  28  train- 
ing/recognition segment  pairs  used  in  the 
OSCAR/ATCOR  test  had  significant  data  dropout, 
which  MLEST  was  not  capable  of  processing.  This 
TSEG/RSEG  pair  was  dropped,  and  the  MLEST  test 
was  performed  on  the  remaining  27  TSEG/RSEG 
pairs. 


The evaluation procedure for the 27 TSEG/RSEG
pairs consisted of two major steps. In step 1, MLEST
signature extension runs were made for each segment
pair to determine MLE estimates for each A
matrix and B vector. In all of these runs, the individual
subclass a priori probabilities were assumed
equal and held constant. The MLEST iteration was
begun with A being the identity matrix and B being
the null vector. Also, A was restricted to be diagonal
in all runs. In step 2, the affine transformed training
segment signatures were used to classify each recognition
segment using the LACIE maximum-likelihood
classifier. Classification accuracies were computed
for wheat/nonwheat over recognition segment
training fields. Overall classification accuracies were
computed for each recognition segment using the
formula


$$ PC_{\text{overall}} = 0.5\,(PC_{W}) + 0.5\,(PC_{NW}) $$


where $PC_{W}$ is the wheat classification accuracy and
$PC_{NW}$ is the nonwheat classification accuracy.
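
A short Python sketch of the overall accuracy formula follows. The function name is hypothetical and the numbers in the usage lines are invented for illustration; only the equal 0.5 weights come from the text.

```python
def overall_accuracy(pc_wheat, pc_nonwheat):
    """Equal-weight average of the wheat and nonwheat classification accuracies."""
    return 0.5 * pc_wheat + 0.5 * pc_nonwheat


if __name__ == "__main__":
    # Invented accuracies, in percent.
    print(overall_accuracy(80.0, 60.0))
```

Because the two classes are weighted equally, the result does not depend on how many wheat and nonwheat pixels the training fields happen to contain.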

The  classification  results  were  used  to  estimate 
wheat  proportions  at  threshold  values  of  0 percent 
and  1 percent.  The  classification  runs  described  pre- 
viously were  repeated  using  untransformed  statistics 
from  the  training  segment  as  well  as  statistics  from 
the  recognition  segment  training  fields.  Henceforth, 
the  affine  transformed  classification  results  will  be 
referred  to  as  MLEST  results;  the  untransformed 
training  segment  classification  results  will  be  re- 
ferred to  as  UT  results;  and  the  recognition  segment 
classification  results  will  be  referred  to  as  LOCAL 
results. 

MLEST  geographical  signature  extension  results. — 
The  MLEST  program  converged  normally  for  23  of 
the  27  signature  extension  runs  attempted.  However, 
successful  optimization  iteration  sequences  could  not 
be  established  for  four  segment  pairs.  Analysis  of  the 
data  for  these  four  segment  pairs  revealed  that  the 
recognition  segment  data  were  located  relatively  far 
from the modes of the corresponding initial estimates
(A = I, B = 0) for the training segment mixture
density functions in spectral space. This resulted
in  floating-point  underflow  problems  in  the  likeli- 
hood function  computations,  which  in  turn  caused 
the  Davidon  optimization  iterations  to  abort.  The 
MLEST  program  was  rerun  for  these  four  segment 
pairs using the following initial values for the affine
transformation


$$ A = I \qquad (53) $$

$$ B_i = \nu_i - \mu_i, \quad i = 1, 2, 3, 4 \qquad (54) $$


where $I$ = a 4 x 4 identity matrix

$\mu_i$ = mean value in channel i for the training
segment, i = 1, 2, 3, 4

$\nu_i$ = mean value in channel i for the recognition
segment, i = 1, 2, 3, 4

In other words, a mean level adjustment (MLA) was
used for the initial B vector. The reruns were successful,
resulting in normal convergence for all four
segment pairs.
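
The mean level adjustment used to restart the four failed runs can be sketched as follows. This is a minimal Python illustration, assuming that the initial B vector is the recognition segment channel mean minus the training segment channel mean, so that the shifted training density sits near the recognition data; the function name and the pixel lists are hypothetical.

```python
def mla_initial_values(training_pixels, recognition_pixels):
    """Return (A, B): A is the identity matrix, B is the per-channel mean shift.

    Each argument is a list of pixels; each pixel is a list of channel values.
    """
    n_channels = len(training_pixels[0])

    def channel_means(pixels):
        return [sum(p[k] for p in pixels) / len(pixels) for k in range(n_channels)]

    mu = channel_means(training_pixels)      # training segment means
    nu = channel_means(recognition_pixels)   # recognition segment means
    a = [[1.0 if r == c else 0.0 for c in range(n_channels)]
         for r in range(n_channels)]
    # Assumed sign convention: B moves training means onto recognition means.
    b = [nu[k] - mu[k] for k in range(n_channels)]
    return a, b


if __name__ == "__main__":
    # Invented two-channel pixels, for illustration only.
    a, b = mla_initial_values([[10, 20], [14, 24]], [[15, 20], [17, 30]])
    print(a)
    print(b)
```

Starting the iteration from such a shift keeps the likelihood terms away from the far tails of the mixture density, which is where the floating-point underflow described above would arise.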

Table XXX enumerates the classification accuracy
results obtained with the geographical extension
data set. Table XXXI lists, by biowindow, the
average improvement in MLEST classification accuracy
over UT accuracy and the average difference
between MLEST classification accuracy and LOCAL
classification accuracy. The average improvement
and average difference percentages are defined as


$$ \text{Average improvement} = \operatorname{Avg}(P_{MLEST} - P_{UT}) \qquad (55) $$

$$ \text{Average difference} = \operatorname{Avg}(P_{LOCAL} - P_{MLEST}) \qquad (56) $$


where $P_{MLEST}$ is the MLEST classification accuracy,
$P_{UT}$ is the UT classification accuracy, and $P_{LOCAL}$ is
the LOCAL classification accuracy.
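
The two averages in equations (55) and (56) can be computed as in the Python sketch below. The function names are hypothetical, and the accuracy values in the usage lines are invented for illustration; they are not taken from tables XXX or XXXI.

```python
def average(values):
    return sum(values) / len(values)


def average_improvement(p_mlest, p_ut):
    """Mean of (MLEST accuracy - UT accuracy) over segment pairs, eq. (55)."""
    return average([m - u for m, u in zip(p_mlest, p_ut)])


def average_difference(p_local, p_mlest):
    """Mean of (LOCAL accuracy - MLEST accuracy) over segment pairs, eq. (56)."""
    return average([l - m for l, m in zip(p_local, p_mlest)])


if __name__ == "__main__":
    # Invented accuracies (percent) for two segment pairs.
    p_ut, p_mlest, p_local = [60.0, 70.0], [70.0, 80.0], [90.0, 95.0]
    print(average_improvement(p_mlest, p_ut))
    print(average_difference(p_local, p_mlest))
```

Both quantities are signed averages, so a degradation on one segment pair offsets an improvement on another.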

Referring  to  tables  XXX  and  XXXI,  one  can 
make  the  following  observations. 

1. The MLEST classification accuracies improved
upon UT classification accuracies for a majority of
the signature extension segment pairs. Improvements
in overall classification accuracy are indicated
for 22 of the 27 segment pairs. Improvements in both
the wheat and nonwheat classification accuracies are
indicated for 14 of the 27 segment pairs.

2.  The  average  improvement  (table  XXXI)  in 
overall,  wheat,  or  nonwheat  classification  accuracy  is 
approximately  10  percent.  The  improvements  in 
classification  accuracy  are  particularly  striking  for 
segment  pairs  1854/1025,  1882/1887,  1850/1887, 
1880/1875, and 1883/1884. The improvements in
wheat  classification  accuracy  for  these  segment  pairs 
range  from  approximately  23  percent  for  segment 
pair  1880/1887  to  approximately  64  percent  for  seg- 
ment pair  1882/1887. 

3. The degradations ($P_{UT} - P_{MLEST}$) in
classification accuracy resulting from the use of
MLEST are relatively insignificant. The average
degradation (five segment pairs) in overall accuracy
is less than 2 percent. The average degradation in
wheat classification accuracy (seven segment pairs)
is less than 4 percent. The average degradation in
nonwheat classification accuracy (nine segment
pairs) is less than 3 percent.

4.  The  improvements  in  classification  perform- 
ance do  appear  to  depend  on  the  biowindow  (table 
XXXI)  in  which  the  data  were  collected.  Average 
improvements  in  classification  accuracy  are  approx- 
imately 14  percent  for  biowindows  2 and  3,  approx- 
imately 9.5  percent  for  biowindow  1,  and  approx- 
imately 4 percent  for  biowindow  4.  These  results  are 
reinforced  by  the  well-known  fact  that  biowindows  2 
and  3 provide  maximum  discrimination  between 
wheat  and  nonwheat. 

5. The MLEST classification accuracies fall short
of  the  LOCAL  accuracies.  The  average  difference 
between  MLEST  and  LOCAL  accuracies  is  approx- 
imately 18  percent  for  the  overall  accuracies,  approx- 
imately 21  percent  for  the  wheat  accuracies,  and  ap- 
proximately 15  percent  for  the  nonwheat  accuracies 
(table  XXXI).  However,  the  LOCAL  classification 
accuracies  are  biased  estimates  since  they  were  esti- 
mated over  the  same  fields  that  were  used  to  train 
the  classifier.  Thus,  the  difference  between  the 
MLEST  classification  accuracy  and  the  “true" 
LOCAL  classification  accuracy  should  be  less  than 
this  observed  difference  of  15  to  20  percent. 

6.  MLA  starting  values  for  the  B vector  were  used 
for  segment  pairs  1882/1887,  1880/1875,  1877/1875, 
and  1883/1884.  Considerable  improvement  in 
MLEST  classification  performance  was  noted  for 
these  sites.  The  effect  of  the  MLA  starting  values 
was to place the initial mixture density function in
the  general  neighborhood  of  the  recognition  segment 
data.  It  is  conjectured  that  the  use  of  MLA  starting 
values  for  the  remainder  of  the  signature  extension 
data  set  would  have  resulted  in  better  MLEST 
classification  performance. 

Table XXXII lists the UT, MLEST, and LOCAL
wheat proportion estimates. Table XXXIII lists
mean absolute differences between MLEST wheat
proportion estimates and LOCAL wheat proportion
estimates and between UT wheat proportion estimates
and LOCAL wheat proportion estimates.
These mean absolute differences are averaged separately
for each biowindow and collectively for the entire
data set.

Referring to tables XXXII and XXXIII, one can
make  the  following  observations. 

1.  The  MLEST  proportion  estimates  are  closer  to 
the  LOCAL  proportion  estimates  than  are  the  UT 
estimates in 14 of 27 segment pairs with 0-percent
thresholding and 11 of 27 segment pairs with
1-percent thresholding.

2.  The  extent  of  improvement  is  erratic;  however, 
the  MLEST  estimates  are  closer,  on  the  average  (ta- 
ble XXXIII)  to  the  LOCAL  estimates  than  are  the 
UT  estimates.  The  average  absolute  differences  com- 
puted for  each  biowindow  between  MLEST  and 
LOCAL  and  between  UT  and  LOCAL  indicate  that 
the  MLEST  proportions  represent  improvements 
over  UT  proportions  for  biowindows  1 to  3.  The 
MLEST  proportions  represent  degradation  with 
respect  to  UT  proportions  for  biowindow  4.  This 
finding  is  reinforced  by  the  classification  accuracy 
results  presented  earlier,  which  showed  that  the 
smallest  improvement  in  classification  accuracy 
using  MLEST  was  in  biowindow  4. 

3. The averages of the UT, MLEST, and LOCAL
wheat proportion estimates (all 27 sites) from table
XXX at 0-percent thresholding are approximately
equal (within 1 percent of each other). The variances
of these estimates are also essentially equal. For a 1-percent
threshold, the average MLEST and LOCAL
estimates are approximately equal; however, the
average UT estimate differs about 5 percent from
these estimates.

4.  The  amount  of  thresholding  with  the  MLEST 
classifications  is  significantly  less  than  that  obtained 
with  the  UT  classifications.  Drastic  reductions  in 
thresholding  are  indicated  for  segment  pairs 
1854/1025,  1168/1173,  1882/1887,  1880/1887, 
1880/1875,  and  1172/1181. 


Conclusions from the MLEST geographical signature
extension test.-- On the basis of tests conducted thus
far, the following conclusions can be made.

1.  The  use  of  the  MLEST  algorithm  leads  to  im- 
provements in  classification  accuracy. 

2.  The  MLEST  wheat  proportion  estimates  are, 
on  the  average,  closer  to  the  LOCAL  wheat  propor- 
tion estimates  than  are  the  UT  wheat  proportion  esti- 
mates. 

3.  In  reference  to  the  geographical  extension 
results,  the  MLEST  algorithm  performs  best  on  data 
from  biowindows  1 to  3. 

4.  The  use  of  the  MLEST  affine  transformed 
training  segment  signatures  for  classification 
drastically  reduces  the  percentage  of  pixels 
thresholded. 

These  results  demonstrate  the  viability  of  MLEST 
as  a signature  extension  algorithm.  It  is  conjectured 
that  the  use  of  MLA  starting  vectors,  physical  con- 
straints on  A and  B,  and  the  iterative  equations  for 
the  a priori  probabilities  would  lead  to  improvements 
in  the  performance  of  the  MLEST  algorithm. 


SUMMARY  AND  CONCLUSIONS 

This paper has described the results of the effort
to develop a technology for signature extension during
LACIE Phases I and II (1975 and 1976). A number
of haze and Sun angle correction procedures were
developed and tested. These included the ROOSTER
and  OSCAR  cluster-matching  algorithms  and  their 
modifications,  the  MLEST  and  UHMLE  maximum- 
likelihood  estimation  procedures,  and  the  ATCOR 
procedure.  All  these  algorithms  were  tested  on 
simulated  data  and  consecutive-day  Landsat  image- 
ry. The  ATCOR,  OSCAR,  and  MLEST  algorithms 
were  also  tested  for  their  capability  to  geographically 
extend  signatures  using  Landsat  imagery. 

Several  conclusions  can  be  drawn  from  these  tests. 

1. In general, the paired TSEG/RSEG segment
approach to signature extension described in this
paper was not successful. This conclusion is based on
the poor geographical extension test results presented
in the preceding section. The primary source
of error appeared to be the lack of representative
crop signatures in a single training segment for use in
structuring a classifier to classify a recognition segment.
This conclusion was reached by comparing the
results from the consecutive-day signature extension
tests with the geographical signature extension
results. The consecutive-day signature extension
results were very encouraging in that they indicated
that if haze was the primary source of signature
variation, it could be successfully corrected for with
little or no degradation in the classifier's performance.
When other sources of signature variation
were introduced in the geographical signature extensions,
a significant degradation in classifier performance
was noted which was not correctable using
haze and Sun angle correction procedures. The lack
of success noted in this paired segments approach to
signature extension led to the development of the
multisegment training approach to signature extension
described in the paper by Kauth and
Richardson.

2.  The  affine  transformation  appears  to  be  an  ap- 
propriate model  for  use  in  correcting  Landsat  image- 
ry for  uniform  haze  and  Sun  angle  differences. 

3. Of the algorithms tested, MLEST and ATCOR
appear to offer the most promise for signature extension.
MLEST outperformed all other algorithms on
the simulated data, the consecutive-day data, and the
geographical extension data set. In addition, it is supported
by maximum-likelihood estimation theory.

In  the  consecutive-day  test  and  the  geographical 
extension  test,  ATCOR  was  shown  to  improve 
classification  performance.  ATCOR  also  has  a 
theoretical  foundation  in  its  physical  model  explain- 
ing the  interaction  of  light  reflected  by  the  surface  of 
the  Earth,  the  atmosphere,  and  the  Landsat  sensor. 
Appendix

Detailed Description of OSCAR


As described in the body of this paper, the
OSCAR algorithm consists of four major steps: computation
of pseudorank vectors, identification of
admissible pairs, evaluation of candidate transformations,
and computation of weighted average estimates.
The following is a detailed description of
these steps.


STEP 1: COMPUTATION OF PSEUDORANK
VECTORS

Cluster each segment using a suitable clustering
algorithm. Let $\mu_i$, i = 1, 2, ..., $M_T$, be the cluster
mean vectors for the training segment and $\nu_j$, j = 1,
2, ..., $M_R$, be the cluster mean vectors for the recognition
segment. We will use k to index components
(channels) of $\mu_i$ and $\nu_j$, where k = 1, 2, ..., p.

Compute for each i and k


$$ v_{ik} = \frac{1}{M_T} \sum_{t=1}^{M_T} H(\mu_{ik} - \mu_{tk};\, \tau) \qquad (A1) $$


where

$$ z = \mu_{ik} - \mu_{tk} \qquad (A2) $$

$$ H(z;\tau) = 0; \quad z < -\tau \qquad (A3) $$

$$ H(z;\tau) = 1; \quad z > \tau \qquad (A4) $$

$$ H(z;\tau) = (z + \tau)/(2\tau); \quad -\tau \le z \le \tau \qquad (A5) $$

[0 < τ < 5; currently, τ = 3]

Compute for each j and k


$$ w_{jk} = \frac{1}{M_R} \sum_{t=1}^{M_R} H(\nu_{jk} - \nu_{tk};\, \tau) \qquad (A6) $$


where the definitions are similar to those for equation
(A1).

The H function is used to render the rank vectors
robust with respect to slight random variations (in
haze or in signatures) that could cause rank reversals.
Differences greater than τ are counted as full ranks;
differences less than τ are counted as partial ranks on
a sliding scale from zero to one.

The pseudorank vectors are normalized to zero-to-one
to enhance comparability between segments
with different numbers of clusters. Thus, if i and j index
similar classes, $v_{ik}$ should be fairly close to $w_{jk}$
for all k.
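
The pseudorank computation of step 1 can be sketched in Python as follows. This is an illustrative reading rather than the original OSCAR code: the soft step function follows the three cases described above with a half-width of 3, and the division by the number of clusters is an assumption about how the zero-to-one normalization is achieved. The function names are hypothetical.

```python
def h(z, tau=3.0):
    """Soft step: 0 below -tau, 1 above tau, linear ramp in between."""
    if z < -tau:
        return 0.0
    if z > tau:
        return 1.0
    return (z + tau) / (2.0 * tau)


def pseudorank_vectors(cluster_means, tau=3.0):
    """Pseudorank of every cluster in every channel, scaled to the range 0 to 1.

    cluster_means : list of cluster mean vectors (one list of channel values each)
    """
    m = len(cluster_means)
    p = len(cluster_means[0])
    return [
        [
            # Count, softly, how many clusters lie below cluster i in channel k.
            sum(h(cluster_means[i][k] - cluster_means[t][k], tau) for t in range(m)) / m
            for k in range(p)
        ]
        for i in range(m)
    ]


if __name__ == "__main__":
    # Invented one-channel cluster means, and the same means with a constant offset.
    print(pseudorank_vectors([[10.0], [20.0], [30.0]]))
    print(pseudorank_vectors([[17.0], [27.0], [37.0]]))
```

In this sketch a constant additive offset in a channel leaves the pseudoranks unchanged, since only differences between cluster means enter the computation. That is the property that lets ranks from a hazy and a clear segment be compared.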


STEP 2: IDENTIFICATION OF ADMISSIBLE
PAIRS

For each i,j pair, compute

$$ C_{ij} = \sum_{k=1}^{p} |v_{ik} - w_{jk}|^{\rho} \qquad (A7) $$

[1 ≤ ρ ≤ 2; currently, ρ = 1]    (A8)


For each i, find the β lowest $C_{ij}$ values. Let $M_{ih}$
denote the j index of the hth lowest $C_{ij}$.

List all i,j pairs such that either

a. For some h ≤ α, there is an $M_{ih}$ = j.

b. For some α < h ≤ β, there is an $M_{ih}$ = j and
$C_{ij}$ < γ.


[0 < α < β < 10; 0.8 ≤ γ ≤ 3;
currently, α = 4, β = (value illegible in source), γ = 1]

Pairs satisfying the above criterion are called admissible.
Let admissible pairs be indexed by g = 1, 2,
..., $N_g$. Let $I_g$ be the i index and $J_g$ be the j index for the
gth pair.

If all clusters represent classes found in both segments,
corresponding $v_{ik}$ and $w_{jk}$ vectors will be
nearly equal. The vectors will differ somewhat
because random variation can reverse ranks and
because some clusters will be unmatched (i.e., found
in one segment only). The approach is to test all pairs
that appear to be promising.


STEP 3: EVALUATION OF CANDIDATE
TRANSFORMATIONS

The basic structure is a double loop for all (g,q)
pairs such that 1 ≤ g < q ≤ $N_g$. Thus, the basic loop
consists of examining all pairs of admissible pairs.
The steps are

a. Set the subscript r equal to 1.

b. Take the next (g,q) pair. If starting, g = 1 and
q = 2. If all pairs have been examined, go to step 4.

c. If either $I_g = I_q$ or $J_g = J_q$ ($|I_g - I_q| \le 0.00001$
means $I_g = I_q$), go back to step b; otherwise, continue.

d. Compute for each channel k

$$ c_k = \frac{\nu_{J_g k} - \nu_{J_q k}}{\mu_{I_g k} - \mu_{I_q k}} \qquad (A9) $$

e. If for all k, δ ≤ $c_k$ ≤ χ, then continue; if one or
more of the inequalities are not satisfied, then go
back to step b.

[0.3 < δ < 0.6, 1.8 < χ < 3;
currently, δ = 0.6, χ = 1.8]    (A10)

f. Set and compute for all k

$$ c_{kr} = c_k \qquad (A11) $$

$$ d_{kr} = \nu_{J_g k} - c_k\,\mu_{I_g k} \qquad (A12) $$

For each recognition cluster compute $G_j$ as
follows.

$$ z_{ijk} = c_{kr}\,\mu_{ik} + d_{kr} - \nu_{jk} \qquad (A13) $$

Form the vector $Z_{ij}$

$$ Z_{ij} = [z_{ij1} \;\cdots\; z_{ijp}]^{T} \qquad (A14) $$

then

[equation (A15) illegible in source]

and define

[equation (A16), which defines $G_j$ as a minimum, illegible in source]

g. Compute

[equation (A17) illegible in source; 20 < σ < 80; currently, σ = 50]

h. Increment r (r = r + 1).

i. Go to step b.

The algorithm forms a straight line between the
means of each pair of clusters. If the resulting
multiplicative factor is reasonable, the goodness criterion
is evaluated and the transformation is stored.
The rationale for the goodness criterion is given in
the text.
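
The core of step 3, fitting a straight line per channel through the means of two admissible cluster pairs and rejecting unreasonable gains, can be sketched in Python as follows. This is an illustrative reading, not the original code: the slope formula is inferred from the statement that the algorithm forms a straight line between the means of each pair of clusters, the gain limits 0.6 and 1.8 and the tolerance 0.00001 are the values printed in the source, and the goodness criterion is omitted because its equations are not legible. All names are hypothetical.

```python
def candidate_transformation(mu_g, mu_q, nu_g, nu_q,
                             delta=0.6, chi=1.8, eps=1e-5):
    """Per-channel gain and offset from two admissible pairs, or None if rejected.

    mu_g, mu_q : training cluster mean vectors of admissible pairs g and q
    nu_g, nu_q : matching recognition cluster mean vectors
    """
    gains, offsets = [], []
    for m_g, m_q, n_g, n_q in zip(mu_g, mu_q, nu_g, nu_q):
        if abs(m_g - m_q) <= eps:
            return None  # coincident training means: no line can be fitted
        c = (n_g - n_q) / (m_g - m_q)  # slope through the two mean points
        if not (delta <= c <= chi):
            return None  # gain outside the accepted range in some channel
        gains.append(c)
        offsets.append(n_g - c * m_g)  # intercept so the line passes through pair g
    return gains, offsets


if __name__ == "__main__":
    # Invented two-channel cluster means, for illustration only.
    print(candidate_transformation([20.0, 30.0], [40.0, 50.0],
                                   [25.0, 31.0], [49.0, 55.0]))
```

A full implementation would score each surviving candidate with the goodness criterion before storing it, so that step 4 can rank the candidates.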


STEP 4: COMPUTATION OF WEIGHTED
AVERAGE ESTIMATE

Find the ψ highest $f_r$ values. Let $f_r$ be the rth highest
value and let $c_r$ and $d_r$ be moved to correspond to
$f_r$; [1 ≤ ψ ≤ 10; currently, ψ = 5].

For all channels k = 1, 2, ..., p, compute $a_k$, the
multiplicative term in the affine haze/Sun angle correction
equation (eq. (1)).

[equation (A18) illegible in source]

For all channels k = 1, 2, ..., p, compute $b_k$, the
additive term in the affine haze/Sun angle correction
equation (eq. (1)).

[equation (A19) illegible in source]

The weighted average is used so that the final estimate
will not be unduly influenced by a slight
preference that could be caused by random variation.
An inspection of equations (A18) and (A19) reveals
that $a_k$ and $b_k$ (see eq. (1)) are formed by taking
weighted averages of the best candidate transformations.
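
The idea behind step 4 can be illustrated with the following Python sketch. Because the weighting equations are not legible in this copy, the sketch assumes that each of the best candidates is weighted by its goodness value; that choice, the function name, and the tuple layout are assumptions made for illustration and should not be read as the original OSCAR formulas.

```python
def weighted_average_transformation(candidates, psi=5):
    """Combine the psi best candidates into one gain and offset per channel.

    candidates : list of (goodness, gains, offsets) tuples, one per stored
                 candidate transformation; a higher goodness is better
    """
    # Keep only the psi candidates with the highest goodness values.
    best = sorted(candidates, key=lambda cand: cand[0], reverse=True)[:psi]
    total = sum(f for f, _, _ in best)
    n_channels = len(best[0][1])
    # Assumed weighting: each candidate counts in proportion to its goodness.
    a = [sum(f * gains[k] for f, gains, _ in best) / total for k in range(n_channels)]
    b = [sum(f * offs[k] for f, _, offs in best) / total for k in range(n_channels)]
    return a, b


if __name__ == "__main__":
    # Invented one-channel candidates, for illustration only.
    cands = [(3.0, [1.0], [2.0]), (1.0, [2.0], [6.0]), (0.5, [9.0], [9.0])]
    print(weighted_average_transformation(cands, psi=2))
```

Averaging over several good candidates, rather than taking the single best one, is what keeps a marginal preference caused by random variation from dominating the final estimate.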
Table I.— Sources of Variation in Landsat Data
Observed by the Multispectral Scanner


rm 

ClIMpAv 

A.  Dynamic  toon 

AniMMplMrtThui  tevcl  dungm 

B.  Suite  touras  of  dtf- 

8nH  .mini 

ana  onor 

(fcffitott 

Mndpalcrap* 
nimtopcpMc  datum 
Crapping  prank** 
CttlMU 

Raid  *itM  and  thaptt 

Table  II.— Affine  Signature  Transformation  Coefficients  for  Consecutive-Day  Data 


Data 

not 

MUST 

oscah 

SEGUES 

MOD* 

oto 

ktOOOSCAM 

ATPOO 

orso 

•k 

H 

*1 

h 

*» 

•k 

•k 

*k 

•k 

•k 

•k 

*k 

FI709-* 

100 

-7.1 

109 

-2)2 

1.12 

-44 

1.24 

-4.1 

10) 

-04 

1.12 

-ii 

in 

-29 

1.12 

-69 

in 

—16 

in 

-14 

102 

-.9 

101 

-10 

1.14 

-12 

104 

-1.7 

in 

-4.) 

in 

-29 

1.10 

-14 

n 

-IS 

.n 

2) 

10) 

-14 

101 

l.l 

l.l) 

-69 

10) 

.1 

10) 

.) 

in 

-10 

in 

-)» 

1.2) 

-10) 

n 

1.) 

109 

-.7 

.91 

1.) 

1.12 

-21 

10) 

.) 

1.02 

.4 

in 

-0 

107 

-2.) 

10) 

-4,2 

FI67M 

091 

0.1 

10) 

01 

004 

45 

0.71 

IS 

0.94 

1.0 

0.14 

on 

14 

0.9) 

2.1 

on 

0.2 

99 

-.9 

106 

.0 

.90 

10 

12 

2.4 

.94 

.) 

n 

15 

n 

.) 

.99 

2.4 

101 

-.7 

its 

-19 

1.10 

.9 

1.10 

-10 

1.) 

-179 

.91 

-1.2 

1.04 

-16 

95 

19 

.99 

29 

id 

-6.) 

1.16 

-52 

1.19 

J 

1.09 

-1) 

15 

-11) 

1.00 

-0 

106 

-1) 

.94 

19 

.97 

10 

116 

-4.0 

FI6M4 

1.0* 

-01 

IS* 

-1.4 

104 

-01 

10) 

-00 

10) 

-0.2 

in 

19 

107 

-10 

092 

4.7 

I.W 

-It 

in 

-.1 

1.24 

-2.) 

10) 

.5 

104 

10 

1.09 

0 

in 

1.7 

in 

9 

.94 

4.) 

1.26 

—14 

.9* 

65 

112 

-J 

.90 

11 

.9) 

).) 

102 

0 

01 

12.1 

104 

-0 

.94 

)0 

1.22 

-62 

.4) 

2.2 

13) 

-).) 

1.00 

.9 

1.0 5 

.2 

10) 

I 

19 

19 

104 

-.6 

.9) 

1.4 

116 

-1.7 

FI726-7 

0.91 

00 

G.94 

-19 

on 

0) 

09} 

-09 

oot 

04 

on 

0.1 

019 

20 

0.44 

1) 

in 

— 16 

.9) 

—9 

.99 

-1) 

41 

J 

.94 

-1.2 

n 

9 

.41 

-.1 

n 

14 

.9) 

24 

107 

-6.1 

.91 

9 

1.22 

-140 

1.02 

-90 

109 

-19 

.90 

.2 

in 

-6.4 

.17 

21 

.94 

2.) 

ISO 

-107 

.9) 

-9 

117 

-1) 

— 

.) 

.91 

-19 

91 

-) 

n 

11 

.17 

1.) 

.97 

.9 

117 

-19 

SI4SM 

lit 

-49 

091 

0.4 

009 

21 

094 

10 

091 

D 

on 

2.2 

091 

2.1 

091 

01 

212 

-21) 

III 

-19 

.91 

I 

9) 

9 

l.l) 

-10 

92 

9 

40 

1.2 

92 

1) 

.91 

9 

in 

-11.7 

in 

-1.9 

102 

.0 

92 

II 

.99 

-.2 

.97 

-1.) 

in 

-2.) 

92 

12 

n 

.) 

ill 

-70 

1.14 

-1.4 

.97 

I 

91 

9 

M 

-2 

99 

-0 

N 

-1) 

.92 

.) 

in 

.2 

l.» 

-1) 

S172M 

0.9} 

1) 

099 

1.7 

101 

12 

101 

14 

10) 

-j.) 

on 

0) 

in 

l.l 

0.94 

1) 

114 

-4.9 

.91 

29 

10) 

.7 

101 

1) 

104 

.9 

109 

-10 

it 

1.1 

109 

-.7 

4) 

20 

1.22 

-1) 

,97 

14 

104 

1.2 

101 

22 

102 

II 

104 

-.) 

M 

2-9 

104 

I 

.99 

2.) 

1)1 

-44 

1.0} 

J 

104 

9 

too 

II 

too 

10 

1.0) 

-9 

.94 

15 

in 

.2 

.97 

.4 

1-02 

-1.7 

Cl  7J6-5 

109 

-<)) 

10) 

01 

099 

21 

104 

09 

099 

1) 

0.91 

17 

049 

29 

0.92 

4.7 

I.20 

-49 

109 

-.) 

102 

4 

.94 

4.4 

1.00 

2.) 

9) 

4.4 

.9) 

4.1 

109 

-1) 

.94 

4.) 

1.17 

-14 

1.0) 

.1 

101 

0 

99 

)0 

.47 

17 

.91 

11 

99 

2.4 

in 

-9 

.94 

>0 

l.l) 

-4.2 

101 

-7 

102 

0 

tot 

10 

1.01 

10 

IO| 

1.2 

101 

It 

in 

5 

.4) 

1.4 

l.l) 

-20 
Table III.— Wheat Accuracy for Simulated Data


Table IV.— Nonwheat Accuracy for Simulated Data


Am  Local  Ipm^  mmo* 

WBf  WWW  l^rfW'W  Mflff  Wf^^Ip  H^^viuWw 


^VDRf 

to 

A® 

MUST 

usfltm 

B(Q 

i n* 

SIM! 

M4 

-24 

U 

-190 

—214 

-MO 

MM2 

97.1 

4 

12 

-19 

4 

-264 

MM) 

94J 

2 

)i 

24 

-124 

-144 

MM4 

r.9 

-.1 

6.S 

-1)4 

-1.7 

-214 

MM 

NM) 

91.1 

-4 

44 

-214 

-10.7 

-59.7 

SP* 

$.9 

14 

24 

412 

l).l 

M.I 

(a) A minus sign means the algorithm was less accurate than local classification.


Am  Local  hntatOBt  BBSmca  bttwcta  Local  uccaotKy 

actaracy,  Mtf  Ml  oMM  wftk  wMi 

ytrcam  to 


B(S)  MUST  UHfltkb  B(C)  VT 


MMI 

964 

04 

-64 

-02 

-M4 

-99.1 

MM2 

99.1 

4 

-4 

-.1 

4 

-154 

MM) 

97.7 

4 

-1.1 

-24 

-2.9 

-)94 

MM4 

944 

-.1 

-64 

-2.4 

— )2 

-).2 

Mm 

96.9 

♦ 

»• 

-)4 

-14 

-94 

— )94 

8D 

24 

.) 

1.1 

1.) 

14.1 

424 

(a) A minus sign means the algorithm was less accurate than local classification.


Table  V.— Wheat  Accuracy  for  Consecutive-Day  Data 


[Columns: data set; local accuracy, percent; then the percentage difference
between local accuracy and that obtained with various algorithms (a), in the
order R(S), MLEST, OSCAR, REGRES, MODR, R(C), MOD OSCAR, ATCOR, UH fields,
UT, R(S/C), UH all]

PI 7094 

96.7 

04 

1.1 

47 

12 

47 

1.1 

14 

12 

-154 

l.l 

24 

-21.9 

P167J-2 

974 

-4 

.9 

-).9 

-64 

-24 

-)4 

-2.9 

-2.9 

— ).9 

-4 

2 

-22 

P16S5-4 

9)4 

-154 

2.7 

-124 

-11.1 

-12.7 

-174 

-144 

-19.) 

-94 

-174 

-14 

-494 

P1726-7 

«24 

-)4 

-6.7 

-24 

— )4 

-5.9 

-149 

-1.9 

-l.l 

-274 

-24 

-44 

-64 

S14554 

924 

14 

14 

-44 

-2.7 

-7.1 

-11.7 

-6.1 

-.7 

-164 

4 

-2.1 

-424 

517254 

79.7 

74 

24 

44 

74 

-.9 

•4 

14 

-4 

14 

-242 

-13.6 

2IJ 

El 726-5 

924 

-1.9 

l.l 

-4 

-24 

-14 

A 

—4 

-62 

-S49 

-14 

14 

-492 

904 

-14 

.4 

-24 

-24 

-44 

-44 

-12 

-42 

-174 

-54 

-2.9 

-21.5 

SD 

64 

6.9 

)2 

S.4 

5.7 

44 

9.1 

54 

74 

17.5 

9.1 

41 

272 

(a) A minus sign means the algorithm was less accurate than local classification.
Table VI.— Nonwheat Accuracy for Consecutive-Day Data


[Columns: data set; local accuracy, percent; then the percentage difference
between local accuracy and that obtained with various algorithms (a), in the
order R(S), MLEST, OSCAR, REGRES, MODR, R(C), MOD OSCAR, ATCOR, UH fields,
UT, R(S/C), UH all]

F17094 

73.9 

-8.5 

-6.8 

-10.3 

-10.6 

-11.2 

-11.5 

-12.4 

10.7 

-12.0 

-18.7 

19.9 

F1673-2 

95.7 

-2.3 

-1.0 

-3.0 

-11.5 

1.6 

-.9 

.1 

-5.7 

-27.2 

.3 

-2.3 

Pin 

F 1655-4 

95.4 

.4 

-3.4 

1.4 

.7 

.4 

-.5 

.6 

1.4 

-.9 

.6 

-4.4 

-4.0 

FI726-7 

79.1 

3.7 

3.9 

5.8 

7.6 

-.4 

2.3 

3.9 

-7i 

106 

-10J 

-66 

-68 

SI  455-4 

78.9 

-2.3 

-5.1 

-2.8 

-.5 

3.0 

7.6 

3.0 

1.3 

-4.6 

.0 

-5.8 

-8.0 

S1?2S^ 

93.5 

-6.5 

-3.5 

-7.7 

-8.4 

-6.1 

-14.7 

-5.7 

-9.7 

-11.8 

-7.0 

-8.1 

-as 

E 1 726-5 

45.2 

-5.1 

-17.3 

-9.3 

-5.3 

-2.2 

-11.1 

-25.2 

3.1 

86.1 

-28.5 

-31.4 

61.3 

Mean 

80.2 

-2.9 

-4.7 

-3.7 

- 

-2.1 

-4.2 

-5.0 

-4.2 

9.0 

-8.2 

-11.0 

1.2 

SD 

17.9 

4.2 

6.5 

5.9 

C ’ 

5.0 

8.4 

10.4 

6.1 

36.5 

10.4 

10.4 

31.0 

(a) A minus sign means the algorithm was less accurate than local classification.


Table VII.— Overall Accuracy for Consecutive-Day Data


[Local accuracy in percent; the remaining columns give the percentage
difference between local accuracy and that obtained with various algorithms (a)]

Data       Local   R(S)  MLEST  OSCAR REGRES   MODR   R(C)    MOD  ATCOR     UH     UT R(S/C)     UH
                                                            OSCAR        fields                  all

F1709-8     79.5   -5.8   -4.4   -7.0   -7.1   -7.6   -8.1   -7.8   -8.5    2.7   -8.2  -12.5    7.3
F1673-2     96.1   -2.0   -0.5   -3.2  -10.2    0.5   -1.7   -0.7   -5.0  -21.3    0.1   -1.7  -23.7
F1655-4     94.9   -3.3   -1.8   -2.1   -2.1   -2.7   -4.7   -3.0   -3.6   -3.1   -3.8   -3.8  -15.0
F1726-7     80.0    1.9    1.7    3.8    4.9   -1.9   -1.1    2.4   -5.9    0.9   -8.5   -7.1   -6.8
S1455-4     86.5   -0.2   -0.9   -3.5   -1.8   -3.2   -4.4   -2.5    0.1  -12.1    0.0   -3.5  -29.5
S1725-4     85.4    1.1   -0.5   -0.9    0.0   -3.2   -1.9   -5.0   -4.7   -4.3  -14.1  -11.0    0.9
E1726-5     66.2   -3.2   -6.0   -3.8   -3.5   -1.8   -4.1   -9.8   -2.7    1.4  -11.5   -9.8   -7.3

Mean        84.1   -1.6   -1.8   -2.4   -2.8   -2.8   -3.7   -3.8   -4.3   -5.1   -6.6   -7.1  -10.6
SD          10.2    2.7    2.6    3.3    4.9    2.5    2.4    4.2    2.7    8.7    5.5    4.2   13.1

(a) A minus sign means the algorithm was less accurate than local classification.
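
As a worked check of how the Mean and SD rows of Table VII relate to the entries above them, the short sketch below recomputes both for the R(S) column. It assumes that the tabulated SD is the sample standard deviation (n - 1 in the denominator), which is consistent with the printed values; the variable names are illustrative only.

```python
from statistics import mean, stdev

# R(S) column of Table VII: percentage difference from local accuracy
# for the seven consecutive-day data sets.
r_s = [-5.8, -2.0, -3.3, 1.9, -0.2, 1.1, -3.2]

col_mean = mean(r_s)
col_sd = stdev(r_s)  # sample standard deviation (divides by n - 1)

print(f"mean = {col_mean:.1f}, SD = {col_sd:.1f}")  # prints "mean = -1.6, SD = 2.7"
```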
Table VIII.— Overall Accuracy for Simulated Data


[Local accuracy in percent; the remaining columns give the percentage
difference between local accuracy and that obtained with various algorithms (a)]

Data       Local   R(S)  MLEST     UH   R(C)     UT
                               fields

SIM1        93.5    0.0   -3.5  -21.7  -29.6  -99.3
SIM2        98.6    0.0    0.0   -0.7    0.0  -18.3
SIM3        97.0    0.1    0.0   -1.0   -5.2  -50.0
SIM4        92.8   -0.1   -3.2   -5.0   -2.9   -8.8

Mean        95.5    0.0   -1.7   -7.1   -9.4  -44.1
SD           2.8    0.1    1.9    9.9   13.6   40.8

(a) A minus sign means the algorithm was less accurate than local classification.


Table IX.— Wheat Proportions for Simulated Data as
Determined Using Local Results


Data 

Local  Signature  extension  preportkm  minus  heal  proportion 

R(S)  It  (Cl  MUST  t IH  UH  UT 

fields  fields 
MU 

SIMI 

24.3 

-OS 

-1.9 

06 

-160 

-21.3 

-143 

S1M2 

24.7 

.0 

.0 

.6 

-1.0 

-l.l 

—12 

SIM  3 

249 

1 

-3.1 

1.5 

1,7 

1.6 

-200 

SIM4 

242 

.0 

^ * 
.* 

6.1 

-.3 

-.6 

-6.3 

Mean 

.2 

1.3 

24 

SO 

6.4 

13.5 

absolute 

values 

SD 

.2 

1.5 

3.0 

7.9 

106 

103 

Table  X.— Wheat  Proportions  for  Simulated  Data  as 
Determined  Using  Ground  Truth 


Data  . 

Ground 

mult 

Signature  extension  proportion  minus  ground-truth 
proportion 

n<si 

n(d 

MLEST 

UH 

flehts 

MU 

UH 

fields 

UT 

SIMI 

23.9 

-0.1 

-IS 

1.0 

-16.4 

-21.9 

-239 

SIM  2 

239 

.8 

.8 

14 

-.2 

- 3 

-2.4 

SIM3 

239 

l.l 

-21 

2.5 

2.7 

26 

-19.0 

SIM4 

23.9 

3 

1 

7.1 

-.3 

-.3 

-6.0 

Mean 

— 

6 

l.l 

3.0 

4.9 

6.3 

128 

absolute 

values 

SD 

—a 

s 

.9 

2.8 

7.8 

10.S 

10.3 



Table XI.— Wheat Proportions for Consecutive-Day Data as Determined Using Local Results


Data  Laeal  Signature  extension  proportion  mMw  heal  proportion 

aetwacy, 

percent  ' 


R(S) 

REGIUS 

OSCAR 

MOD  it 

VT 

MLEST  ATCOR 

MOD 

OSCAR 

RiStQ  RtCI 

Uftatl 

VH 

fields 

VMM 

MLE 

P17094 

35.4 

5.9 

7 3 

9.0 

9.4 

9.8 

70 

10.7 

m 

16.5 

10.3 

-8.9 

—80 

-12 

FWM 

28.9 

3.1 

-2.0 

A 

-2.0 

10 

SO 

—10 

-10 

4.4 

-1.5 

20.7 

113 

215 

F1655-4 

27.7 

- 22 

-2.2 

-11 

-10 

-4.7 

3.9 

-40 

-18 

2.8 

-18 

10 

40 

50 

F1726-7 

28.8 

-1.0 

-2.0 

-1.4 

-1.4 

11 

-.4 

16 

-1.4 

19 

-16 

-14 

-117 

-18 

S1455-4 

53.7 

3 

-3.3 

-3.5 

-8.6 

-10 

4.8 

-10 

— 70 

-J 

-113 

-11.4 

-100 

-13 

SI 7354 

35.3 

4.6 

50 

5J 

2.9 

-0 

16 

4.4 

16 

-0 

90 

190 

S3 

217 

BI7264 

61.9 

1.8 

1.4 

13 

.9 

66 

4.9 

-12 

60 

90 

16 

-2S.1 

-36.1 

-29.7 

Mtu 

««• 

2.7 

3.3 

16 

3.8 

4.1 

42 

4,3 

4.8 

5.2 

6.4 

120 

132 

130 

absolute 

values 

SD 

1.9 

22 

19 

16 

33 

2.3 

30 

13 

19 

4,6 

9.2 

10.7 

10.6 

Table  XII.— Wheat  Proportions  for  Consecutive-Day  Data  as  Determined  Using  Ground  Truth 


Data  Ground  Signature extension proportion minus ground-truth proportion


truth 

R(S) 

REGRES 

OSCAR 

MOD  A 

VT 

MUST 

ATCOR 

MOD 

(SCAR 

R(StO  R(C) 

UN  all  VH 
fields 

UHatt 

MLE 

FI  7094 

240 

16.7 

18.3 

190 

202 

20.6 

18.6 

71.5 

21.1 

27.3 

21.1 

1.9 

2.3 

50 

F1673-2 

240 

7,4 

2.3 

19 

2.3 

62 

9.3 

2.7 

2.4 

8.7 

2.8 

250 

170 

26.8 

F16SS4 

216 

.9 

.9 

1.0 

U 

-10 

70 

-10 

.3 

5.9 

-.7 

4.7 

7.7 

8.9 

F1726-7 

240 

12 

12 

2.8 

20 

7.3 

30 

66 

2.8 

6.1 

to 

1.8  -9.5 

1.4 

S14554 

S8.3 

-11 

-7.9 

-11 

-13.2 

-6.6 

2 

-70 

-12.2 

-4.9 

-17.9 

-160  -152 

-12.9 

S17254 

58.3 

-18.4 

-180 

-17.7 

—20.1 

-230 

-20.4 

-18.7 

-19.4 

-23.9 

—13.4 

-3.4  -17.5 

-2.3 

£1726-5 

412 

19.5 

19.1 

21.0 

18.6 

213 

220 

115 

23.7 

27.5 

21.3 

-7.4  -18.1 

-12.0 

Mean 

100 

90 

108 

111 

12.9 

U.7 

105 

11.7 

14.9 

11.3 

80 

120 

too 

absolute 

values 

SD 

7.9 

8.4 

15 

18 

90 

8.8 

7.8 

9.9 

10.7 

9.3 

8.7 

6.2 

8.7 



Table XIII.— Haze Conditions on Consecutive-Day
Data as Determined by Inspection of Images


Data        TSEG    RSEG

F1709-8     Clear   Clear
F1673-2     Haze    Clear
F1655-4     Clear   Clear
F1726-7     Haze    Clear
S1455-4     Clear   Clear
S1725-4     Clear   Haze
E1726-5     Clear   Haze

Table XV.— Accuracy Percentage for the Three
Different Haze Conditions


Haze condition      Percent accuracy
TSEG     RSEG       difference

Haze     Clear      -1.71
Clear    Clear      -4.26
Clear    Haze       -4.82


Table XIV.— Analysis of Variance for Overall
Accuracy


Source           Degrees of   Sum of     Mean     F factor   Significance
                 freedom      squares    square

Algorithm (A)         9        217.61     24.18     2.02     6 or 7 percent
Haze (H)              2        113.69     56.85     4.74     5 percent
A x H                18        205.32     11.41     0.95     NS^a
Error                40        480.17     12.00
Total                69       1016.79

^a Not significant.
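
The arithmetic behind Table XIV can be checked with a short sketch: each mean square is the sum of squares divided by its degrees of freedom, and each F factor is the ratio of a mean square to the error mean square. The variable names below are illustrative, and the numbers are those printed in the table; the printed F factors appear to have been formed from rounded mean squares, so agreement is expected only to about 0.01.

```python
# (source, degrees of freedom, sum of squares) as printed in Table XIV
effects = [
    ("Algorithm (A)", 9, 217.61),
    ("Haze (H)", 2, 113.69),
    ("A x H", 18, 205.32),
]
error_df, error_ss = 40, 480.17

ms_error = error_ss / error_df          # error mean square
f_factors = {}
for name, df, ss in effects:
    ms = ss / df                        # mean square for this source
    f_factors[name] = ms / ms_error     # F factor against the error term
    print(f"{name:14s} MS = {ms:6.2f}  F = {f_factors[name]:.2f}")

# The degrees of freedom and sums of squares should add up to the Total row.
total_df = sum(df for _, df, _ in effects) + error_df
total_ss = sum(ss for _, _, ss in effects) + error_ss
```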


Table XVI.— Analysis of Variance for
Wheat Proportions


Source           Degrees of   Sum of     Mean     F factor   Significance
                 freedom      squares    square

Algorithm (A)         8         59.77      7.47     <1       NS
Haze (H)              2        156.27     78.14     9.42     1 percent
A x H                16         45.21      2.83     <1       NS
Error                36        312.95      8.69
Total                62        574.20

Table XVII.— Average Proportion Differences for the
Three Different Haze Conditions


Haze condition      Proportion
TSEG     RSEG       difference

Haze     Clear      2.0
Clear    Clear      5.8
Clear    Haze       3.9


Table XVIII.— Summary of Test Results


Simulated data                 Consecutive-day data


Percentage difference between local accuracy
and that obtained with various algorithms

R(S)               0.0         R(S)               -1.6
MLEST             -1.7         MLEST              -1.8
UH fields         -7.1         OSCAR              -2.4
R(C)              -9.4         REGRES             -2.8
UT               -44.1         MODR               -2.8
                               R(C)               -3.7
                               MOD OSCAR          -3.8
                               ATCOR              -4.3
                               UH fields          -5.1
                               UT                 -6.6
                               R(S/C)             -7.1
                               UH all            -10.6

Wheat proportion difference from local

R(S)               0.2         R(S)                2.7
R(C)               1.3         REGRES              3.3
MLEST              2.4         OSCAR               3.6
UH fields MLE      5.0         MODR                3.8
UH fields          6.4         UT                  4.1
UT                13.5         MLEST               4.2
                               ATCOR               4.3
                               MOD OSCAR           4.8
                               R(S/C)              5.2
                               R(C)                6.4
                               UH all             12.8
                               UH fields          13.2
                               UH all MLE         13.6

Wheat proportion difference from ground truth

R(S)               0.8         UH all              8.6
R(C)               1.1         REGRES              9.8
MLEST              3.0         UH all MLE         10.0
UH fields MLE      4.9         R(S)               10.0
UH fields          6.3         ATCOR              10.5
UT                12.8         OSCAR              10.8
                               MODR               11.2
                               R(C)               11.3
                               MLEST              11.7
                               MOD OSCAR          11.7
                               UH fields          12.6
                               UT                 12.9
                               R(S/C)             14.9

Table XIX.— Estimated Average Percentage of Wheat
and Thresholding in All Recognition Segments
Used in the Study


Classification            Percentage of wheat in segment          Percent thresholding
technique                 0-percent          1-percent            at 1 percent
                          thresholding       thresholding
                          Estimate   SE^a    Estimate   SE        Estimate   SE

Local                       30.9     2.8       29.9     2.6          3.4     0.64
Untransformed               31.6     2.7       27.9     3.0         13.8     4.2
OSCAR with clusters         34.2     2.8       32.8     2.7          4.2     0.89
ATCOR                       33.7     2.7       31.7     2.7          6.9     1.8
OSCAR with subclasses       32.2     2.8       31.6     2.8          3.8     0.7

^a Standard error.


Table XX.— Average Values and Standard Deviations
of the Estimated Percentage of Wheat
at 0-Percent Thresholding


Classification            Biowindow                      Average
technique                    1      2      3      4

Average values

Local                     33.4   25.0   31.6   36.7   31.5
Untransformed             35.0   31.2   27.2   35.1   31.9
OSCAR with clusters       37.4   33.2   36.6   32.2   34.6
ATCOR                     36.8   29.3   35.0   36.0   34.1
OSCAR with subclasses     33.7   32.2   30.9   32.5   32.2
Average                   35.2   30.2   32.2   34.5   32.9

Observed standard deviations

Local                     15.8    7.8   15.0   17.4
Untransformed             18.8   17.1    9.9   13.0
OSCAR with clusters       17.7   18.5   12.5   12.9
ATCOR                     17.4   17.8    9.6   13.4
OSCAR with subclasses     15.4   15.8    8.5   19.5

Number of segments

All techniques               5      7      7      7   (26)^a

^a Parentheses indicate total.
Table XXI.— Average Values and Standard Deviations
of the Estimated Percentage of Wheat
at 1-Percent Thresholding


Classification            Biowindow                      Average
technique                    1      2      3      4

Average values

Local                     33.0   23.8   30.6   35.3   30.5
Untransformed             31.1   26.4   22.7   34.2   28.4
OSCAR with clusters       36.3   31.6   34.7   31.5   33.3
ATCOR                     33.8   27.8   32.5   35.1   32.2
OSCAR with subclasses     32.6   32.7   29.3   31.8   31.5
Average                   33.4   28.4   30.0   33.6   31.2

Observed standard deviations

Local                     15.6    7.1   14.3   16.0
Untransformed             21.0   20.4    8.2   13.1
OSCAR with clusters       17.7   18.3   11.3   12.8
ATCOR                     18.4   17.8    8.8   13.5
OSCAR with subclasses     15.5   16.2    7.3   19.2

Number of segments

All techniques               5      7      7      7   (26)^a

^a Parentheses indicate total.

Table XXII.— Average Values and Standard
Deviations of Percentage of Thresholding in the
Segments at 1-Percent Thresholding


Classification            Biowindow                      Average
technique                    1      2      3      4

Average values

Local                      1.5    5.7    2.1    3.8    3.4
Untransformed             14.0   21.3   14.3    4.3   13.4
OSCAR with clusters        3.4    3.9    4.5    4.5    4.1
ATCOR                      7.5    4.6   10.0    3.6    6.3
OSCAR with subclasses      3.8    3.2    4.0    4.3    3.8
Average                    6.1    7.7    7.0    4.1    6.2

Observed standard deviations

Local                      1.1    3.2    1.3    4.8
Untransformed             25.6   31.5   20.7    2.0
OSCAR with clusters        5.1    5.9    4.8    4.2
ATCOR                     12.0    3.7   14.5    1.5
OSCAR with subclasses      5.4    4.1    4.3    2.3

Number of segments

All techniques               5      7      7      7   (26)^a

^a Parentheses indicate total.


Table  XXIII,— Correlation  Coefficients 


Estimate 

Change  in 
haze  as 

OSCAR  quality  factors 

predicted  by 

Cluster- 

Subclass • 

ATCOR 

based 

based 

Percentage wheat in the segment at 1 percent


Local 

>005 

0.06 

-042 

Untransformed 

.35 

.25 

-.14 

OSCAR  with  dusters 

.16 

.14 

-.13 

ATCOR 

•4? 

.23 

-.04 

OSCAR  with 
subclasses 

jO? 

.14 

-01 

Differences between signature extension and local percentage of
wheat at 1 percent


Untransformed 

0.37 

019 

0.21 

OSCAR  with  clusters 

.19 

.07 

.24 

ATCOR 

“48 

.16 

.34 

OSCAR  with 

.09 

.10 

.34 

subclasses 

Thresholding rates at 1 percent

Local 

-017 

0.09 

0.02 

Untransformed 

“-.83 

-.34 

-.16 

OSCAR  with  clusters 

-.22 

-.19 

*— .61 

ATCOR 

“-.50 

-.14 

“-.62 

OSCAR  with 

-.30 

-.07 

“-.76 

subclasses 

Difference  between  signature  extension  and  local  thresholding  rates 


Untransformed 

“—0.81 

-0.35 

-0.16 

OSCAR  with  clusters 

-.08 

-.21 

-.51 

ATCOR 

-.40 

-.16 

*-.57 

OSCAR  with 
subclasses 

-.12 

-.12 

“-.59 

Table XXIV. —Average  Wheat  Training  Field 
Accuracies  and  Standard  Deviations 
at  0-Percent  Thresholding 


Classification 

tecfmHpte 

Btowtndow 

Average 

/ 

2 

3 

4 

Average  values 

Local 

85.8 

81.2 

70.9 

87.6 

81.2 

Untranaformed 

74.9 

57.4 

33.0 

5733 

54.9 

OSCAR  with  dusters 

80.0 

52.5 

43.8 

50.4 

55.7 

ATCOR 

80.6 

52.5 

38.5 

58.7 

56.6 

OSCAR  with  subclasses 

72.7 

57.4 

34.1 

58.8 

55.2 

Average 

78.8 

60.2 

44.0 

615 

60.7 

Observed  standard  deviations 

Local 

14.5 

14.8 

6.9 

8.7 

Untransformed 

19.6 

34.9 

19.0 

33.4 

OSCAR  with  dusters 

19.6 

40.5 

25.9 

32J 

ATCOR 

20.1 

33.6 

22.3 

33.1 

OSCAR  with  subclasses 

25.2 

31.7 

16.8 

30.9 

Number  of  segments 

All  techniques 

6 

8 

7 

7 

<28)b 

^a All segments used in the study.
^b Parentheses indicate total.


^a Significant at the 0.01 level.


Table XXV.— Average Nonwheat Training Field
Accuracies and Standard Deviations
at 0-Percent Thresholding^a


Classification            Biowindow                      Average
technique                    1      2      3      4

Average values

Local                     92.0   88.7   86.7   95.7   90.7
Untransformed             84.0   79.4   78.5   72.9   78.5
OSCAR with clusters       83.3   78.5   67.9   73.9   75.7
ATCOR                     84.2   82.9   67.1   73.7   76.9
OSCAR with subclasses     84.8   78.3   71.1   83.1   79.1
Average                   85.7   81.6   74.3   79.8   80.2

Observed standard deviations

Local                      5.7    5.2    7.2    3.4
Untransformed             17.0   14.0   11.7   15.9
OSCAR with clusters       16.4   16.9   14.5   20.4
ATCOR                     15.9   14.6   20.1   16.7
OSCAR with subclasses     15.2   14.1   10.9   11.6

Number of segments

All techniques               6      8      7      7   (28)^b

^a All segments used in the study.
^b Parentheses indicate total.


Table XXVI.— Average Wheat Training Field
Accuracies and Standard Deviations
at 1-Percent Thresholding^a


Classification 

technique 

Biowtmktw 

Average 

1 

2 

i 

4 

Aver.ne  values 

Local 

85.5 

80.8 

70.4 

87.2 

m 

Untrani  fbraied 

68.2 

50.9 

26.7 

S6.3 

49.9 

OSCAR  with  dusters 

78.0 

49.4 

37.7 

49a 

52.6 

ATCOR 

76.8 

49.4 

34.1 

58.1 

536 

OSCAR  with  subclasses 

71.3 

54.0 

29.1 

S8.4 

52.6 

Average 

76.0 

56.9 

39.5 

62.0 

57.9 

Observed  standard  deviations 


Local 

14.3 

14.8 

7.1 

8.5 

Untransformed 

27.9 

36.0 

17.3 

33.2 

OSCAR  with  clusters 

20.8 

37.2 

mo 

32.1 

ATCOR 

24.7 

31.8 

18.2 

32.8 

OSCAR  with  subclasses  26.2 

29.0 

15.0 

30.8 

Number  of  segments 

All  techniques 

6 

8 

7 

7 (28)b 

^a All segments used in the study.
^b Parentheses indicate total.
Table XXVII.— Average Nonwheat Training Field
Accuracies and Standard Deviations
at 1-Percent Thresholding^a

Classification  BtowMdow  Average 

technique  

12  3 4 


Average  valuer 


Local 

91.8 

865 

86.2 

94.8 

89.6 

Untransfottfed 

74.S 

62.3 

67.3 

66.1 

67.1 

OSCAR  with  clusters 

8t.l 

73.6 

64.4 

614 

71.6 

ATCOR 

80.8 

76.9 

60.0 

68.2 

71.3 

OSCAR  with  subclasses 

82.8 

730 

67.1 

76.9 

74.6 

Average 

82.2 

74.4 

69.0 

74.9 

74.9 

Observed  standard  deviations 


Local 

SJ 

9.4 

7.5 

3.3 

Untransformed 

23.2 

19.1 

20.C 

20.0 

OSCAR  with  clusters 

16.6 

17.4 

14.6 

23.4 

ATCOR 

16.5 

12.? 

20.3 

20.2 

OSCAR  with  subclasses  15.5 

10.3 

13.6 

11.7 

Number  of  segments 

All  techniques 

6 

8 

7 

7 (28)b 

^a All segments used in the study.
^b Parentheses indicate total.


Table XXVIII.— Thresholding in Wheat Training
Fields and Standard Deviations
at 1-Percent Thresholding^a

OatUflcaOm  Bio  window  Average 

technique  ' 

12  3 4 


Average  values 


Local 

0.42 

0.49 

0.54 

0.52 

050 

Untranaformed 

11.7 

29.8 

16.4 

2.7 

15.8 

OSCAR  with  clusters 

3.6 

5.4 

7.9 

2.3 

4.9 

ATCOR 

7.S 

7.5 

8.0 

2.4 

6.3 

OST  \R  with  subclasses 

3.7 

5.0 

8.4 

10 

4.8 

Average 

5.4 

9.6 

85 

2.0 

6.5 

Observed  standard  deviations 


Local 

0.38 

058 

0.51 

0.28 

Untransformed 

20.9 

42.2 

25.6 

1.9 

OSCAR  with  clusters 

4.9 

6.3 

13.2 

2.2 

ATCOR 

13.2 

9.2 

10.5 

1.5 

OSCAR  with  subclasses 

5.4 

8.7 

18.3 

1.5 

Number  of  segments 

All  techniques  6 8 7 ? (28)b 


"All  segments  used  in  the  study 
^Parentheses  indicate  total. 
Table XXIX.— Thresholding in Nonwheat Training
Fields and Standard Deviations
at 1-Percent Thresholding^a


Classification 

BtowMow 

Average 

technique 

/ 

2 

S 

4 

Average  values 

Local 

tt3l 

ass 

ttS7 

1.0 

0.62 

Untraniformed 

10.7 

21.S 

13.0 

9.7 

140 

OSCAR  with  clusters 

2.7 

7.8 

4.2 

8.7 

6.0 

ATCOR 

40 

6.7 

90 

83 

74 

OSCAR  with  subclasses 

2.S 

AS 

S.I 

7.1 

49 

Average 

4.1 

8.2 

63 

7.0 

60 

Observed  standard  deviations 


Local 

0.22 

0.32 

072 

13 

Untraniformed 

203 

270 

18.9 

83 

OSCAR  with  duster* 

3.3 

113 

3.6 

7.9 

ATCOR 

6.8 

8.9 

13.1 

7.7 

OSCAR  with  subclasses  3.3 

9.7 

SO 

5.1 

Number  of  segments 

All  techniques 

6 

8 

7 

7 (28)b 

^a All segments used in the study.
^b Parentheses indicate total.


Table XXX.— Classification Accuracy Results for the Geographical Extension Data Set


Stsmmtpetr  Whtat  accuracy,  ptrstnt  NonwMol  ocrurao’.  Mfttm  Omall  accuracy,  etrcmt 


UT 

MUST 

LOCAL 

UT 

MUST 

LOCAL 

UT 

MUST 

LOCAL 

BtowhMbwl 

1154/ 1025 

16.17 

53.36 

TIM 

4238 

•539 

83.95 

29.43 

6932 

•197 

1031/1025 

73.76 

91.37 

9832 

95.1  S 

9194 

9637 

•4.45 

95.16 

9734 

1170/1170 

91.71 

93J6 

91X54 

5731 

6230 

•3.73 

7436 

77.78 

87.39 

1889/1053 

9435 

94.29 

9035 

58.94 

56.81 

7117 

7174 

75.55 

•4.41 

1169/1033 

68.44 

79.11 

90.96 

89.95 

8239 

92.44 

7930 

8100 

91.70 

1168/1173 

6232 

SOM 

97.46 

4331 

7331 

9533 

5236 

61.70 

9140 

1174/1033 

80.46 

90.11 

98.75 

98.33 

97.89 

9118 

•930 

94.00 

98.47 

Biowindow  2 

1882/1811 

94.13 

95.24 

87.30 

6534 

7179 

8930 

7933 

8302 

88.45 

1864/1025 

70.89 

73.39 

93.07 

83.22 

8130 

89.71 

7706 

7739 

9139 

1882/1887 

0 

6439 

87.30 

1933 

6600 

8930 

931 

65.15 

8145 

1893/1891 

53.60 

5330 

8233 

6834 

7035 

84.13 

6132 

6107 

8333 

IIS3/187S 

74.12 

82.43 

80.48 

6239 

5102 

7030 

6151 

6932 

7534 

1880/1887 

.28 

23.20 

9430 

5139 

91.41 

88.40 

2933 

5730 

91.30 

1178/1180 

5230 

54.44 

89.95 

5236 

63.50 

75.12 

5153 

58.97 

•234 

BtowindowS 

1854/1852 

50.97 

47.15 

8037 

84.45 

85.11 

79.90 

67.71 

66.13 

8038 

1877/1875 

28.41 

41.47 

67.97 

71.99 

87.38 

77.92 

50.20 

6143 

7195 

1880/1875 

2235 

74.59 

75.14 

4035 

7408 

93.32 

3135 

7433 

8433 

1163/1165 

84.21 

7631 

88.30 

3032 

67.14 

61.46 

57.22 

71.87 

74J9 

1178/1165 

6632 

75.93 

91.82 

52.16 

52.83 

89.82 

59.49 

64.38 

90.82 

1172/1181 

16.61 

42.08 

64.04 

43.49 

45.15 

75.90 

3005 

4331 

69.97 

Bio  window  4 

1859/1861 

82.21 

83.50 

93.83 

56.64 

69.13 

87.99 

69.43 

7632 

90.9 1 

1032/1861 

56.76 

6933 

86.49 

74.45 

73.66 

9234 

6530 

71.46 

89.37 

1031/1027 

6.38 

6.15 

89.48 

4404 

46.95 

87.38 

2531 

2635 

8143 

1892/1885 

53.97 

52.98 

97.02 

78.06 

76.18 

97.53 

6601 

64.58 

9738 

1883/1884 

35.48 

6232 

98.71 

34.92 

4709 

99.47 

3530 

54.71 

9909 

1888/1879 

92.31 

8935 

95.46 

96.14 

95.64 

93.77 

9432 

92.50 

9432 

1176/1177 

92.32 

94.09 

89.34 

70.43 

64.50 

86.89 

81.38 

7939 

88.11 



Table XXXI.— MLEST Classification Performance
Versus Biowindow for the Geographical
Extension Data Set


OftrrfM 

/ 

i S 4 

OlMff 

m»m» 

Ovarii!  accuracy 

ft.71 

1)3)  14.74  44) 

1035 

Hffcaal 
W Dill 

Ml 

144ft  144ft  546 

lOJft 

Nonwtrau  accuracy 

lOJl 

12.70  14.7ft  244 

ftfti 

Juawraa  a 

Ovaran  accuracy 

11.7J 

IDO  14.75  2646 

I7J2 

What  icttifin 

1)41 

244$  1D9  2733 

20.04 

NonwheM  accuracy 

1009 

12  S3  HJ)  243ft 

14.76 

Table XXXII.— Proportion Estimation Results for the Geographical Extension Data Set


*>» 

BtMt 

trr 

MUST  LOCAL 

UT 

jmow  iocii 

t/r 

MOST  LOCAL 

1SS4/I02S 

VO 

31.7 

47J 

161 

374 

474 

390 

40 

67 

1031/1025 

4.7 

6.5 

161 

44 

63 

90 

10 

20 

10 

1176/1170 

6141 

569 

447 

39.1 

362 

444 

7.4 

20 

1.7 

109/103) 

410 

436 

32J 

41.3 

414 

32.7 

.9 

0 

4 

1169/1033 

1341 

190 

13.2 

12.9 

19.1 

140 

0 

1.3 

20 

II6S/II7J 

21 J 

141 

ISO 

134 

90 

140 

23.7 

90 

30 

1174/1033 

3tJ 

49.4 

49.7 

37.7 

469 

41.7 

14 

10 

30 

BtowSMfw} 

102/101 

36.9 

30.7 

340 

342 

490 

330 

40 

2.7 

3.1 

104/1023 

29.1 

262 

360 

230 

230 

320 

69 

90 

60 

102/107 

»i 

44.1 

340 

.1 

47.9 

330 

•69 

10 

5.1 

103/101 

16.9 

19.2 

45.1 

16,3 

166 

410 

10 

0 

3.4 

1133/1173 

44.7 

520 

430 

44J 

52.1 

440 

2.5 

10 

to 

100/107 

264) 

167 

190 

60 

17.9 

160 

41.4 

50 

63 

I17I/IIO 

463 

440 

290 

360 

43.7 

290 

60 

0 

20 

(towtodDw  i 


1634/1132 

330 

363 

410 

31.7 

290 

47.9 

30 

2.1 

0.7 

1177/1173 

29.3 

160 

34.4 

270 

170 

310 

40 

3.4 

4.1 

INMI75 

33.3 

31.9 

260 

240 

29.7 

230 

39.3 

6.1 

30 

1163/1163 

650 

320 

400 

634 

320 

464 

10 

1.4 

14 

1171/1165 

430 

430 

320 

430 

43.1 

31.1 

4 

0 

29 

1172/1111 

340 

520 

53.7 

170 

51.2 

42.7 

470 

2.1 

10 

Bkpwtndow  4 


1139/1161 

330 

41.5 

310 

320 

410 

363 

60 

1.4 

20 

1032/1161 

399 

430 

390 

394 

44.7 

390 

20 

24 

4 

1031/1027 

104 

169 

194 

9.3 

90 

19.1 

69 

44 

20 

1192/1  M3 

34.4 

340 

364 

324 

330 

290 

30 

20 

1.7 

IM3/1H4 

461 

530 

62.5 

469 

520 

569 

2.9 

20 

14.1 

INI/1179 

520 

461 

620 

369 

464 

56.7 

3.1 

5.7 

54 

1176/1177 

474 

530 

40.7 

467 

520 

464 

2.5 

20 

19 



Table  XXXUI^Moob  Wheat  Ptoporttoa  firtnwx 
Diffktmeu  VtmaBbwBMbwfirBie 
Geographical  Exmaim  Data  Set 


% 

hcr 

IV0C4t“ 

’won 

/fmwi 

hi«C4t“ 

«i/rt 

K«*4i" 

4*ftorl 

• 

1 

MU4 

4.17 

IIJ» 

4.74 

1 

2 

I3J3 

124 

ISJ4 

12.14 

ft 

J 

14.47 

1141 

tut 

13.32 

4 

447 

9JU 

Sj47 

130 

» 

Oman 

IIJ7 

1023 

1241 

1004 

Wfifwtw  tor  ftt  Hww  In  c— 4tt1n», 


FIGURE 4.— OSCAR with clusters versus local percentage of
wheat in the segment at 1-percent thresholding.


FIGURE 5.— ATCOR versus local percentage of wheat in the
segment at 1-percent thresholding.


FIGURE 6.— OSCAR with subclasses versus local percentage of
wheat in the segment at 1-percent thresholding.


FIGURE 7.— ATCOR versus OSCAR with clusters percentage of
wheat at 1-percent thresholding.
FIGURE 8.— ATCOR percentage of wheat in the segment at
1-percent thresholding versus change in haze levels.
FIGURE 9.— Differences between ATCOR and local percentage
of wheat in the segment at 1-percent thresholding versus change
in haze levels.
FIGURE 10.— Untransformed thresholding rate at 1-percent
thresholding versus change in haze levels.
FIGURE 11.— ATCOR thresholding rates at 1-percent
thresholding versus change in haze levels.
FIGURE 12.— Differences between untransformed and local
thresholding rates at 1-percent thresholding versus change in
haze levels.
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Signature  Extension  Methods  in  Crop  Area 

Estimation 

R.  / Kauri?  and  W.  Richardson 


INTRODUCTION 

An  encompassing  rationale  for  crop  area  estima- 
tion over  large  regions  using  remotely  sensed  data 
has  been  developed  over  a period  of  several  years  as 
a result  of  the  stimulation  of  LACIE  and  various 
supporting  research  institutions.  Currently,  most  ele- 
ments of  the  general  idea  have  been  implemented  at 
the  Environmental  Research  Institute  of  Michigan 
(ERIM)  in  a multispectral  scanner  (MSS)  processing 
system  called  Procedure  B.  This  paper  describes  this 
general  idea,  shows  how  various  research  efforts 
have  contributed  to  its  development,  and  indicates 
the  elements  of  the  rationale  that  are  implemented  in 
Procedure  B. 

During  the  last  decade,  remote-sensing  specialists 
have  witnessed  a convergence  of  several  disciplines 
in  the  development  of  MSS  data  processing  tech- 
niques. The  first  techniques  used  in  the  classification 
of  MSS  data  were  methods  of  multivariate  pattern 
recognition  using  a Gaussian  signature  model  (ref. 
1).  Over  this  period  of  time,  emphasis  has  been 
placed  on  trying  to  understand  the  underlying  physi- 
cal reasons  for  the  structure  of  remotely  sensed 
multispectral  data  (refs.  2 through  9),  on  compensat- 
ing for  interfering  external  effects  (ref.  10),  and  on 
trying  to  apply  MSS  “signatures”  obtained  by  train- 
ing in  one  area  to  a wider  region  (ref.  11).  Finally, 
there  has  been  a series  of  statistically  based  attempts 
to  estimate  areal  portions  or  acreages  based  directly 
upon  the  signatures  and  the  multispectral  data  (i.e., 
without  creating  classification  maps)  (ref.  12). 

The first viable attempt to combine a traditional
pattern recognition approach and a statistically based
stratified sampling approach was performed for
wheat versus nonwheat at the NASA Johnson Space
Center (JSC) during the LACIE program (ref. 13).
The resulting technique is called Procedure 1 (ref.
14). In this procedure, a classification map is produced
by a clustering technique based on labeled
samples. Next, the areal proportion estimate represented
by that classification map is “debiased” by
using additional labeled samples to estimate the percentage
of correct classifications in each of the two
mapped classes. In fact, this “bias correction” step in
Procedure 1 is precisely equivalent to stratified sampling
from the two classes or “strata” created by the
operation of the classifier. The very existence of Procedure 1
is forcing a fundamental change in the
research community’s understanding of concepts
that have been taken for granted: classifiers, signatures,
and estimation of acreage (or yield).

Environmental Research Institute of Michigan, Ann Arbor,
Michigan.

During  this  same  period,  a sequence  of  improve- 
ments in  understanding  the  physical  structure  of 
multispectral  data  has  led  to  the  development  of 
effective  automated  procedures  for  screening  Land- 
sat  multispectral  data  (to  identify  garbled  data  or  pix- 
els that  are  cloud,  cloud  shadow,  water,  etc.),  for  cor- 
recting for  some  of  the  significant  external  effects 
(varying  solar  zenith  angle  and  varying  amounts  of 
haze over the scene), for extracting the most significant
spectral features (“tasseled cap”) from Landsat
data,  and  for  extracting  spatial  features 
(pseudofields)  from  the  data.  These  improvements 
in  preprocessing  as  well  as  the  state-of-the-art 
stratified  sampling  aspects  of  Procedure  1 have  been 
incorporated  in  Procedure  B. 

Procedure B differs from Procedure 1 in that
Procedure B is both a multisegment and a
multistratum procedure. Multisegment means that
Procedure B uses data from several LACIE-sized segments
together and makes a proportion estimate for
the entire group of segments as well as for the individual
segments. Multistratum means that, in the
process of clustering data features, Procedure B produces
multiple classes or strata rather than just two
strata (as in Procedure 1), and performs stratified
sampling on each of these multiple strata in order to
make a proportion estimate.
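
As a minimal sketch of the stratified sampling idea shared by the two procedures, the following Python fragment forms a proportion estimate from labeled samples drawn within each stratum. The function name, the stratum sizes, and the sample labels are hypothetical and chosen only for illustration. With two strata the calculation corresponds to the bias correction step of Procedure 1; with more strata it corresponds to the multistratum case.

```python
def stratified_proportion(stratum_sizes, stratum_labels):
    """Stratified estimate of the wheat proportion.

    stratum_sizes  -- number of pixels in each stratum
    stratum_labels -- for each stratum, a list of 0/1 labels (1 = wheat)
                      for the samples drawn from that stratum
    """
    total = sum(stratum_sizes)
    estimate = 0.0
    for size, labels in zip(stratum_sizes, stratum_labels):
        if not labels:
            raise ValueError("every stratum needs at least one labeled sample")
        within = sum(labels) / len(labels)   # wheat fraction among the samples
        estimate += (size / total) * within  # weight by stratum size
    return estimate


# Two strata, as produced by a wheat/nonwheat classifier (hypothetical numbers).
two_strata = stratified_proportion(
    [3000, 7000],
    [[1, 1, 1, 0], [0, 0, 0, 0, 1]],
)

# Several strata, as produced by unsupervised clustering (hypothetical numbers).
multi_strata = stratified_proportion(
    [1000, 2000, 3000, 4000],
    [[1, 1], [1, 0], [0, 0, 0], [0, 0, 0, 1]],
)
```

The estimate does not require the strata to be pure; purer strata only reduce its variance.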

In  the  following  sections,  a detailed  description  of 
Procedure  B is  given,  the  test  results  to  date  for  the 
components  and  for  the  overall  performance  of  Pro- 
cedure B are  presented,  and  the  conclusions  that  can 
be  drawn  from  these  tests  are  discussed.  In  the  final 
section,  the  overall  rationale  of  signature  extension 
for  crop  area  estimation  is  summarized. 


DESCRIPTION OF PROCEDURE B

This  section  contains  a description  of  Procedure 
B.  Some  of  the  more  detailed  aspects  are  documented 
by  reference  to  other  papers  in  this  LACIE  sym- 
posium. 


Broad  Description 

The  objective  is  to  develop  improved  techniques 
for  estimating  the  amount  of  an  agricultural  crop 
present  in  a large  geographical  region  or  partition. 
Specifically,  the  goal  is  to  define  a training  and 
classification  procedure  (or  more  generally,  a label- 
ing and  estimation  procedure)  that  will  allow  valid 
estimates  for  the  region  to  be  made  on  the  basis  of 
training  information  obtained  from  a few  segments. 
Thus,  the  data  to  be  processed  are  of  two  types:  (1)  a 
small  amount  of  labeled  training  data  from  some  seg- 
ments in  the  region  and  (2)  a large  amount  of 
unlabeled  data  from  these  and  other  segments. 

Procedure  B is  a specific  technique  of  proportion 
estimation  that  tells  an  analyst  which  scene  elements 
to  label  and  then  uses  those  labels  in  an  unbiased  way 
to  produce  a proportion  estimate  for  the  scene.  The 
extent  to  which  Procedure  B is  used  to  perform  sig- 
nature extension  is  discussed  later. 

The  fundamental  concept  of  proportion  estima- 
tion in  Procedure  B is  similar  to  the  concept  used  in 
Procedure  1 (ref.  14);  namely,  a stratified  sampling 
technique  performed  in  the  spectral  feature  domain. 
There are, however, major differences in concept between
Procedure B and Procedure 1. The steps in
Procedure  B are  as  follows. 

1.  Data  normalization:  There  is  a preprocessing 
stage  in  which  data  are  screened  and  corrected  for 
effects of satellite calibration, Sun angle, and haze
over  the  scene. 


2.  Feature  extraction:  Spectral  and  spatial 
features  are  extracted  from  the  multispectral  image 
data,  and  these  features  are  augmented  with  ancillary 
information  such  as  weather  and  crop  calendar  data. 

3.  Stratification  of  the  feature  space:  An  unsuper- 
vised clustering  algorithm  divides  the  feature  space 
into  domains  or  strata.  The  number  of  strata  pro- 
duced is  larger  than  the  two  (wheat  versus  non- 
wheat) of  Procedure  1.  Typically,  for  single  seg- 
ments, the  number  of  strata  produced  is  around  40. 

4.  Multisegment:  The  stratification  may  be  per- 
formed for  feature  vectors  encompassing  several 
sample  segments.  Thus,  spectrally  similar  features 
from  several  sample  segments  may  be  assigned  to 
the  same  stratum.  It  is  in  this  sense  that  Procedure  B 
performs  signature  extension. 

5.  Sample  selection:  Certain  numbers  of  samples 
are  allocated  to  each  stratum.  Feature  vectors  are 
randomly  drawn  from  each  stratum  and  then  are 
identified  using  analyst  (or  ground  truth)  labels  in  a 
production  (or  research)  version  of  the  procedure. 

6.  Proportion  estimation:  From  the  identified 
samples,  proportion  estimates  are  made  for  each 
stratum,  for  the  entire  group  of  segments,  and  for 
each  segment  in  the  group. 

7.  Performance  monitoring:  Each  component  of 
the  procedure  (in  the  research  version)  is  monitored 
for  performance  according  to  criteria  of  unbiased- 
ness and  low  variance. 

Procedure  B and  some  of  its  differences  from  Pro- 
cedure 1 are  discussed  in  more  detail  in  the  following 
subsections:  Preprocessing  and  Feature  Extraction, 
Stratification  Procedure,  Allocation  of  Samples,  and 
Proportion  Estimation. 


Preprocessing  and  Feature  Extraction 

The preprocessing and feature extraction steps may
serve several objectives (ref. 6).

1.  To  make  the  data  more  comprehensible  by  ad- 
justing all  of  them  to  standard  conditions  of  observa- 
tion 

2.  To  eliminate  or  flag  bad  or  noisy  observations 
in  the  data 

3.  To  make  the  data  more  comprehensible  by  ex- 
tracting physically  meaningful  features  or  projecting 
the  data  in  such  a way  as  to  display  their  physical 
structure 

4. To compress the data, retaining most of the information
and averaging out noise and redundancy

5. To make the distributions of the derived
features fit some convenient model such as the
multivariate normal distribution (This step is not in
the current implementation of Procedure B.)

To  define  features  to  be  extracted,  the  major 
characteristics  of  the  data  must  be  kept  in  mind. 
Remotely  sensed  data  have  three  main  attributes: 
spectral,  spatial,  and  temporal.  The  spectral  profile 
for  each  picture  element  (pixel)  is  provided  by  the 
MSS.  The  spatial  characteristics  include  a pixel  scan 
line  and  point  number  and  the  position  of  the 
LACIE segment in the region. The temporal characteristics
include the changes associated with the
passage  of  time  during  the  growing  season. 

The  problem  of  estimating  and  classifying  in  a 
wide  region  is  complicated  by  several  sources  of 
variation  in  the  data. 

1. Systematic external effects, such as haze, viewing angle,
Sun zenith angle, and scanner calibration

2.  Effects  upon  particular  crops  of  ancillary  varia- 
bles (such  as  moisture,  growing  degree  days,  and 
crop  calendar)  which  are  observable  for  an  entire 
segment 

3.  Random  noise  due  to  scanner  noise,  to  within- 
site  variation  of  an  ancillary  variable  whose  site 
average  is  known,  or,  finally,  to  variation  in  underly- 
ing ancillary  conditions  which  are  not  being  cur- 
rently observed  but  which  are  significant  in  their 
effects 

Regarding  random  noise,  the  noise  properties  ex- 
hibited by  MSS  data  are  generally  highly  correlated 
spatially.  A simple  example  is  provided  by  the  fact 
that  the  within-field  variance  of  the  MSS  signals  is 
less than the between-field variance of multiple examples
of the same crop; hence, the choice of a
reasonable  number  of  pixels  from  a single  field  with- 
in a segment  may  constitute  insufficient  training  for 
that  segment.  Similarly,  the  choice  of  a single  seg- 
ment and  the  fields  within  that  segment  as  a training 
data  base  for  a group  of  segments  may  constitute  in- 
sufficient training  for  that  group  of  segments.  For 
this  reason,  development  of  a training  procedure  has 
proceeded  on  the  assumption  that  multiple  segments 
are  needed  for  training  to  represent  the  variability  of 
data  present  in  a group  of  segments. 

Some  of  the  data  variation  caused  by  external 
effects  can  be  removed  by  correcting  for  known  ex- 
ternal effects,  so  that  all  data  are  transformed  to  a 
standard reference condition. In Procedure B, the
data are screened so that garbled data and pixels containing
clouds, cloud shadows, and water are
detected and flagged, and a haze diagnostic is computed
over the array of good data points (see the
paper by Kauth et al. entitled “Feature Extraction
Applied to Agricultural Crops as Seen by Landsat”).
Next, the data are corrected for differences in
satellite calibration (i.e., all Landsat-1 data are
modified  to  simulate  Landsat-2  data),  for  Sun  zenith 
angle  (i.e.,  all  data  are  made  to  look  as  though  they 
were  gathered  with  a Sun  zenith  angle  of  39°),  and 
haze  (all  data  are  transformed  to  a standard  haze  con- 
dition) (see  the  paper  by  Lambeck  and  Potter  entitled 
“Atmospheric  Effects  Compensation  for  LACIE 
Data”). 

The  noise  variation  can  be  lessened  and  operating 
efficiency  greatly  increased  by  adopting  methods  of 
data  compression  that  preserve  useful  information 
while  averaging  out  noise.  In  Procedure  B,  the  data 
are  compressed  in  two  ways,  spectrally  and  spatially. 

Spectral  compression  is  accomplished  by  a linear 
transformation  of  the  four  Landsat  bands  through  a 
matrix  rotation  called  the  tasseled-cap  transform,  or 
the  Kauth-Thomas  transform  (see  the  paper  by 
Kauth  et  al.).  Most  of  the  significant  information 
regarding  agricultural  scenes  has  been  found  to  lie  in 
the  plane  defined  by  the  first  two  components  of  the 
resulting transformed data (ref. 15); hence, these
components  are  retained  and  the  last  two  compo- 
nents are  discarded,  resulting  in  a data  compression 
by  a factor  of  2. 
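
The spectral compression step can be sketched as a projection of each four-band pixel onto the first two tasseled-cap axes. The coefficients below are the values commonly quoted for the Landsat MSS tasseled-cap transform; they are not given in this paper and should be treated as an assumption of the sketch, as should the function name.

```python
# Rows of the rotation: the first two tasseled-cap components.
# Coefficients as commonly quoted for Landsat MSS bands 4, 5, 6, 7
# (an assumption of this sketch; they are not given in this paper).
BRIGHTNESS = (0.433, 0.632, 0.586, 0.264)
GREENNESS = (-0.290, -0.562, 0.600, 0.491)


def tasseled_cap_2d(pixel):
    """Project a four-band MSS pixel onto (brightness, greenness)."""
    if len(pixel) != 4:
        raise ValueError("expected four MSS band values")
    brightness = sum(c * v for c, v in zip(BRIGHTNESS, pixel))
    greenness = sum(c * v for c, v in zip(GREENNESS, pixel))
    return brightness, greenness


# Four values in, two values out: compression by a factor of 2.
features = tasseled_cap_2d((10.0, 10.0, 10.0, 10.0))
```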

Spatial  averaging  is  accomplished  by  grouping 
together  pixels  that  are  near  to  each  other  and 
spectrally similar. These fieldlike groups are referred
to as “blobs” (ref. 16). The spectral average of the
group of pixels, the number of pixels in the group,
and the average spatial position of the group are retained
as data features describing that group (blob).
As a result of “blobbing,” the data are further compressed
by a factor of 30.
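
The three features retained for each blob can be illustrated with a short Python sketch. It assumes the grouping of pixels into blobs has already been done (the blobbing algorithm itself is described in ref. 16 and is not reproduced here); the function name and the sample pixel values are hypothetical.

```python
def blob_features(pixels):
    """Summarize one blob.

    pixels -- list of (line, point, spectral_vector) tuples belonging to
              the same blob
    Returns (mean spectral vector, pixel count, mean (line, point)).
    """
    count = len(pixels)
    if count == 0:
        raise ValueError("a blob must contain at least one pixel")
    nchan = len(pixels[0][2])
    mean_spectrum = [
        sum(p[2][c] for p in pixels) / count for c in range(nchan)
    ]
    mean_line = sum(p[0] for p in pixels) / count
    mean_point = sum(p[1] for p in pixels) / count
    return mean_spectrum, count, (mean_line, mean_point)


# Hypothetical three-pixel blob with two spectral features per pixel.
blob = [(10, 20, (40.0, 12.0)), (10, 21, (42.0, 14.0)), (11, 20, (44.0, 16.0))]
spectrum, size, position = blob_features(blob)
```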

The final step in feature definition is to associate
with each blob certain ancillary data that vary from
segment to segment, such as view angle, crop calendar,
available soil moisture, and latitude and
longitude. The idea is that, to the extent possible, the
physical factors that affect the MSS data should be
associated with those data, whereas an arbitrary factor,
the segment identification, should be ignored. In
Procedure B, the ancillary data features are treated as
equal to the other features, and the net result is to
perform a “soft” geographic partitioning to be discussed
further in later sections.


Stratification Procedure

The  words  “stratification,”  “partitions,”  “groups,” 
etc.,  have  been  heavily  used  during  the  course  of 
LACIE,  particularly  in  discussions  of  signature  ex- 
tension. At  this  point,  about  all  one  can  do  is  to 
define  and  use  these  terms  in  a consistent  way.  In 
general,  to  stratify  is  to  divide  the  space  of  observa- 
tion into  mutually  exclusive  regions  (called  strata) 
preparatory  to  making  some  kind  of  estimate  sepa- 
rately within  each  of  these  regions.  This  definition 
includes  diverse  concepts  within  its  scope,  as  the 
following  examples  show. 

The  first  step  in  the  application  of  Procedure  B 
ought  to  be  to  limit  the  range  of  application  to  a large 
region  of  moderately  constant  ancillary  data  condi- 
tions. The  region  might  be  one-third  to  one-half  the 
size  of  Kansas  and  might  contain  from  10  to  60 
LACIE  sample  segments.  A number  of  different  ap- 
proaches to  such  large-scale  geographic  partitioning 
have  been  developed  (see  the  paper  by  Thomas  et  al. 
entitled  “Development  of  Partitioning  as  an  Aid  to 
Spectral  Signature  Extension”  and  the  paper  by 
Hallum  and  Basu  entitled  “Natural  Sampling 
Strategy”).  In  general,  these  partitions  can  be  thought 
of  as  strata  defined  on  the  space  of  the  ancillary 
variables. 

In  Procedure  1,  a wheat-nonwheat  classifier  sepa- 
rates the  data  into  two  classes.  Here,  the  classes  pro- 
duced can  be  regarded  as  strata  defined  on  the  space 
of  the  spectral  variables. 

In  Procedure  B,  the  strata  are  domains  defined  on 
the  feature  space,  which  includes  both  spectral- 
spatial  features  and  ancillary  features.  Hence,  there  is 
a spectral  stratification  similar  to  Procedure  1 and  at 
the same time a “soft” geographic partitioning. Furthermore,
the strata are not recombined into two
classes  as  in  Procedure  1 but  are  left  as  multiple 
strata. 

An  unsupervised  clustering  technique  is  used  to 
group  the  blobs  into  strata.  The  algorithm,  called 
BCLUST  (i.e.,  blob-clustering),  consists  of  the 
following  steps. 

1.  The  blobs  from  a number  of  segments  are  or- 
dered randomly.  Omitted  from  the  list  are  the  so- 
called  small  blobs,  which  are  blobs  that  have  no  in- 
terior pixels.  (An  interior  pixel  is  one  that  faces  pix- 
els from  the  same  blob  on  all  four  sides.)  The  small 
blobs  (usually  stringy  boundary  areas  between 
fields)  are  omitted  because  they  are  difficult  to  label 
and  subject  to  registration  errors.  The  blob  algorithm 
parameters  are  set  so  that  only  a small  proportion  of 


the  pixels  in  the  segment  are  contained  in  small 
blobs. 

2. The big blobs are clustered using the spectral
mean vectors and the ancillary variables jointly. The
spectral means are computed from only the interior
pixels because these pixels give a purer representation
of the crop present in the blob. The distance
measure used in the clustering is a channel-weighted
distance between the data vector and the mean vector
of cluster $i$, computed over all nchan channels, where
nchan is the total number of spectral and ancillary
variables.

3. For a given channel $j$, the weight $w_j$ is constant;
i.e., not varying from cluster to cluster. The means
are updated for clusters as new points are added.
The clustering parameters are chosen to produce between
40 and 100 groups of blobs. These groups of
blobs are termed B-clusters.
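
The clustering steps above can be sketched as a one-pass procedure with running means. Two points are assumptions of this sketch rather than statements from the paper: the distance is taken to be a weighted sum of squared channel differences, and a new cluster is started whenever the nearest existing cluster is farther than a hypothetical `threshold` parameter. The function names and the sample vectors are likewise illustrative.

```python
import random


def weighted_distance(x, mean, weights):
    """Assumed form: sum over channels of w_j * (x_j - mean_j)**2."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, mean))


def sequential_cluster(vectors, weights, threshold, seed=0):
    """One-pass clustering with running means.

    Each vector joins the nearest existing cluster when its distance to
    that cluster mean is at most `threshold`; otherwise it starts a new
    cluster. Returns (assignments, means) with assignments in input order.
    """
    order = list(range(len(vectors)))
    random.Random(seed).shuffle(order)       # blobs are ordered randomly
    means, counts = [], []
    assignments = [None] * len(vectors)
    for idx in order:
        x = vectors[idx]
        best, best_d = None, None
        for c, mean in enumerate(means):
            d = weighted_distance(x, mean, weights)
            if best_d is None or d < best_d:
                best, best_d = c, d
        if best is None or best_d > threshold:
            means.append(list(x))            # start a new cluster
            counts.append(1)
            best = len(means) - 1
        else:
            counts[best] += 1                # update the running mean
            n = counts[best]
            means[best] = [m + (a - m) / n for m, a in zip(means[best], x)]
        assignments[idx] = best
    return assignments, means


# Hypothetical blob feature vectors: two tight groups far apart.
vectors = [(1.0, 1.0), (1.2, 0.9), (9.0, 9.0), (9.1, 8.8), (0.9, 1.1)]
assignments, means = sequential_cluster(vectors, weights=(1.0, 1.0), threshold=4.0)
```

In such a scheme the threshold and the channel weights together control how many clusters result, which is one way the stated target of 40 to 100 B-clusters could be reached.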


Training  Selection  Procedure 

In  the  current  configuration  of  Procedure  B,  the 
term  “training  selection”  is  a misnomer.  Neverthe- 
less, a discussion  of  how  this  terminology  came  into 
being  is  instructive. 

In  the  initial  development  of  Procedure  B,  it  was 
intended  that  signature  extension  would  be  ac- 
complished through  a process  of  training  a classifier 
on  a subset  of  the  segments  in  a large  region.  At  that 
time,  it  was  believed  that  it  might  be  necessary  to 
train  on  several  segments  simply  to  acquire  robust 
signatures;  the  question  being  addressed  then  was 
this:  Without  reference  to  ground  truth,  which  seg- 
ments and  which  feature  vectors  (blobs)  within  the 
segments  should  be  chosen  for  training?  The  prob- 
lem was  to  choose  training  segments  and  blobs  with- 
in those  segments  such  that  there  was  sufficient 
training  for  each  class,  even  though  the  information 
about  class  membership  was  not  available  at  the  time 
the  choice  of  segments  was  made.  Faced  with  this 
problem,  it  at  least  seemed  reasonable  to  choose 
training  blobs  representative  of  the  distribution  of 
unlabeled  feature  vectors. 

The  blob-clustering  algorithm  described  in  the 
previous section was originally developed to provide
a convenient representation of the empirical frequency
function of the feature vectors in the entire
region  and  within  each  segment.  The  procedure  for 
training  segment  selection  considered  many  possible 
combinations  of  segment  choices  and,  for  each  one, 
computed  a value  function  according  to  certain 
heuristically  derived  rules.  The  rules  were  based  on 
the  idea  that  training  segments  should  be  chosen 
that, in combination, would provide several blobs to
be  labeled  within  the  domain  of  every  important  B- 
cluster.  The  details  of  the  segment  selection  and  blob 
selection  algorithm  that  accomplished  this  type  of 
selection  are  given  in  reference  17. 

Having  made  the  training  selections,  it  was  origi- 
nally intended  to  create  signatures,  classify  all  the 
blobs  into  two  classes,  and  finally  correct  for  bias 
over  all  the  segments  by  drawing  and  labeling  addi- 
tional samples,  as  in  Procedure  1.  However,  in  order 
to  obtain  an  early  check  on  the  performance  of  the 
segment  selection  program  alone,  from  a set  of 
labeled  feature  vectors  on  hand,  it  was  decided  to  go 
directly  to  proportion  estimation  by  making  esti- 
mates for  each  B-cluster  and  aggregating  those  esti- 
mates. It  now  appears  that,  in  the  absence  of  good 
prior  signature  information,  the  labeling  effort  is 
more  efficiently  used  if  samples  are  drawn  from 
strata,  labeled,  and  used  directly  in  proportion 
estimation  (refs.  18  and  19). 

The  term  “training  selection”  is  a misnomer 
because, in its current configuration, Procedure B has
no  training  phase.  Samples  drawn  are  only  for  the 
purpose  of  making  a stratified  sample  estimate  (bias 
correction),  not  for  training.  The  segment  selection 
and  blob  selection  algorithms  have  been  modified  to 
optimize  sampling  efficiency  in  view  of  this  revised 
purpose. 

If  the  best  efficiency  is  found  by  just  sampling, 
one  may  ask  what  has  happened  to  the  concept  of 
training?  How  much  prior  information  is  needed  to 
make  training  useful?  These  issues  certainly  deserve 
further  discussion  by  the  entire  community. 


Allocation of Samples

The  Procedure  B selection  algorithm  has  three 
steps.  First,  a choice  of  segments  is  made  that  is  best 
with  respect  to  a certain  value  function.  Second,  the 
number  of  blobs  to  be  labeled  in  each  chosen  segment 
represented  in  each  B-cluster  is  specified.  Finally,  the 
particular blobs to be labeled are chosen at random
from those available in each chosen segment represented
in each B-cluster.

Segment selection.— The objective of the segment
selection algorithm is to provide a supply of blobs
from which proportional allocation can be made, as
described in the next section. The algorithm for segment
selection is as follows.

1. Choose the number $S$ of segments to be
selected.

2. For each possible set of $S$ segments, compute a
value function

$$\text{value} = \sum_{i} N_i H_i \tag{2}$$

where $N_i$ is the number of pixels in the $i$th B-cluster
and $H_i$ is the “hit function” measuring how well the
choice of segments intersects the $i$th B-cluster. The hit
function is calculated as follows.

1. Count a hit for a segment only if the segment
has at least LB blobs in the $i$th B-cluster (typically,
use LB = 2).

2. Count $W_1$ for the first segment hit, $W_2$ for the
second segment hit, and so on; $H_i$ is the sum of the
$W$'s for B-cluster $i$ (typically, $W$ = 50, 10, 2, 0, 0, ...).

The reason for introducing the parameter LB is
that it would not be desirable to pick a segment
which had only one, perhaps spurious, example in it
to represent a B-cluster. The choice of a relatively
large value for $W_1$ is to ensure that the segments
chosen will first of all do a good job of getting at least
one blob for training in almost every B-cluster, with
emphasis on the large B-clusters. But when these objectives
are mostly satisfied, the weights $W_2$,
$W_3$, and so on ensure that the larger B-clusters will
be represented by more than one segment if possible.
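
A minimal sketch of the segment selection search follows. It assumes that the value of a candidate set is the pixel-weighted sum of the hit functions over B-clusters and that the set with the largest value is chosen; the function names, the data layout, and the example region are hypothetical.

```python
from itertools import combinations


def hit_function(blob_counts, chosen, lb=2, hit_weights=(50, 10, 2)):
    """Hit function for one B-cluster.

    blob_counts -- dict mapping segment id to the number of blobs that
                   segment has in this B-cluster
    chosen      -- the candidate set of segment ids
    """
    hits = sum(1 for seg in chosen if blob_counts.get(seg, 0) >= lb)
    # First hit scores W_1, second W_2, ...; hits beyond the list score 0.
    return sum(hit_weights[:hits])


def best_segments(cluster_pixels, cluster_blob_counts, segments, s):
    """Exhaustively score every set of s segments; return (value, set)."""
    best = None
    for chosen in combinations(segments, s):
        value = sum(
            n_i * hit_function(counts, chosen)
            for n_i, counts in zip(cluster_pixels, cluster_blob_counts)
        )
        if best is None or value > best[0]:
            best = (value, chosen)
    return best


# Hypothetical region: three B-clusters, three candidate segments.
cluster_pixels = [5000, 3000, 1000]
cluster_blob_counts = [
    {"A": 4, "B": 3},          # large cluster seen in segments A and B
    {"B": 2, "C": 5},
    {"C": 1},                  # a single blob does not count as a hit
]
value, chosen = best_segments(cluster_pixels, cluster_blob_counts, ["A", "B", "C"], 2)
```

The exhaustive search is practical only for small numbers of segments, since the number of candidate sets grows combinatorially.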

Determination  of  numbers  of  blobs  to  be  labeled. — In 
the  case  where  single  pixels  are  sampled  and  where 
there is no prior information about the true proportion
$P_i$ in each stratum, it can be shown that the
minimum  variance  allocation  of  a fixed  total  number 
of  pixels  is  to  allocate  them  in  proportion  to  the 
number  of  pixels  in  each  stratum.  Procedure  B ap- 
plies this  idea  in  determining  the  numbers  of  blobs  to 
be  labeled  in  each  B-cluster  and  chosen  segment. 

First,  it  is  decided  how  many  blobs  are  a reasona- 
ble number  to  be  labeled.  This  decision  involves  the 
weighing of costs and is outside the scope of this report.
This number of chosen blobs is then divided
among B-clusters in proportion to the number of pixels
in each.

The  number  of  blobs  in  each  B-cluster  is  then  split 
up  among  the  chosen  segments  represented  in  that 
cluster.  The  split  is  proportional  to  the  number  of 
blobs in each “cell” (intersection of the B-cluster and
a segment).  The  purpose  of  this  proportional  alloca- 
tion within  B-clusters  is  to  make  the  estimation  for- 
mula approximately  unbiased  (see  the  following  sec- 
tion) and  to  approximate  a minimum  variance 
allocation. 
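
The two-level proportional allocation can be sketched as follows. The paper does not say how fractional shares are rounded; the largest-remainder rule used here is an assumption, and the function name and the numbers are hypothetical.

```python
def proportional_allocation(total, sizes):
    """Split `total` items in proportion to `sizes` (largest remainder)."""
    whole = sum(sizes)
    if whole <= 0:
        raise ValueError("sizes must sum to a positive number")
    quotas = [total * s / whole for s in sizes]
    counts = [int(q) for q in quotas]
    # Hand out the items lost to truncation, largest fractional part first.
    leftovers = sorted(
        range(len(sizes)), key=lambda i: quotas[i] - counts[i], reverse=True
    )
    for i in leftovers[: total - sum(counts)]:
        counts[i] += 1
    return counts


# Hypothetical example: 20 blobs to label across three B-clusters,
# divided in proportion to the number of pixels in each B-cluster.
cluster_pixels = [5000, 3000, 2000]
per_cluster = proportional_allocation(20, cluster_pixels)

# Split the first cluster's share among its chosen segments in
# proportion to the number of blobs in each cell.
cell_blob_counts = [30, 20]
per_cell = proportional_allocation(per_cluster[0], cell_blob_counts)
```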

Choice  of  blobs  to  be  labeled.— After  the  number  of 
training  blobs  allocated  to  each  B-cluster/segment 
cell  has  been  determined,  Procedure  B is  used  to 
make  the  actual  choice  at  random  from  all  the  blobs 
in  each  cell.  The  method  of  random  selection  that  is 
used  ensures  that  the  crop  proportion  of  the  pixels  in 
the  sampled  blobs  is  an  unbiased  estimate  of  the  crop 
proportion of the pixels in the cell. Rather than a
simple  random  sample  of  blobs,  which  results  in  a 
biased  estimate,  the  method  is  to  choose  the  first 
blob  with  probability  proportional  to  size  and  the 
others  (if  any)  with  equal  probability  (ref.  20). 
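
The selection rule can be illustrated with a short simulation. The sketch assumes sampling without replacement within a cell; the function names and the four-blob cell are hypothetical. Averaged over many draws, the wheat proportion of the pixels in the sampled blobs should approach the wheat proportion of the pixels in the cell.

```python
import random


def draw_blobs(blob_sizes, n, rng):
    """Return indices of n sampled blobs from one cell.

    The first blob is drawn with probability proportional to its pixel
    count; the remaining n - 1 are a simple random sample of the rest.
    """
    if not 1 <= n <= len(blob_sizes):
        raise ValueError("n must be between 1 and the number of blobs")
    indices = list(range(len(blob_sizes)))
    first = rng.choices(indices, weights=blob_sizes, k=1)[0]
    rest = [i for i in indices if i != first]
    return [first] + rng.sample(rest, n - 1)


def sample_proportion(blob_sizes, blob_wheat, chosen):
    """Wheat proportion of the pixels in the chosen blobs."""
    pixels = sum(blob_sizes[i] for i in chosen)
    wheat = sum(blob_sizes[i] * blob_wheat[i] for i in chosen)
    return wheat / pixels


# Hypothetical cell: pixel counts and wheat proportions of four blobs.
sizes = [100, 50, 30, 20]
wheat = [1.0, 0.0, 1.0, 0.0]
rng = random.Random(1)
draws = [sample_proportion(sizes, wheat, draw_blobs(sizes, 2, rng))
         for _ in range(20000)]
average = sum(draws) / len(draws)      # close to the cell proportion
cell_proportion = sum(s * w for s, w in zip(sizes, wheat)) / sum(sizes)
```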


Proportion  Estimation 


Proportion estimates can be made either for individual
sample segments or for an entire group of
segments to which Procedure B has been applied.
The proportion estimation formula for a group of
segments is

$$\hat{P} = \frac{\sum_{i=1}^{M} N_i \hat{P}_i}{\sum_{i=1}^{M} N_i} \tag{3}$$

where $\hat{P}_i$ is the estimated proportion of wheat in the
$i$th B-cluster, $M$ is the number of B-clusters, and $N_i$ is
the total number of pixels in the $i$th B-cluster. The
B-cluster estimate is

$$\hat{P}_i = \frac{\sum_{j} N_{ij} \hat{P}_{ij}}{\sum_{j} N_{ij}} \tag{4}$$

where $\hat{P}_{ij}$ is the estimated proportion of wheat in the
$(i,j)$th cell (i.e., the intersection of B-cluster $i$ and
segment $j$) and $N_{ij}$ is the number of pixels in the $(i,j)$th
cell. Finally,

$$\hat{P}_{ij} = \frac{\sum_{k} n_{ijk} \, p_{ijk}}{\sum_{k} n_{ijk}} \tag{5}$$

where $p_{ijk}$ is the proportion of wheat in the $k$th sample
blob in the $(i,j)$th cell and $n_{ijk}$ is the number of
pixels in the $k$th sample blob in the $(i,j)$th cell.

The formula for the proportion estimate of a
single segment (say the $j$th segment) on the basis of
the multisegment B-clusters is

$$\hat{P}_j = \frac{1}{N_j} \sum_{i=1}^{M} N_{ij} \hat{P}_i \tag{6}$$

where $M$ is the number of B-clusters and $N_j$ is the
number of pixels in the $j$th segment (here the subscript
$j$ denotes a segment and the subscript $i$ a B-cluster),

$$N_j = \sum_{i=1}^{M} N_{ij} \tag{7}$$

If these estimates are aggregated over the entire
group of segments, $\hat{P}$ is obtained again; i.e.,

$$\hat{P} = \sum_{j=1}^{G} \frac{N_j}{N} \hat{P}_j \tag{8}$$

where $G$ is the number of segments and $N$ is the
number of pixels in the group of segments.

Inserting equation (6) into equation (8) and
reversing the order of summation, the following is
obtained.

$$\hat{P} = \sum_{i=1}^{M} \frac{N_i}{N} \hat{P}_i \tag{9}$$

which is equivalent to equation (3).

Thus, the same proportion estimate for a group of
segments is obtained whether proportion estimates
are made on single segments and then aggregated or
whether a single estimate is made directly for the entire
group of segments. These single-segment estimates
based on multisegment B-clusters are not the
same estimates as those obtained by running Procedure B
on single segments, because the latter are
based on single-segment B-clusters.
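
The chain of estimates described in this section (sample blobs to cells, cells to B-clusters, B-clusters to the group or to a single segment) can be checked numerically with a small sketch. The function names, the data layout, and the numbers are hypothetical. One assumption is made explicit in the code: only cells that contain labeled sample blobs enter the weighted average for a B-cluster.

```python
def cell_estimate(sample_blobs):
    """Cell estimate from a list of (n_pixels, wheat_fraction) sample blobs."""
    pixels = sum(n for n, _ in sample_blobs)
    return sum(n * p for n, p in sample_blobs) / pixels


def cluster_estimates(cell_pixels, cell_samples):
    """Combine cell estimates within each B-cluster.

    cell_pixels[i][j]  -- pixels of B-cluster i in segment j
    cell_samples[i][j] -- labeled sample blobs for that cell
                          (an empty list if the cell was not sampled)
    Only sampled cells enter the weighted average (an assumption here).
    """
    estimates = []
    for n_row, s_row in zip(cell_pixels, cell_samples):
        num = sum(n * cell_estimate(s) for n, s in zip(n_row, s_row) if s)
        den = sum(n for n, s in zip(n_row, s_row) if s)
        estimates.append(num / den)
    return estimates


def group_estimate(cell_pixels, p_cluster):
    """Pixel-weighted average of the B-cluster estimates."""
    n_i = [sum(row) for row in cell_pixels]
    return sum(n * p for n, p in zip(n_i, p_cluster)) / sum(n_i)


def segment_estimate(cell_pixels, p_cluster, j):
    """Estimate for segment j from the multisegment B-clusters."""
    n_j = sum(row[j] for row in cell_pixels)
    return sum(row[j] * p for row, p in zip(cell_pixels, p_cluster)) / n_j


# Hypothetical group: 2 B-clusters (rows) by 3 segments (columns).
cell_pixels = [[400, 100, 500], [600, 900, 500]]
cell_samples = [
    [[(40, 1.0), (60, 0.5)], [], [(50, 1.0)]],
    [[(30, 0.0)], [(20, 0.0), (80, 0.25)], []],
]
p_cluster = cluster_estimates(cell_pixels, cell_samples)
p_group = group_estimate(cell_pixels, p_cluster)
p_segments = [segment_estimate(cell_pixels, p_cluster, j) for j in range(3)]

# Aggregating the segment estimates recovers the group estimate.
n_j = [sum(row[j] for row in cell_pixels) for j in range(3)]
p_aggregated = sum(n * p for n, p in zip(n_j, p_segments)) / sum(n_j)
```

Note that a segment estimate is available even for a segment in which no blobs were labeled, because it borrows the B-cluster estimates formed from the other segments.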

Bias of the estimator.— In this section, it is first
shown that the proportion estimator is an unbiased
estimate of the crop proportion measured on the pixels
of the big blobs. (Recall that a big blob is a blob
that has at least one interior pixel and that an interior
pixel is one that faces pixels from the same blob on
all four sides.) The blobs were drawn from each
B-cluster/segment cell in such a way that the crop proportion
in the sample blobs is an unbiased estimate
of the crop proportion in the cell.

The cell proportions are combined into a cluster
proportion by the same weighted average operation
that combines the cell estimates into a cluster estimate.
Therefore, the cluster estimate is an unbiased
estimate of the cluster proportion. Again, the cluster
proportions are combined into an overall proportion
by the same weighted average operation that combines
cluster estimates into an overall estimate.

Thus, the overall estimate is an unbiased estimate
of the overall proportion. This conclusion applies
only to the pixels in the big blobs. Leaving out the
small blobs introduces a bias.

In  the  research  version  of  Procedure  B,  the  wheat 
proportion  of  each  selected  blob  is  measured  exactly 
from  a pixel-by-pixel  specification  of  the  ground 
truth  and  thus  has  no  bias.  This  proportion  would 
not  be  easy  to  estimate  in  practice  because  the  pixels 
on  the  edge  of  a blob  are  likely  to  be  on  or  near  a field 
boundary  and  hence  subject  to  multitemporal 
registration  error  and  mixed  spectral  response. 
Therefore,  two  other  more  realistic  estimates  of  the 
wheat  proportion  of  a blob  have  been  considered. 
One  is  to  estimate  the  proportion  from  the  ground 
truth  of  the  blob  interior  (i.e.,  the  interior  pixels). 
The  other  is  to  label  the  blob  purely  wheat  or  not 
wheat  on  the  basis  of  the  percentage  of  wheat  in  the 
blob  interior.  The  two  estimates  are  very  much  alike 
because  the  tests  on  Kansas  and  North  Dakota  seg- 
ments show  that  the  blob  interiors  are  generally  pure. 

The bias in the proportion estimate for a segment
when using either of these two practical methods of
estimating the wheat proportion in a blob is

$$\beta = \beta_B + \frac{N_s}{N}\,(P_B - P_s) \qquad (11)$$

where $N$ is the number of pixels in the segment, $N_s$ is
the number of pixels in small blobs, $P_B$ is the true
proportion of wheat on the big blob pixels, $P_s$ is the
true proportion of wheat on the small blob pixels,
and $\beta_B$ is the bias in the estimate of $P_B$.

If the whole blob is labeled, then $\beta_B$ is zero, as we
have seen, and the bias is $\beta = (N_s/N)(P_B - P_s)$. In 13
Kansas segments tested, the bias in the wheat
percentage estimate was found to be as indicated in
table I.
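As a rough illustration of the bias expression, the sketch below evaluates the bias as the big-blob bias plus (N_s/N)(P_B - P_s). The function name and the numbers are hypothetical and are not taken from the Kansas test segments.

```python
def segment_bias(n_total, n_small, p_big, p_small, bias_big=0.0):
    """Bias of a segment estimate that is built from big blobs only.

    n_total  : pixels in the segment (N)
    n_small  : pixels in small blobs (N_s)
    p_big    : true wheat proportion on big-blob pixels (P_B)
    p_small  : true wheat proportion on small-blob pixels (P_s)
    bias_big : bias in the estimate of P_B (zero when whole blobs are labeled)
    """
    return bias_big + (n_small / n_total) * (p_big - p_small)


# Hypothetical segment: 10 percent of the pixels lie in small blobs,
# and those pixels carry less wheat than the big-blob pixels.
example_bias = segment_bias(20000, 2000, 0.25, 0.15)
```

With these illustrative inputs the bias is one percentage point of wheat; it vanishes when the small-blob and big-blob proportions agree.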

To determine the variance of the proportion
estimate, let

$$N = \sum_{i=1}^{M} N_i \qquad (12)$$

where $N$ is the number of pixels in the group of
segments and $N_i$ is the number of pixels in the ith
B-cluster. From equation (3) and using the standard
formula, the variance of $\hat{P}$ is

$$\mathrm{Var}\,\hat{P} = \sum_{i=1}^{M} \left(\frac{N_i}{N}\right)^2 \mathrm{Var}\,\hat{P}_i \qquad (13)$$


Table I.— Wheat Percentage Estimate Bias
in 13 Kansas Test Segments

Identification method       Range of bias     Average absolute bias

All pixels in the blob      -3.1 to 1.2       1.2
Blob interiors              -3.8 to 1.0       1.3
Pure blob interiors         -4.8 to 0.6       1.4

The variance of $\hat{P}_i$ is more difficult to analyze. If,
for example, $n_i$ statistically independent single pixels
were drawn from the ith B-cluster and

$$\hat{P}_i = \frac{n_{wi}}{n_i} \qquad (14)$$

was set, where $n_{wi}$ is the number of pixels found to be
wheat, then the binomial variance of $\hat{P}_i$ would be

$$\mathrm{Var}(\hat{P}_i) = \frac{P_i(1 - P_i)}{n_i} \qquad (15)$$

If $n_i$ is replaced by the number of blobs (e.g., $b_i$)
drawn for training from within the ith B-cluster,
then, because each blob represents a number of
pixels,

$$\mathrm{Var}(\hat{P}_i) < \frac{P_i(1 - P_i)}{b_i} \qquad (16)$$
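The following sketch shows numerically how the per-cluster binomial bound P_i(1 - P_i)/b_i combines with the squared weights (N_i/N)^2 of equation (13). The function names and the three-cluster example are assumptions made for illustration only; they are not values from the report.

```python
def binomial_variance(p, n):
    """Binomial variance p(1 - p)/n of a proportion from n independent draws."""
    return p * (1.0 - p) / n


def variance_upper_bound(cluster_sizes, cluster_props, blobs_drawn):
    """Sum of (N_i/N)^2 * P_i(1 - P_i)/b_i over B-clusters."""
    total = sum(cluster_sizes)
    return sum(
        (size / total) ** 2 * binomial_variance(p, b)
        for size, p, b in zip(cluster_sizes, cluster_props, blobs_drawn)
    )


# Hypothetical B-clusters: pixel counts, true wheat proportions, blobs drawn.
sizes = [5000, 3000, 2000]
props = [0.1, 0.5, 0.9]
blobs = [50, 30, 20]
bound = variance_upper_bound(sizes, props, blobs)
```

Doubling the number of blobs drawn in every B-cluster halves the bound, and the mixed cluster (proportion 0.5) contributes the most even though it is not the largest.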


Therefore, an upper bound on the variance of $\hat{P}$ is
given by

$$\mathrm{Var}\,\hat{P} < \sum_{i=1}^{M} \left(\frac{N_i}{N}\right)^2 \frac{P_i(1 - P_i)}{b_i} \qquad (17)$$

If a random sample is made from each B-cluster/segment
cell, rather than just from each B-cluster,
and then equation (4) is used to estimate $P_i$, a
two-way stratified sampling is being executed over
B-clusters and over segments. In this case, an upper
bound on the variance of the estimate for a group of
segments is


(18)


As will be shown later, the variance of two-way
stratification is always smaller than the variance of
one-way stratification.

Variance due to segment selection.— So far, the
variance of the proportion estimates due to random
sampling within each B-cluster (eq. (17)) or within
each B-cluster/segment cell (eq. (18)) has been
discussed. In the application of Procedure B to signature
extension, samples are identified from only a subset
of the segments. If the wheat proportion in B-clusters
is constant over all segments, then it does not matter
which segments are sampled from. In practice,
however, it has been observed that the wheat
proportions within B-clusters are not homogeneous,
especially when Procedure B is applied to a very large
region, such as the entire state of Kansas. This
observation implies that, even when the segments are
chosen to represent the B-clusters, the particular
choice of sample segments to be used for labeling
affects the estimates $\hat{P}_i$. Thus, the process of
randomly choosing subsets of segments may introduce
an added random variation, whereas the process of
systematically choosing segments may introduce a
systematic variation, or bias. These questions have
been addressed empirically and the results are
included within the next section.


COMPONENT PERFORMANCE TESTS


In  performance  testing  of  Procedure  B,  two  classes 
of  tests  are  distinguished:  (1)  component  perform- 
ance tests,  which  are  discussed  in  this  section,  and 
(2)  overall  performance  testing,  which  is  discussed  in 
the  following  section. 

The purpose of component testing is to determine
optimal parameter settings for each component of
Procedure B. The approach is to measure component
performance as a function of the parameter values
and then adjust the parameter values to achieve the
best performance.

The  performance  measures  used  are  bias, 
variance,  and  variance  reduction  factor  (RV).  Not  all 
these  measures  are  appropriate  for  every  component. 
Following  are  the  major  components  of  Procedure  B 
being  examined. 

1.  Spectral/spatial  stratification  (BLOB) 

2.  Spectral/ancillary  data  stratification  (BCLUST)

3.  Training  segment  selection 

4.  Training  blob  selection 

5.  Proportion  estimation  algorithm 

The  appropriate  measure  of  performance  for  the  first 
two  components  is  the  RV.  The  performance 
measures  for  the  last  three  components  are  the  bias 
and  the  variance  of  the  overall  proportion  estimate. 


Parameter selection for each of these components
will be described in reverse order. For the last four
components, the data base consisted of nine segments
in Kansas for which ground truth was known.
The percentage wheat for each blob was estimated by
comparing blob maps with ground-truth photographs.
The tests of the spectral-spatial stratification,
which required a more accurately specified ground
truth, were based on a pixel-by-pixel ground-truth
tape. This tape included the original nine segments
and four more.

Proportion Estimation Algorithm
Performance Tests

Parameters.— Although the algorithm does not
have parameters to be optimized, the concern was
that the estimator be unbiased with respect to the
source of labels (i.e., the LACIE analyst or ground
truth).

Experimental procedure.— Procedure B was run for
all the nine ground-truthed segments and every
available blob was allocated for training. (There were
about 3000 in all.) The Procedure B estimate of the
total percentage of wheat in the group of nine
segments was compared with that measured by
planimetry at JSC.

Discussion.— Initially, the mathematical form of
the estimator did not account for the size (population)
of each blob in estimating the proportion of
wheat in a B-cluster. The resulting proportion
estimate was found to be positively biased. This
situation has been corrected and the difference between
the ground-truth proportion and the Procedure B
wheat proportion estimate is about 0.3 percent.
Using about 3000 blobs, the proportion estimate for
the 9 sample segments is 21.89 percent wheat. This
number will be used as the standard of comparison in
further work with this set of nine segments.
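The effect of ignoring blob size can be illustrated with a small sketch. It contrasts a plain average of blob wheat proportions with a population-weighted average inside one B-cluster. The function names and the three blobs are hypothetical, and the direction of the bias depends on the data; the example merely shows one way a positive bias can arise, namely when small blobs are wheat-rich.

```python
def unweighted_proportion(blobs):
    """Plain mean of blob wheat proportions, ignoring blob size."""
    return sum(p for _, p in blobs) / len(blobs)


def size_weighted_proportion(blobs):
    """Blob wheat proportions weighted by blob population (pixel count)."""
    total_pixels = sum(n for n, _ in blobs)
    return sum(n * p for n, p in blobs) / total_pixels


# Hypothetical B-cluster: (pixel count, wheat proportion) for each blob.
example_blobs = [(200, 0.10), (50, 0.90), (30, 1.00)]
naive = unweighted_proportion(example_blobs)
weighted = size_weighted_proportion(example_blobs)
```

Here the weighted value equals the pooled pixel proportion (95 wheat pixels out of 280), while the unweighted value is roughly twice as large.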


Blob Selection Performance Tests

Parameters.— The parameters that can be varied
are the total number of blobs selected for labeling
and the number of segments in which labeling is
performed. For any given number of segments to be
labeled, Procedure B chooses a specific subset; hence,
the choice of a subset is not a parameter to be
optimized.

In this section, the aspect to be studied is the
variance introduced by random selection of blobs.

Experimental procedure.— For each setting of the
parameters, 15 wheat proportion estimates were
made. Each of these estimates was based on a
different random drawing of blobs according to the
constraints of Procedure B. Following are the
parameter settings chosen for testing.

1. Number of blobs selected for labeling: 300, 600,
1500, 3000.

2. Number of sample segments in which labeling
was performed: 1, 3, 6, 9.

There are 300 to 400 blobs suitable for labeling in
each sample segment. Hence, certain combinations
are not possible; for example, it is not possible to
draw 1500 blobs from only three segments.

Figures 1 through 3 give the results of this test.
Figure 1 shows results when 300 blobs, drawn from
9, 6, 3, and 1 sample segments, were chosen for
labeling. The ordinate is the resulting wheat proportion
estimate for the entire group of nine segments. Each
dot represents a proportion estimate obtained from a
particular random draw of 300 blobs. The abscissa
indicates the number of sample segments from which
these 300 blobs were drawn. In this figure, the dots
have been spread out slightly along the abscissa in
order to make individual points more distinguishable.

The standard deviation of each group of data is
shown by the error bars. The important thing to
notice is that this standard deviation is about the
same when blobs to be labeled are selected from
nine, six, or three segments, but this standard
deviation is notably smaller for the one-segment case. This
is because 300 blobs drawn from 1 segment nearly
exhaust the available blobs. Hence, the possible
selections are fewer and the variance of the resulting
proportion estimates is smaller.

Figure 2 shows the proportion estimates resulting
when 600 blobs are selected from 9, 6, and 3 sample
segments, whereas figure 3 shows the result of
selecting 1500 blobs from 9 and 6 segments. In general, the
previously mentioned trend is preserved. The case of
3000 blobs and 9 segments leads to a single point (no
variance).
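The shrinking variance when a draw nearly exhausts the available blobs is the familiar finite population effect. The sketch below uses the standard textbook variance of a proportion sampled without replacement; it treats blobs as equal-sized units, which Procedure B does not, and the population sizes and the 0.22 proportion are illustrative assumptions rather than values from the experiment.

```python
def srs_proportion_variance(p, n, population):
    """Variance of a sample proportion for n units drawn without
    replacement from a finite population with true proportion p."""
    if not 0 < n <= population:
        raise ValueError("need 0 < n <= population")
    if population == 1:
        return 0.0
    # Finite population correction: goes to zero as n approaches the population.
    fpc = (population - n) / (population - 1)
    return p * (1.0 - p) / n * fpc


# Hypothetical cases: 300 blobs from about 350 (one segment),
# 300 blobs from about 3000 (nine segments), and all 3000 blobs.
var_one_segment = srs_proportion_variance(0.22, 300, 350)
var_nine_segments = srs_proportion_variance(0.22, 300, 3000)
var_exhaustive = srs_proportion_variance(0.22, 3000, 3000)
```

Under these assumptions the one-segment draw has a much smaller variance than the nine-segment draw of the same size, and drawing every blob leaves no sampling variance at all.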

These results apply to the case where no ancillary
variables were used. As will be discussed later, the
use of an appropriate ancillary variable in the
BCLUST algorithm improves the RV. This entire
experiment was repeated using the choice of an
ancillary variable, November soil moisture, which was
determined (in other tests to be described) to yield
the best RV. These results and also the results of
figures 1 through 3 are given in table II. This table
will be referred to later in the discussion on
constructing a model to explain the results.


FIGURE 1.— Wheat proportion estimates for 9 segments with approximately 300 blobs selected for training and with no ancillary
variable (true proportion = 21.89 percent). Abscissa: number of segments used for training.


FIGURE 2.— Wheat proportion estimates for 9 segments with approximately 600 blobs selected for training and with no ancillary
variable (true proportion = 21.89 percent). Abscissa: number of segments used for training.


FIGURE 3.— Wheat proportion estimates for 9 segments with approximately 1500 blobs selected for training and with no ancillary
variable (true proportion = 21.89 percent).


Table II.— Summary of Random Replications of a Proportion Estimation Procedure as a Function of the Number
of Segments Used for Labeling, the Number of Blobs Labeled, and the Ancillary Variable Treatment

                             Wheat proportion estimates       Wheat proportion estimates
                             without using                    with November soil moisture
                             an ancillary variable            used as an ancillary variable

No. of          No. of       Mean,     Variance,   SD,(a)     Mean,     Variance,   SD,(a)
segments used   blobs        percent   percent     percent    percent   percent     percent
for labeling    labeled

9               3000         21.89     0.0090      0.095      21.80     0.0104      0.102
                1500         21.88      .2401       .490      21.87      .1832       .428
                 600         22.23      .9006       .949      21.66     1.0506      1.025
                 300         21.74     2.1287      1.459      22.47     2.3104      1.520
6               1500         21.34      .0590       .243      22.19      .1376       .371
                 600         21.11      .4382       .662      21.95      .9565       .978
                 300         21.15     1.9488      1.396      21.48     2.2134      1.521
3                600         22.00      .4597       .678      19.03      .2520       .502
                 300         21.18     1.7742      1.332      18.98     1.1535      1.074
1                300         21.77      .2043       .452      37.51      .5852       .765

(a) Standard deviation.


Segment Selection Performance Tests

Parameters.— The segment selection program has
several heuristically developed internal parameters
that might be adjusted; however, these parameters
have not been optimized. What is perhaps more
important is to measure the variance introduced by the
process of selecting a subset of segments and to see
whether the use of a systematic routine in any way
reduces that variance or introduces a bias.

Experimental procedure.— There are 84 possible
combinations of 9 segments taken 3 at a time and the
same number of combinations taken 6 at a time. Of
these possible combinations, 15 combinations of 3
and of 6 were chosen for labeling. The selection was
done in two ways: for one set, a random selection of
15 combinations was made; for the second set, the
best 15 combinations of segments as defined by the
value function (eq. (2)) were selected. Procedure B
was run for both of these sets with no ancillary
variable, then the experiment was repeated with
November soil moisture used as an ancillary variable. In
each case, all of the blobs available from the selected
three or six sample segments were used for training,
so that the only source of variability would be the
selection of segments.
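The combination counts and the random part of the selection can be sketched as follows. The segment indices, the function name, and the seed are illustrative assumptions; the systematic selection by the value function of equation (2) is not reproduced here because that function is defined elsewhere in the report.

```python
import itertools
import math
import random

SEGMENTS = list(range(9))  # nine ground-truthed segments, indexed 0..8


def random_subsets(k, how_many, seed=0):
    """Draw `how_many` distinct k-segment subsets uniformly at random."""
    all_subsets = list(itertools.combinations(SEGMENTS, k))
    rng = random.Random(seed)
    # random.sample draws without replacement, so no subset repeats.
    return rng.sample(all_subsets, how_many)


n_three = math.comb(9, 3)  # subsets of 3 segments out of 9
n_six = math.comb(9, 6)    # subsets of 6 segments out of 9 (same count by symmetry)
subsets_of_three = random_subsets(3, 15)
subsets_of_six = random_subsets(6, 15)
```

Both counts come out to 84, so the 15 combinations used in each case cover a little under a fifth of the possible subsets.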

Results. — The  results  of  this  experiment,  ex- 
pressed in  percentage  of  wheat,  are  given  in  table  III. 
Table  IV  shows  these  same  results  expressed  in 
terms  of  their  deviation  from  the  “true”  value  of  the 
wheat  proportion,  21.89  percent. 

In no case was the mean of 15 combinations
significantly biased. Table III shows that segment
selection introduces a variance into the estimate of
percentage of wheat that amounts to a standard
deviation of 1.5 to 5.3.

Is systematic selection of segments really helpful?
An answer to this question is found in the
comparison between the variance of the systematic and
random selections. For a choice of three segments
out of nine, systematic selection had just as large a
variance as random selection, but for a choice of six
segments, systematic selection had a smaller
variance. The improvement is significant at the 0.01
level for the no-ancillary-variable case.
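The entries of tables III and IV can be checked with a few lines of arithmetic. The sketch below assumes that significance of the mean error was judged with a two-sided Student's t test on 14 degrees of freedom (15 replications minus 1); the report does not name the test, so that choice and the variable names are assumptions.

```python
# Table IV entries as (error of mean, SD of mean), in percent wheat.
# Keys: (selection method, segments used, ancillary variable used).
TABLE_IV = {
    ("random", 3, False): (1.23, 1.37),
    ("random", 6, False): (0.21, 0.78),
    ("random", 3, True): (-1.79, 1.12),
    ("random", 6, True): (0.26, 0.61),
    ("systematic", 3, False): (-0.16, 1.5),
    ("systematic", 6, False): (-0.79, 0.40),
    ("systematic", 3, True): (-2.44, 1.17),
    ("systematic", 6, True): (0.92, 0.53),
}

# Two-sided 5 percent point of Student's t with 14 degrees of freedom.
T_CRITICAL = 2.145


def is_significant(error, sd_of_mean, critical=T_CRITICAL):
    """True when the error of the mean exceeds the critical t ratio."""
    return abs(error) / sd_of_mean > critical


flags = {key: is_significant(*value) for key, value in TABLE_IV.items()}

# Table III variances for six segments: random over systematic selection.
ratio_no_ancillary = 9.14 / 2.34
ratio_with_ancillary = 6.27 / 4.19
```

Under this assumption none of the eight mean errors is significant at the 0.05 level, and the variance ratio for six segments is about 3.9 without an ancillary variable against about 1.5 with one.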


Table III.— Results Achieved With Procedure B With and Without the Use of an Ancillary Variable Using
Random Selections and the Highest Ranked Selections of Sample Segments for Labeling

[True mean = 21.89 percent]

                  Wheat proportion estimates       Wheat proportion estimates
                  without using                    with November soil moisture
                  an ancillary variable            used as an ancillary variable

No. of            Mean,     Variance,   SD,        Mean,     Variance,   SD,
segments used     percent   percent     percent    percent   percent     percent
for labeling

                                   Random selection

3                 23.13     27.98       5.29       20.11     18.81       4.34
6                 22.11      9.14       3.02       22.16      6.27       2.50

                                   Systematic selection

3                 21.74     33.68       5.80       19.46     20.60       4.54
6                 21.11      2.34       1.53       22.82      4.19       2.05


Table IV.— Procedure B Results Shown in Table III, Expressed as Deviations From True

                  Estimates without using               Estimates with November
                  an ancillary variable                 soil moisture used as
                                                        an ancillary variable

No. of            Error of   SD of    Significant       Error of   SD of    Significant
segments used     mean       mean     at 0.05?          mean       mean     at 0.05?
for labeling

                                   Random selection

3                  1.23      1.37     No                -1.79      1.12     No
6                   .21       .78     No                  .26       .61     No

                                   Systematic selection

3                 -0.16      1.5      No                -2.44      1.17     No
6                  -.79       .40     No                  .92       .53     No


BCLUST Performance Tests

The spectral stratification algorithm BCLUST has
a number of internal parameters, as mentioned
earlier. Preliminary tests were run to establish a
reasonable value for the growth parameter KTAU.
This parameter was chosen such that the BCLUST
tended to equalize the size of the B-clusters that were
produced. The parameters that were evaluated in the
main test series were the spectral variables used, the
ancillary variables used, and the relative weight
applied to each variable.

Performance measures.— The RV was the performance
measure used in these tests. This measure has
the advantages that it can be computed directly (if
one has detailed ground truth) without making final
proportion estimates and that it relates directly to the
justification for doing any machine processing of the
data. This point merits further discussion.

A wheat proportion estimate, $\hat{P}_0 = n_w/n$, can be
made directly by a LACIE analyst labeling $n$ randomly
chosen pixels, where $n_w$ is the number labeled
wheat. The variance of this estimate is

$$\mathrm{Var}\,\hat{P}_0 = \frac{P(1 - P)}{n} \qquad (19)$$

where $P$ is the true proportion of wheat in the area
for which the estimate is made.

If the area is first stratified and then $n_i$ samples are
randomly drawn from the ith stratum, the proportion
can also be estimated as

$$\hat{P} = \sum_{i=1}^{M} \frac{N_i}{N}\,\frac{n_{wi}}{n_i} \qquad (20)$$

where $N_i$ = the number of pixels in the ith stratum

$N$ = the total number of pixels

$M$ = the total number of strata

$n = \sum_{i=1}^{M} n_i$

$n_i$ = the number of samples drawn from the
ith stratum

$n_{wi}$ = the number of samples in the ith
stratum found to be wheat

The variance of this estimate is given in equation
(17).

The larger strata will tend to contribute more
significantly to the variance because of the squared
factor $(N_i/N)^2$. This tendency can be offset by allocating
more samples to the larger strata, and in fact it can be
shown that, if there is no prior knowledge of $P_i$, the
best allocation (i.e., least variance) is proportional to
the sizes of the strata.

$$n_i = n\,\frac{N_i}{N} \qquad (21)$$

With this allocation, equation (17) becomes

$$\mathrm{Var}\,\hat{P} = \frac{1}{n} \sum_{i=1}^{M} \frac{N_i}{N}\,P_i(1 - P_i) \qquad (22)$$
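The step from the general stratified variance to the proportional-allocation form can be checked numerically. In the sketch below the sample counts n_i are allowed to be fractional for simplicity, and the function names and the three-stratum example are hypothetical.

```python
def proportional_allocation(n, sizes):
    """Samples per stratum, n_i = n * N_i / N (fractional counts allowed)."""
    total = sum(sizes)
    return [n * size / total for size in sizes]


def stratified_variance(sizes, props, samples):
    """Sum of (N_i/N)^2 * P_i(1 - P_i)/n_i over strata."""
    total = sum(sizes)
    return sum(
        (size / total) ** 2 * p * (1.0 - p) / m
        for size, p, m in zip(sizes, props, samples)
    )


def proportional_variance(n, sizes, props):
    """Closed form under proportional allocation: (1/n) * sum (N_i/N) P_i(1 - P_i)."""
    total = sum(sizes)
    return sum((size / total) * p * (1.0 - p) for size, p in zip(sizes, props)) / n


# Hypothetical strata: pixel counts and true wheat proportions.
sizes = [6000, 3000, 1000]
props = [0.05, 0.5, 0.9]
allocation = proportional_allocation(100, sizes)
general = stratified_variance(sizes, props, allocation)
closed_form = proportional_variance(100, sizes, props)
```

The two variance values agree, which is the substitution of n_i = n N_i/N into the general formula carried out numerically.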


The RV due to stratification is defined as

$$RV = \frac{\mathrm{Var}\,\hat{P}}{\mathrm{Var}\,\hat{P}_0} = \frac{\sum_{i=1}^{M} \frac{N_i}{N}\,P_i(1 - P_i)}{P(1 - P)} \qquad (23)$$

The RV is not a function of the number of samples
drawn. It can be interpreted either as the ratio
between variances of stratified and unstratified
estimates (as it has been defined) or equivalently as a
measure of the number of samples required to
achieve a certain variance of the resulting proportion
estimate. The RV is thus better if it is smaller. Small
values of RV are obtained by making $P_i$ or $1 - P_i$
very small; i.e., the RV is a measure of the purity of
strata.
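A minimal sketch of the RV computation, assuming each stratum is described by its pixel count and its true wheat proportion, is given below. The function name and the example strata are hypothetical.

```python
def variance_reduction_factor(sizes, wheat_props):
    """RV = sum_i (N_i/N) P_i (1 - P_i) / (P (1 - P)).

    sizes       : pixel count N_i of each stratum
    wheat_props : true wheat proportion P_i of each stratum
    """
    total = sum(sizes)
    overall = sum(n * p for n, p in zip(sizes, wheat_props)) / total
    if overall <= 0.0 or overall >= 1.0:
        raise ValueError("RV is undefined when the overall proportion is 0 or 1")
    within = sum((n / total) * p * (1.0 - p) for n, p in zip(sizes, wheat_props))
    return within / (overall * (1.0 - overall))


rv_pure = variance_reduction_factor([500, 500], [1.0, 0.0])        # perfectly pure strata
rv_single = variance_reduction_factor([1000], [0.3])               # no stratification
rv_mixed = variance_reduction_factor([600, 400], [0.1, 0.6])       # partly pure strata
```

Pure strata give an RV of zero, a single stratum gives an RV of one, and partially pure strata fall in between.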

In  defining  the  RV,  some  assumptions  are  made 
that  are  not  completely  fulfilled  in  Procedure  B.  For 
example,  the  RV  is  defined  as  a pixel-by-pixel 
measure,  whereas  the  sampling  in  Procedure  B is  per- 
formed using  blobs.  Also,  the  assumption  of  propor- 
tional allocation  defined  in  equation  (21)  is  not 
strictly  enforced  and,  therefore,  the  RV  is  a lower 
bound  on  the  effective  reduction  of  variance. 
Nevertheless,  it  is  believed  that  the  RV  should  cor- 
relate very  well  with  the  overall  variance  of  Proce- 
dure B estimates  and  is  therefore  a useful  and  valid 
measure  for  optimization  of  BCLUST. 

Experimental procedure.— BCLUST was run using
the nine segments for which ground-truth proportion
of blobs was known. Thus, the total number of pixels
and the number of pixels known to be wheat could
be calculated for each B-cluster, so that the RV could
be calculated.

Six  spectral  variables  (brightness  and  greenness  in 
the  first  three  LACIE  biowindows)  were  first  con- 
sidered. An  optimal  set  of  weights  (wy  in  eq.  (1))  was 
established  among  these  six  spectral  variables.  Next, 
the  RV  was  computed  using  only  the  greenness 
variables  with  the  previously  determined  optimal 
weights.  Finally,  maintaining  the  same  relative 
weights  among  the  spectral  variables,  various  ancil- 
lary variables  were  added  and  their  weights  relative 
to  the  spectral  variables  were  increased  until  best  per- 
formance (i.e.,  minimum  RV)  was  reached.  (This 
best  ancillary  variable  combination  was  used  in  the 
tests  of  blob  selection  and  segment  selection.) 

The number of strata created affects the value of
the RV. In fact, it can be shown (ref. 16) that the RV
either decreases or stays constant whenever the
number of strata is increased by splitting an existing
stratum. The top curve in figure 4 illustrates this fact.
(The reader should ignore the lower curves for the
time being. Two-way RV will be discussed at the end
of this subsection.) Because the curve seems to level
out reasonably well for the 9 segments at about 90
B-clusters, it was decided to compare RV values using
this number of B-clusters. The number of B-clusters
can be controlled by a parameter, $\tau$, internal to
BCLUST.
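One way to see why splitting a stratum cannot increase the RV is sketched below. This is an illustrative argument and not necessarily the one given in ref. 16. Suppose stratum i, with N_i pixels and wheat proportion P_i, is split into two parts labeled a and b (labels introduced here for illustration). The denominator P(1 - P) of the RV is unchanged, so only the numerator contribution of stratum i matters:

```latex
\begin{align*}
N_i &= N_a + N_b, \qquad N_i P_i = N_a P_a + N_b P_b, \\[4pt]
N_i P_i (1 - P_i) - \bigl[\, N_a P_a (1 - P_a) + N_b P_b (1 - P_b) \,\bigr]
  &= N_a P_a^2 + N_b P_b^2 - N_i P_i^2 \\
  &= \frac{N_a N_b}{N_i}\,(P_a - P_b)^2 \;\ge\; 0 .
\end{align*}
```

The contribution therefore drops whenever the two parts differ in wheat proportion and stays the same when they do not, which matches the "decreases or stays constant" behavior stated in the text.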

Results.— The optimal weights for the six spectral
variables were found by a search of six-dimensional
space, starting at a point where the weights were in
inverse proportion to the ranges of the values of the
variables. For each setting of the weights, the RV for
90 B-clusters was calculated. The search pattern was
to follow the path of steepest descent to a setting of
the weights that would result in the lowest RV. It
happened that the optimal setting was found at the
starting point.


FIGURE 4.— RV due to clustering as a function of the number of
clusters using six spectral variables. Abscissa: number of clusters.

Using weights determined in this manner, the RV
for 90 B-clusters was also computed for the case
where the three greenness variables alone were used
by BCLUST. The comparisons are shown in table V.
It was judged that the improvement due to retaining
the three brightness variables was substantial. Thus,
all six spectral variables were used in subsequent
tests.


Table V.— RV for Three Greenness Variables
Compared to RV for Three Greenness and
Three Brightness Variables

RV           Three greenness     Six spectral
             variables           variables

One-way      0.622               0.539
Two-way       .430                .406

The two-way RV shown in figure 4 will now be
discussed. (See also eq. (18) and related discussion.)
Two-way RV depends on a two-way stratification, by
B-cluster and by segment. Two-way variance reduction
can be achieved by using all the sample segments
for labeling and allocating the samples drawn
not only to B-clusters in proportion to their size but
also to each segment in proportion to the population
of the B-cluster within that segment. In fact, Procedure
B allocates samples in this way when all the
segments are used. It appears that roughly a 20-percent
sampling advantage might be obtained by sampling
for two-way reduction of variance, at the expense of
using all the segments for training rather than a
subset of them.

A number  of  different  ancillary  variables  were 
considered,  as  well  as  combinations  of  ancillary 
variables.  The  details  of  each  are  not  critical  to  this 
discussion.  In  general,  the  ancillary  information  falls 
into  two  categories:  ancillary  information  derived 
from  sources  external  to  Landsat  data,  such  as  availa- 
ble soil  moisture  on  certain  dates,  or  the  latitude  and 
longitude;  and  ancillary  information  derived  from 
the  Landsat  data  itself,  such  as  the  green-index  me- 
dian, which  is  derived  for  these  segments  from  the 
green-index  numbers  computed  by  Wehmanen  (see 
the  paper  by  Thompson  and  Wehmanen  entitled 
“Application  of  Landsat  Digital  Data  for  Monitoring 
Drought”).  This  and  the  mean  of  the  green  arm  are 
diagnostic  features  derived  from  each  LACIE  sam- 
ple segment.  They  are  believed  to  be  a measure  of  the 
general  vigor  of  vegetation  within  the  segment. 

With each combination, the procedure was the
same. BCLUST was run with six spectral variables
plus the ancillary variable with a certain weight.
Then the value of $\tau$ was adjusted until there were 90
B-clusters. The value of $\tau$ is thus an indirect measure
of the weight applied to the ancillary variable. Then
the weight was changed, and the procedure was
repeated. Figure 5 shows the RV versus $\tau$ for the case
of November soil moisture. This happens to be the
case that resulted in the lowest minimum value of
the one-way RV. The pattern is fairly typical,
however. As the weight of the ancillary variable
increases, the one-way RV decreases to a minimum
and then increases again. The explanation of the
pattern follows.

Initially, with no weight on the ancillary variable,
many B-clusters extend across all the segments. In
some of the segments, a particular B-cluster may
contain mostly wheat blobs; in some other segments, it
may contain mostly nonwheat blobs. Thus, that
B-cluster is "mixed" in the sense that $P_i$ is neither very
near zero nor very near one. The resulting RV is
0.539 averaged for all B-clusters. As the weight on the
ancillary variable is increased, blobs from segments
with differing values of the ancillary variable are
forced into different B-clusters, even though they
may have nearly the same value of spectral variables.
B-clusters no longer can extend across all segments.
The typical B-cluster is now split into two parts, and,
if the signature of wheat depends on the ancillary
variable, it is likely that each of the two parts will be
purer than the original cluster, so the RV will
decrease.


FIGURE 5.— RV due to clustering November 1 soil moisture
along with the spectral variables; $\tau$ measures the weight put on
the ancillary variable ($\tau$ = 72 corresponds to zero weight).

This trend cannot continue indefinitely. As the
weight on the ancillary variable is increased, the
effective weight on the spectral variables is
decreased. In effect, the clustering procedure
becomes incapable of distinguishing very well on the
basis of spectral information. As the ancillary
variable weight becomes too large, each segment contains
its own little group of B-clusters that do not cross
over to any other sample segment. In the case of 9
segments and 90 B-clusters, the average number of
B-clusters per segment will be 10. The likelihood is
small that many of these 10 B-clusters will be nearly
pure.


The  effect  of  the  ancillary  variable  at  its  optimum 
weight  is  to  create  a sort  of  “soft”  partitioning  of  the 
segments;  i.e.,  a subset  of  the  segments  is  found  to 
intersect  a certain  subset  of  the  B-clusters.  The  ancil- 
lary variable  prevents  an  attempt  at  “signature  exten- 
sion” when  it  is,  in  fact,  not  feasible  to  extend  sig- 
natures. 

Table  VI  lists  a few  of  the  ancillary  variables  that 
have  been  tried  and  the  weight  at  which  they  were 
found  to  have  the  best  effect.  (A  weight  of  1 means 
that  the  weight  was  inversely  proportional  to  the 
range  of  the  variable.) 


BLOB Performance Tests

The  spectral/spatial  processing  algorithm  BLOB  is 
the  final  step  in  feature  extraction  in  Procedure  B. 
BLOB  creates  pseudofields  by  grouping  together  pix- 
els that  are  both  spectrally  and  spatially  near  each 
other.  The  mean  vector  of  the  pixels  assigned  to  each 
blob  is  then  calculated  and  this  value  becomes  the 
spectral  feature  which  is  used  in  BCLUST.  BCLUST 
then  performs  a further  grouping  of  blobs  (and 
therefore  of  pixels)  into  B-clusters. 

Parameters.— The  BLOB  algorithm  contains  a
number  of  parameters.  The  algorithm  is  designed  to 
add  two  more  channels,  line  number  and  point  num- 
ber, to  the  multispectral  data  channels.  Line  and 
point  are  spatial  variables.  A standard  clustering 
algorithm  is  then  used  to  add  pixels  to  existing 
clusters  and  to  create  new  clusters  when  the  pixel  is 
not  close  enough  to  any  existing  cluster. 


Table VI.— Ancillary Variables Considered and
Performance Obtained With Each

Variable                    One-way RV     Weight

No ancillary variables      0.539
November soil moisture       .492          0.5
Crop calendar, April         .502           .5
Green median:
  1st biowindow              .511          1.18
  2nd biowindow              .500           .5
  3rd biowindow              .497          2.0
Green arm mean:
  1st biowindow              .522          1.03
  2nd biowindow              .519          2.0
  3rd biowindow              .494          48

The distance function used in the clustering is
determined as follows. Each new data point
(spectral: $x = (x_1, x_2, \ldots, x_N)$; spatial: (line,
point)) is tested for admission into each existing blob
by computing

$$\mathrm{distance}^2 = \frac{(x_1 - \mu_1)^2}{VAR1} + \cdots + \frac{(x_N - \mu_N)^2}{VARN} + \frac{(\mathrm{line} - \mu_L)^2}{VARL} + \frac{(\mathrm{point} - \mu_P)^2}{VARP} \qquad (24)$$

where $(\mu_1, \ldots, \mu_N, \mu_L, \mu_P)$ is the blob mean
vector and VAR1, ..., VARN, VARL, VARP are
weights expressed as variances.

The pixel x is added to the existing blob with the
smallest distance unless this minimum distance is
greater than a parameter $\tau$, in which case the pixel
becomes the seed point of a new cluster.
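The admission rule just described can be sketched as a single sequential pass over the pixels. This is a simplified illustration and not the BLOB program itself: the running-mean update, the comparison of the threshold against the squared distance, and all names and numbers are assumptions.

```python
def squared_distance(pixel, mean, variances):
    """Variance-weighted squared distance between a pixel and a blob mean."""
    return sum((x - m) ** 2 / v for x, m, v in zip(pixel, mean, variances))


def blob_pixels(pixels, variances, tau):
    """Assign each pixel to the nearest blob, or seed a new blob.

    pixels    : sequence of tuples (spectral values..., line, point)
    variances : one weight per channel (VAR1..VARN, VARL, VARP)
    tau       : threshold on the squared distance (an assumption)
    Returns (labels, means).
    """
    means, counts, labels = [], [], []
    for pixel in pixels:
        best, best_d2 = None, None
        for index, mean in enumerate(means):
            d2 = squared_distance(pixel, mean, variances)
            if best_d2 is None or d2 < best_d2:
                best, best_d2 = index, d2
        if best is None or best_d2 > tau:
            # Too far from every existing blob: this pixel seeds a new one.
            means.append([float(x) for x in pixel])
            counts.append(1)
            labels.append(len(means) - 1)
        else:
            # Fold the pixel into the running mean of the nearest blob.
            counts[best] += 1
            k = counts[best]
            means[best] = [m + (x - m) / k for m, x in zip(means[best], pixel)]
            labels.append(best)
    return labels, means


# Hypothetical pixels: (brightness, greenness, line, point).
pixels = [(10.0, 20.0, 1, 1), (10.5, 20.5, 1, 2),
          (40.0, 5.0, 1, 30), (40.5, 5.5, 2, 30)]
labels, means = blob_pixels(pixels, variances=(4.0, 4.0, 9.0, 9.0), tau=6.0)
```

In this toy example the first two pixels, which are close both spectrally and spatially, fall into one blob, and the last two form a second blob.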

In  this  task,  six  spectral  variables  have  been  used, 
brightness  and  greenness  from  three  biophases.  The 
next  problem  is  to  specify  the  parameters  of  the 
BLOB  algorithm,  namely  VAR1  through  VAR6, 
VARL,  VARP,  and  TAU. 

Another parameter of Procedure B that relates to
blobs is the question of whether to use only big blobs
in the operations from BCLUST on. Figure 6 is a
gray map of the blobs in segment 1865. The big blobs
are represented by printed characters and the small
blobs by blanks. Although there are many small
blobs, they do not contain many pixels. The small
blobs mostly represent boundaries between fields.
The previously reported component tests of Procedure
B have been run using big blobs only.


FIGURE 6.— Gray map of big blobs in segment 1865, Kansas.

Performance measures.—The blob RV is analogous
to that for B-clusters:

$$RV = \frac{\sum_{\text{all blobs}} N_i P_i (1 - P_i)}{N P (1 - P)} \tag{25}$$

where $P_i$ is the proportion of wheat in blob $i$, $N_i$ is the
number of pixels in blob $i$, $P$ is the overall wheat
proportion in the segment, and $N$ is the number of pixels
in the segment. The blob RV is a measure of blob
purity: the purer the blob, the smaller the RV. This
measure also provides a lower bound for the
B-cluster RV. No matter how well the blobs are
combined into B-clusters, it is impossible to improve the
RV found by blobbing alone, nor can the blob RV be
any better than the pixel RV, which is

$$RV = \frac{\sum_{\text{all pixels}} p_j (1 - p_j)}{N P (1 - P)} \tag{26}$$

where $p_j$ is the proportion of wheat in pixel $j$.

The use of the blob RV and the pixel RV
as performance measures was made possible by the
recent provision of highly accurate pixel-by-pixel
ground truth. The other performance measures are
the bias and average absolute error of wheat
estimates based on classes of blobs and pixels, such as
big blobs, small blobs, and blob interiors.

Experiments and results.—The parameters of the
BLOB algorithm were determined as follows. The
spectral weights VAR1, ..., VAR6 were set by
referring to the previous work on finding the optimal
spectral weights for grouping blobs into B-clusters. A
search pattern in six-dimensional space indicated
that for B-clustering, the best spectral weights,
expressed as variances, are proportional to the effective
ranges of the variables. Using the same proportion
for blobbing weights, the spectral weights were
determined relative to each other.

As for the spatial weights, the ratio of the
line variance, VARL, to the point variance,
VARP, was set so that the line standard deviation represents
the same geographic distance as the point standard
deviation. This ratio is not 1 to 1 because
points are sampled more frequently than lines.

The next step was to determine the balance
between the set of spectral variances and the pair of
spatial variances by holding the spectral variances
constant and comparing the blob RV's for three sets
of spatial variances: small, medium, and large. (If all
the parameters are increased by the same proportion,
the algorithm remains exactly the same. That is why
holding the spectral variances constant and varying
the spatial variances and τ is permissible.)

Figure 7 shows the result of computing the blob
RV's for eight Kansas segments and three settings of
the spatial variances. The lower setting corresponds
to more emphasis on the spatial; the higher setting,
to more emphasis on the spectral. The gentle trends
indicate an optimal setting somewhere between the
lower and middle settings.

FIGURE 7.—Blob RV for three sets of spatial weights
(VARL = 2.0, 4.9, 11; VARP = 3.5, 8.5, 19).
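
The RV measures of equations (25) and (26) share one form, so a single routine can compute either. The following is a minimal sketch assuming per-pixel ground-truth wheat fractions and an integer group label per pixel; the function name `reduction_of_variance` is hypothetical. Passing blob labels gives the blob RV, and giving every pixel its own label gives the pixel RV.

```python
import numpy as np

def reduction_of_variance(wheat_fraction, group_ids):
    """RV = sum_i N_i P_i (1 - P_i) / (N P (1 - P)) over the groups given.

    wheat_fraction: ground-truth wheat proportion of each pixel (0..1).
    group_ids: group (blob or B-cluster) label of each pixel.
    Assumes the overall proportion P is strictly between 0 and 1.
    """
    wheat_fraction = np.asarray(wheat_fraction, dtype=float)
    group_ids = np.asarray(group_ids)
    n_total = wheat_fraction.size
    p_overall = wheat_fraction.mean()
    _, inverse, counts = np.unique(group_ids, return_inverse=True,
                                   return_counts=True)
    # Mean wheat proportion P_i within each group.
    group_p = np.bincount(inverse, weights=wheat_fraction) / counts
    within = np.sum(counts * group_p * (1.0 - group_p))
    return within / (n_total * p_overall * (1.0 - p_overall))
```

Perfectly pure groups score 0, and groups that each reproduce the overall proportion score 1, which matches the interpretation of the RV as a purity measure.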
The problem with choosing τ is that when τ is
decreased, the number of blobs is increased and the RV
is generally decreased. However, decreasing τ too
much reduces the number of interior pixels that are
depended on for B-clustering. Also, the bias resulting
from omitting small blobs is likely to be larger. A
value of τ was chosen so that the number of big blobs
would roughly correspond to the number of fields,
and the τ value was kept constant for all the
segments.

The final parameters are given in table VII. They
apply to brightness and greenness variables in
biophases 1, 2, and 3. If a subset of these spectral
variables (e.g., of size m) were used, the appropriate
weights would be obtained by keeping the
corresponding spectral weights the same, multiplying
VARL and VARP by 6/m, and multiplying τ by m/6.
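
The subset rule can be written out directly. The sketch below copies the table VII values into a dictionary and applies the 6/m and m/6 factors; the dictionary layout and the function name `rescale_for_subset` are illustrative choices, not part of Procedure B.

```python
# Blob parameters from table VII (six spectral variables).
TABLE_VII = {
    "VAR1": 25.0, "VAR2": 5.3, "VAR3": 14.0,
    "VAR4": 9.0, "VAR5": 18.4, "VAR6": 9.0,
    "VARL": 3.46, "VARP": 6.0, "TAU": 23.2,
}

def rescale_for_subset(params, spectral_names):
    """Adapt the table VII parameters to a subset of m spectral variables.

    The retained spectral weights are unchanged, VARL and VARP are
    multiplied by 6/m, and TAU is multiplied by m/6.
    """
    m = len(spectral_names)
    if not 1 <= m <= 6:
        raise ValueError("the subset must contain between 1 and 6 variables")
    out = {name: params[name] for name in spectral_names}
    out["VARL"] = params["VARL"] * 6.0 / m
    out["VARP"] = params["VARP"] * 6.0 / m
    out["TAU"] = params["TAU"] * m / 6.0
    return out
```

For example, keeping only the three greenness variables (VAR2, VAR4, VAR6) doubles the spatial variances and halves τ.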

When these parameters were used for blobbing 13
segments in Kansas, big blob pixels averaged 81
percent and ranged from 65 to 93 percent, interior pixels
averaged 30 percent and ranged from 15 to 43
percent, and edge pixels (i.e., those that are in big blobs
but are not interior) stayed nearly constant at 50
percent. The number of big blobs was remarkably
constant (from 342 to 489), whereas the total number of
blobs ranged from 590 to 1991.

To  determine  limits  of  the  efficiency  of  Procedure 
B,  RV  factors  were  computed  for  pixels,  blobs,  and 
B-clusters.  The  results  are  given  in  table  VIII. 
Remembering  that  RV  measures  purity  and  that  a 
low  score  is  good,  it  is  demonstrated  that  the  pixel 
RV  is  a lower  bound  for  the  blob  RV,  which  in  turn 
is  a lower  bound  for  the  B-cluster  RV. 

Noticeable features of table VIII are the
considerable variability from segment to segment and
the high correlation (0.80) between the 13 B-cluster
RV's and the limit set by blobbing (i.e., the blob
RV).

Table VII.—Blob Parameters Appropriate for Using
the Brightness and Greenness Transformed Channels in
the First Three Biophases

Description               Name    Weight
Brightness, biophase 1    VAR1    25.0
Greenness, biophase 1     VAR2     5.3
Brightness, biophase 2    VAR3    14.0
Greenness, biophase 2     VAR4     9.0
Brightness, biophase 3    VAR5    18.4
Greenness, biophase 3     VAR6     9.0
Line variance             VARL     3.46
Point variance            VARP     6.0
Distance limit            τ       23.2

Table VIII.—RV's(a) for Blob Tests on 13
Kansas Segments

Segment    Pixels    Blobs    B-clusters    Blob interiors
1020       0.09      0.24     0.25          0.04
1035        .11       .36      .66           .19
1041        .11       .41      .74           .21
1165        .16       .49      .86           .20
1851        .11       .36      .48           .16
1852        .08       .30      .48           .14
1861        .08       .26      .43           .09
1865        .08       .26      .56           .09
1886        .15       .43      .61           .17
1163        .18       .52      .71           .28
1167        .15       .39      .63           .18
1860        .13       .41      .53           .15
1887        .12       .39      .59           .17
Average     .12       .37      .58           .16

(a) The lower the RV, the purer the groups (perfect purity scores 0; perfect
homogeneity scores 1).

At this point, it is not clear why good (i.e., small)
RV’s  are  obtained  for  some  segments  and  poor  (i.e., 
large)  RV’s  are  obtained  for  other  segments.  The 
effects  of  multitemporal  misregistration  and  of 
smaller  field  sizes  are  being  examined. 

The  RV  score  for  blob  interiors  is  very  good, 
showing  that  the  blob  operation  is  doing  its  job  in  the 
sense  that,  although  there  may  be  some  confusion  at 
the  edges  of  the  blobs,  the  interiors  of  the  blobs  are 
quite  pure. 

The average RV for B-clusters, 0.58, implies that
the  stratified  estimate  (i.e.,  the  Procedure  B esti- 
mate) would  require  only  58  percent  of  the  iden- 
tifications needed  for  an  unstratified  probability 
sample  of  pixels  in  the  segment  to  achieve  the  same 
variance.  This  column  is  a misleading  indication  of 
the  value  of  Procedure  B because  Procedure  B sam- 
ples blobs  rather  than  pixels.  Sampling  blobs  rather 
than  pixels  would  be  expected  to  reduce  the  variance 
because  the  proportion  of  wheat  in  a blob  randomly 
chosen  from  a B-cluster  would  be  expected  to  have 
less  variance  than  the  proportion  of  wheat  in  a pixel 
randomly  chosen  from  a B-cluster. 

The  difference  between  the  average  B-cluster  RV 
of  0.58  given  in  table  VIII  and  the  two-way  RV  of 
0.406  given  in  table  V results  from  the  different  types 
of ground truth used as a standard. The table V result
was based on wheat percentages of blobs estimated
from ground-truth photographs and maps of blob
interiors, whereas the table VIII results are based on a
pixel-by-pixel ground-truth tape. Because the blob
interiors are pure, the estimated wheat percentages
based on them tended to be closer to 0 or 100 than
was actually the case. This resulted in much smaller
blob RV's, which in turn produced smaller B-cluster
RV's.

The  difference  between  the  average  B-cluster  RV 
and  the  two-way  RV  is  not  a reflection  of  the 
difference  between  the  single-segment  and  multiseg- 
ment use  of  Procedure  B.  Using  the  estimated 
ground  truth,  the  average  single-segment  B-cluster 
RV  was  0.418  and  the  composite  RV,  obtained  by 
collecting  the  single-segment  strata  into  one  big  pool 
of  strata,  was  0.377.  In  short,  the  comparison  of 
single-segment  with  multisegment  use  of  Procedure 
B,  as  judged  by  the  earlier,  estimated  ground  truth, 
was  a standoff. 

The  purity  of  blob  interiors  offers  some  hope  of 
solving  a practical  difficulty  with  sampling.  The  cur- 
rent wheat  estimates  are  all  based  on  samples  pro- 
vided by  the  judgments  of  analyst  interpreters 
(AI's).  If  these  AI's  are  asked  to  identify  a pixel  at 
random,  the  chance  is  70  percent  that  it  will  come 
from  an  edge  or  a small  blob  and  is  therefore  likely  to 
be on or near a field boundary. Given multitemporal
registration errors and mixed spectral
responses, it would seem a formidable, if not an
impossible, task to identify such a pixel. But if asked to
identify a relatively pure blob interior, the AI would
have a much better chance to respond accurately.

Table  IX,  a table  of  different  methods  of  estimat- 
ing wheat  percentage,  gives  empirical  information 
about  the  accuracy  of  such  a procedure.  The  “all  pix- 
els” column  is  the  percentage  of  wheat  computed 
from  every  known  pixel  in  the  segment.  Those 
figures,  and  indeed  all  the  others  except  those  in  the 
adjoining  column,  are  based  on  the  pixel-by-pixel 
ground-truth  data  recently  computed  at  ERIM.  The 
third  column  is  a measurement  of  the  percentage  of 
wheat  in  the  scene  made  by  planimetry  at  JSC  a cou- 
ple of  years  ago.  The  average  absolute  difference  be- 
tween these  two  columns  is  0.8  percent,  showing  that 
even  the  most  careful  measurements  from  high- 
resolution  photography  are  subject  to  an  error  of 
about  1 percent.  The  discrepancy  in  segment  1865, 
which  is  caused  by  the  failure  of  the  photography  to 
cover  the  top  quarter  of  the  segment,  is  left  out  of  the 
calculation. 

The  next  three  columns  are  the  percentage  of 
wheat  computed  on  various  subsets  of  pixels.  The  big 
blob  pixel  estimate  is  quite  close  to  the  measured 
truth,  whereas  the  small  blob  pixel  and  the  interior 
pixel  estimates  are  erratic. 

The estimate made from small blob pixels has a
bias that averages 4 percent and ranges from -8 to
17 percent. The bias in the estimate from the small
blob pixels has to be balanced by a bias in the
opposite direction on the big blob pixels. The big blob
bias is smaller because there are more pixels in the
big blobs. The estimate made from interior pixels has
a bias that averages -3 percent and ranges from
-6.5 to 2.5 percent.

Table IX.—Various Estimates of Wheat Proportion
(all entries in percent wheat)

Segment    All       JSC       Big      Small    Interior   Extrapolated   Extrapolated
           pixels    wheat     blobs    blobs    pixels     from           from pure
                                                            interior       interior
1020       26.1      25.3      24.0     43.2     21.1       23.9           24.1
1035       17.7      17.5      17.5     18.3     13.9       17.7           17.4
1041       14.4      14.4      14.3     15.3     14.5       14.7           14.4
1165        7.1       6.5       6.2      8.9      7.3        6.2            6.6
1851       22.8      21.9      20.4     33.6     16.6       19.9           19.6
1852       23.4      22.3      24.6     15.6     26.0       24.4           24.0
1861       34.9      34.4      34.4     42.5     28.3       34.2           33.9
1865       28.5      20.4      26.6     34.5     23.6       26.6           26.6
1886       29.7      28.9      29.9     28.4     29.6       29.8           29.9
1163        9.3       8.7       8.0     13.7      5.9        8.2            7.8
1167       10.1       8.0       7.0     15.7      5.7        6.3            5.3
1860       26.1      24.8      26.2     25.1     24.8       26.4           26.6
1887       11.4      10.9      10.2     17.8      7.0       10.2           10.0

Average bias             -0.8      -0.9      3.9     -2.9       -1.0           -1.2
Average absolute error     .8       1.2      5.5      3.3        1.3            1.4

The  interior  pixel  estimate  was  computed  by 
simply  totaling  the  wheat  percentages  for  the  interior 
pixels  and  dividing  by  the  number  of  interior  pixels. 

Another estimate based on interiors is to assign to
all pixels in a big blob the proportion of wheat found
in the blob interior. This is the "extrapolated from
interior" column. This estimate is less biased and less
erratic than the simple-minded interior pixel
estimate. In fact, its bias and average absolute error are
close to the error between the two planimetry
measurements. This estimate, moreover, is more
easily achieved by the AI because it is based only on
interiors of blobs.

A more  realistic  estimate  yet  is  obtained  on  the 
assumption  that  in  the  relatively  pure  interiors,  the 
AI  will  identify  either  100  percent  wheat  or  0 percent 
wheat.  The  extrapolation  from  pure  interiors  is  such 
an  estimate.  Either  100  percent  or  0 percent  is  ex- 
trapolated to  all  the  pixels  of  the  blob,  and  then  the 
percentage  of  wheat  of  all  pixels  is  obtained.  This 
estimate  also  gives  very  good  results  and  has  the  ad- 
vantage of  representing  a practical  sampling  proce- 
dure. 
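
The three interior-based estimates can be sketched as follows. The sketch assumes per-pixel ground-truth wheat fractions, a blob label per pixel, and a boolean interior mask; the function name is hypothetical. Two details are assumptions not stated in the text: blobs without interior pixels are left out of the extrapolated estimates, and an interior is called 100 percent wheat when its wheat fraction is at least 0.5.

```python
import numpy as np

def interior_based_estimates(wheat_fraction, blob_ids, is_interior):
    """Return (interior pixel, extrapolated, extrapolated-from-pure) in percent."""
    wheat = np.asarray(wheat_fraction, dtype=float)
    blob_ids = np.asarray(blob_ids)
    interior = np.asarray(is_interior, dtype=bool)

    # Simple interior pixel estimate: average over interior pixels only.
    interior_pixel = 100.0 * wheat[interior].mean()

    total = 0
    weighted = 0.0
    weighted_pure = 0.0
    for blob in np.unique(blob_ids[interior]):
        in_blob = blob_ids == blob
        n_blob = int(in_blob.sum())
        p_interior = wheat[in_blob & interior].mean()
        # Extrapolate the interior proportion to every pixel of the blob.
        weighted += n_blob * p_interior
        # Pure-interior variant: the interior is called all wheat or no wheat
        # (0.5 threshold is an assumption).
        weighted_pure += n_blob * (1.0 if p_interior >= 0.5 else 0.0)
        total += n_blob
    return interior_pixel, 100.0 * weighted / total, 100.0 * weighted_pure / total
```

In a toy segment whose edge pixels are less pure than the interiors, the simple interior average can be far from the true proportion while both extrapolated estimates recover it, which is the pattern table IX shows on average.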


OVERALL PERFORMANCE
OF PROCEDURE B

As nearly as can be determined from the tests run
so far, Procedure B is not importantly biased with
respect to the source of labeling information. Thus,
the primary overall performance measure should be
the variance of the proportion estimate as a function
of the cost of labeling. This cost has two main
components: the cost of a single label and the cost of
looking at a segment. The variance also appears to
have two main components, the variance $V_{blob}(b)$
due to sampling blobs and the variance $V_{seg}(s)$ due to
sampling segments. Let $s$ be the number of segments
selected and $b$ be the number of blobs selected for
labeling. Let $C_{seg}$ and $C_{blob}$ be the cost per segment
and the cost per blob, respectively. The total variance
is

$$V = V_{seg}(s) + V_{blob}(b) \tag{27}$$

and the total cost is

$$C = s\,C_{seg} + b\,C_{blob} \tag{28}$$

For a fixed total variance, the minimum cost must be
found. This can be done iteratively providing data on
$V_{seg}(s)$ and $V_{blob}(b)$ are available.

A reasonable model for $V_{blob}(b)$ is that the
variance follows the variance of the hypergeometric
distribution, corresponding to sampling without
replacement. For a single stratum, the variance of the
hypergeometric distribution is

$$V_{blob}(b) = \frac{P(1 - P)}{b}\left[\frac{B - b}{B - 1}\right] \tag{29}$$

where $P$ is the overall proportion of wheat, $B$ is the
total number of blobs in the stratum, and $b$ is the
number of sample blobs drawn.

For multiple strata, $i = 1, \ldots, M$, the variance of
the hypergeometric distribution is

$$V_{blob}(b) = \sum_{i=1}^{M}\left(\frac{N_i}{N}\right)^2 \frac{P_i(1 - P_i)}{b_i}\left[\frac{B_i - b_i}{B_i - 1}\right] \tag{30}$$

where $B_i$, $b_i$, and $P_i$ refer to the $i$th stratum; $N_i$ is the
number of pixels in the $i$th stratum; and $N$ is the
number of pixels overall. As long as $B_i \gg 1$ and as
long as $b_i$ is allocated in direct proportion to $B_i$, the
last bracket will be a constant fraction equal to
$(B - b)/B$ and can be taken outside the summation. Thus,

$$V_{blob}(b) = \frac{B - b}{B}\sum_{i=1}^{M}\left(\frac{N_i}{N}\right)^2 \frac{P_i(1 - P_i)}{b_i} \tag{31}$$

If it is assumed that the $b_i$ are chosen in
proportion to the number of pixels in each stratum,

$$V_{blob}(b) = \frac{B - b}{bB}\sum_{i=1}^{M}\frac{N_i}{N}\,P_i(1 - P_i) \tag{32}$$

The summation is similar in form to the RV
factor computed in equation (22) except that blob
sampling (which may be slightly more efficient than
pixel sampling) is involved and the proportional
sampling rule is only approximately enforced. Allowing a
constant factor $K$ to account for these differences, it
is assumed that the summation is equal to

$$K P(1 - P)(RV)$$

Thus,

$$V_{blob}(b) = \frac{B - b}{bB}\,K P(1 - P)(RV) \tag{33}$$

Currently, a model of this type is being constructed
and fit to the data.
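
As an illustration of how such a model could be used, the sketch below evaluates the blob-sampling variance in the form of equation (33) and finds the smallest number of sample blobs that meets a target variance. It assumes the form $V_{blob}(b) = \frac{B - b}{bB} K P(1 - P)(RV)$; the function names are hypothetical, and $K = 1$ is only a placeholder because the text gives no fitted value.

```python
def blob_variance(b, total_blobs, p, rv, k=1.0):
    """Blob-sampling variance in the form of equation (33).

    b: number of sample blobs; total_blobs: B; p: overall wheat proportion;
    rv: reduction-of-variance factor; k: correction constant (placeholder).
    """
    return (total_blobs - b) / (b * total_blobs) * k * p * (1.0 - p) * rv

def min_blobs_for_variance(target, total_blobs, p, rv, k=1.0):
    """Smallest b whose modeled variance does not exceed the target."""
    for b in range(1, total_blobs + 1):
        if blob_variance(b, total_blobs, p, rv, k) <= target:
            return b
    return total_blobs
```

With B = 400 blobs, P = 0.25, and RV = 0.58, a target standard error of 2 percent (variance 0.0004) requires 162 sample blobs under this model. Because the blob cost is b times the cost per blob, the smallest such b also minimizes the blob part of the total cost for a fixed blob variance.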

Similar but more complicated considerations are
involved in creating a model for $V_{seg}(s)$. In general,
the behavior of the model must be that the variance
goes to zero as $s \to S$, as with the hypergeometric
distribution. However, it also may be true that the
variance due to segment sampling approaches zero
for a fixed ratio of $s/S$ as the number of segments
increases in some large region. When larger numbers
are considered, the distance between segments
decreases and the representativeness of that fraction
chosen should increase. The nine segments
examined do not provide sufficient data to resolve this
issue. Tests with additional segments are in progress.


CONCLUSIONS

It is evident that all the answers about Procedure
B are not yet in. Before attempting to draw
conclusions, some of the questions should be reviewed.

The two important considerations for the entire
man/machine system for area estimation are
accuracy and efficiency. For the machine processing
preceding and following labeling, these
considerations may be expressed as bias and variance with
respect to the labeling source. For the labeling
portion of the system, the accuracy of the analyst labels
and the ease of working with the image products in
conjunction with the machine may be considered.

On the basis of the tests to date, Procedure B is not
importantly biased with respect to analyst labels. In
fact, it appears reasonable to allow some small bias to
make the analyst's job more convenient by asking
the analyst to label a blob interior, which is almost
always pure, rather than a random dot, which is often a
boundary or an edge pixel.

The efficiency of Procedure B may be expressed
in terms of the variance reduction factor. For single
segments, the RV is around 0.6. In terms of
classifier performance, this corresponds to a
classification accuracy of about 0.84. For multiple segments,
the RV is higher, indicating a loss in purity of some
strata as they are extended over wider regions. With
respect to two-way reduction of variance, the RV
factor is again about 0.6. In order to exploit either
single-segment or two-way RV, samples must be drawn
from every segment.

The purpose of the multisegment mode of
operation is to reduce costs by allowing analysts to label in
only a subset of segments. However, this is a type of
block sampling that introduces an additional source
of variance. It is still an open question whether gains
can be made by sampling from only a subset of
segments. The fact that multisegment Procedure B
achieves better results with an ancillary variable than
without is a hopeful sign that further gains are
possible.

Another consideration of efficiency is the
question of whether it is better to blob and then B-cluster
or to B-cluster without any spatial processing. Tests
are being conducted to determine this. Even if it
turns out that blobbing reduces the RV only slightly,
there are considerable benefits from data
compression and from providing the analysts with pure field
interiors to work with.


A LOOK  TO  THE  FUTURE 

Procedure  B has  a modular  structure  in  which  pro- 
posed improvements  can  be  evaluated  objectively.  In 
the  category  of  preprocessing,  improvements  may  be 
made  by  developing  individual  detector  calibration 
procedures,  by  implementing  a spatially  varying  haze 
effect correction, and by including the effects of view
angle and background albedo. The Landsat-3 return
beam vidicon (RBV) may provide additional useful
information on the atmosphere.

In the category of feature extraction, spectral
features other than brightness and greenness and
spatial features other than blobs should be evaluated
(ref. 21). The effects of the master data processor for
Landsat-3 on spatial feature extraction should be
examined. Again, the RBV may be of assistance in
defining improved spatial features.

The  largest  gains  will  probably  be  made  from  an 
improvement in stratification and an improvement
in  ease  and  accuracy  of  labeling.  In  stratification,  the 
problem  is  to  improve  the  purity  of  strata  while  at 
the  same  time  reducing  the  number  of  strata. 
Possibilities  include  the  use  of  prior  information  and 
the  improvement  of  the  clustering  method  itself.  For 
example,  certain  spectral  classes  are  most  unlikely  to 
be  wheat  and  could  be  eliminated  a priori,  thus 
reducing  the  portion  of  the  data  that  must  be 
sampled.  Signature  data  from  previous  years  or  sig- 
nature models  based  on  field  measurements  may  be 
used  to  isolate  a subset  of  data  likely  to  be  wheat, 
leaving  the  more  difficult  cases  to  the  analyst. 
An Evaluation of Procedure 1

S. G. Wheeler,a R. P. Heydorn,b P. N. Misra,c W. Lee, Jr.,a and R. T. Smart a


INTRODUCTION 

LACIE Procedure 1 has undergone continuous
testing and evaluation, starting with analytical and
experimental studies before it was implemented in
the Earth Resources Interactive Processing System
(ERIPS) software and continuing to the present with
performance evaluations using blind-site data. In this
paper, some of the evaluation studies and their
results are described, some of the strengths and
weaknesses of the procedure are indicated, and some
areas for possible improvement are identified.

In concept, Procedure 1 is simple and
straightforward. A detailed description of this procedure is
given in another paper ("Classification and
Mensuration of LACIE Segments" by Heydorn et al.)
but, for completeness, a short introduction is
included here. The steps required to estimate the
proportion of a segment devoted to small-grain
production are the following.

1.  Select  a sample  of  pixels  and  label  them  as 
either  small  grains  or  other.  These  selected,  labeled 
pixels  are  called  type  1 dots. 

2. Employing multispectral scanner (MSS)
intensity values from some or all of the type 1 dots as
cluster starting vectors, cluster the pixels in the
segment.

3. Assign a label to each cluster using an
automatic labeling technique based on the labels and
intensity values of the type 1 dots.

4.  Classify  the  pixels  in  the  segment  as  either 
small  grains  or  other  using  a LACIE  mixture  density 
classifier  with  cluster  statistics  serving  as  subclass 
mean  vectors  and  dispersion  matrices. 

5. Calculate a classification-based proportion
estimate by counting the number of pixels assigned a
small-grain label by the classifier.

6. Select and label a second sample of pixels from
the segment. These pixels are called type 2 dots.

7.  Generate  an  estimate  of  the  segment's  small- 
grain  proportion  by  adjusting  the  classification-based 
proportion  estimate  for  the  classification  errors  ob- 
served in  the  type  2 dots. 

Further  details,  in  the  form  of  equations,  will  be 
given  in  the  section  entitled  “Analytical  Results." 

This  set  of  steps  has  been  greatly  simplified.  In 
practice,  for  example,  one  may  be  interested  in  more 
than  two  classes  of  crops,  and  there  may  be  data 
problems  such  as  cloud  interference  or  bit  drops.  For 
simplicity,  the  simple  two-class  case  without  data 
problems  will  be  assumed.  Generalizations  of  the 
analytical  expressions  to  be  presented  are  readily 
available,  and  the  experimentation  to  be  reported 
did,  in  fact,  take  into  account  cloud  interference  and 
other  data  problems. 

A number  of  questions  arise  as  to  the  exact  way 
each  of  the  steps  should  be  performed.  For  example: 
How  many  type  1 dots  should  be  selected  and  what 
fraction  of  these  should  be  reserved  strictly  for 
cluster  labeling?  What  clustering  algorithm  should 
be  used?  How  should  clusters  be  labeled?  How  many 
type  2 dots  should  be  selected  and  how  should  they 
be  labeled?  Also,  what  is  the  effect  of  using  analyst 
interpreter  (AI)  labels  for  the  type  1 and  type  2 dots 
rather  than  generally  unavailable,  error-free  ground- 
truth  labels?  The  aim  of  the  evaluation  studies  re- 
ported in  this  paper  has  been  to  help  answer  these 
questions  by  estimating  the  effects  of  the  different 
factors  on  the  sampling  properties  of  the  segment 
proportion  estimators.  A few  results  stem  from 
analytical  expressions;  others,  from  experimentation 
guided  by  the  analytical  results. 


ANALYTICAL  RESULTS 

Much  of  Procedure  1 is  too  complicated  to  allow 
its  sampling  properties  to  be  fully  described  by  trac- 
table analytical  expressions.  Classification  and 
clustering,  for  example,  have  been  studied  for  a long 
time by many people and yet their properties have
been successfully evaluated in only the simplest,
single-channel situations. Thus, Procedure 1's
analytical results are limited to a few expressions
which describe the sampling properties of the
proportion estimators in terms of the results of a fixed,
given classification of a segment.1 These expressions
are very important to LACIE because they identify
the features of the labeling, clustering, and
classification steps to which the proportion estimators are
most sensitive.


Classification-Based Proportion Estimation

To start the discussion, some notation will be
established. Assume that the $N$ pixels in a segment
have been classified and that $N_s$ have been assigned a
small-grain classification label. Then, the ratio

$$\hat{\pi}_s(C) = N_s / N \tag{1}$$

is called the classification-based estimator of the
small-grain proportion of the segment. It is readily
shown that $\hat{\pi}_s(C)$ is a generally biased estimator
with bias

$$b(C) = (1 - p_s)\theta_{so} - p_s\theta_{os} \tag{2}$$

where $p_s$ is the true proportion of small grains in the
segment, $\theta_{os}$ is the probability of erroneously
classifying a small-grain pixel as other, and $\theta_{so}$ is the
probability of erroneously classifying an other pixel
as small grain. The bias in the classification-based
estimator thus depends entirely on the error
probabilities and, since all the pixels in the segment are
assumed to be classified, there is no within-segment
error variance. One purpose of Procedure 1 has been
to remove, or at least decrease, the bias of the
proportion estimates. As will be seen, this decrease in bias is
purchased at the cost of adding a random error, with
variance, to the estimates.


1 In particular, statements regarding expected values and
variances of proportion estimators are conditional on the results
of a fixed classification rule.


Analyst-Based Proportion Estimation

At this point, analyst labeling errors will be
discussed. Results from this discussion will be used in
developing the bias and variance of the Procedure 1
proportion estimator. Consider a situation in which
an analyst "classifies" a subset of the pixels in a
segment by assigning a label to each of a set of randomly
chosen pixels. If the analyst labels a total of $n$ pixels
and assigns a small-grain label to $n_s$ of them, then the
analyst-based estimate of the segment's small-grain
proportion is

$$\hat{\pi}_s(A) = n_s / n \tag{3}$$

Unless the analyst is using ground truth to label the
pixels, this, also, is a biased estimator with bias

$$b(A) = (1 - p_s)A_{so} - p_s A_{os} \tag{4}$$

where $A_{os}$ is the probability that the analyst labels a
small-grain pixel as other and $A_{so}$ is the probability
the analyst labels an other pixel as small grain. Since
the analyst has labeled a sample of the pixels in the
segment, this estimator has a variance. Assuming
that pixel labels are independent,2 this variance is
approximately

$$\operatorname{Var}[\hat{\pi}_s(A)] \approx \frac{\pi_s(A)\,[1 - \pi_s(A)]}{n} \tag{5}$$

where

$$\pi_s(A) = p_s(1 - A_{os}) + (1 - p_s)A_{so}$$

is the probability that the analyst will assign a
small-grain label to any randomly chosen pixel. The
approximation is accurate in this case since the analyst
would usually label a very small fraction of the
22 932 pixels in a segment.

2 Since agricultural crops are grown in fields, pixel labels are
clustered rather than independent. However, assuming a very
small sample constrained to be well spread throughout the scene,
independence is probably not a bad assumption.
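
The analyst-based bias and variance can be evaluated numerically with a short sketch. It assumes independent pixel labels, so that the variance takes the binomial form q(1 - q)/n, where q is the probability that the analyst assigns a small-grain label to a randomly chosen pixel; the function name and argument names are hypothetical.

```python
def analyst_estimator_properties(p_s, a_os, a_so, n):
    """Bias and approximate variance of the analyst-based estimate n_s / n.

    p_s:  true small-grain proportion of the segment.
    a_os: probability the analyst labels a small-grain pixel as other.
    a_so: probability the analyst labels an other pixel as small grain.
    n:    number of pixels labeled (assumed independent, n << 22 932).
    """
    # Probability that a randomly chosen pixel receives a small-grain label.
    q = p_s * (1.0 - a_os) + (1.0 - p_s) * a_so
    bias = (1.0 - p_s) * a_so - p_s * a_os  # equation (4); equals q - p_s
    variance = q * (1.0 - q) / n            # binomial approximation
    return bias, variance
```

For instance, with a true proportion of 0.30, omission rate 0.10, commission rate 0.05, and 100 labeled pixels, the bias is 0.005 and the standard error is about 0.046, so sampling error dominates labeling bias at that sample size.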


Procedure 1 Proportion Estimation

Procedure 1 attempts to correct for the errors in
the classification estimator $\hat{\pi}_s(C)$ by using estimates
of the misclassification probabilities $\theta_{os}$ and $\theta_{so}$ to
adjust $\hat{\pi}_s(C)$. According to the description of
Procedure 1 in the introduction, classification in
Procedure 1 depends on a sample of type 1 dots used
first to guide a clustering run and then to label the
resulting clusters. To estimate the classification error
rates, a second sample of pixels, the type 2 dots, are
selected at random, labeled by the analyst, and used
to evaluate the classifier and to estimate the
small-grain (SG) proportion. Table I shows the results of
such a step. Here, $m_{ij}$ is the number of the $m$ type 2
dots that were classified into the $i$th class and
assigned a $j$th class label by the analyst. A plus
subscript denotes summation over that subscript;
e.g., $m_{1+} = m_{11} + m_{12}$ is the number of type 2 dots
that were assigned a small-grain label by the
classifier. The analyst, of course, does not know the
classification result when the labels are assigned. In
fact, the type 2 dots are selected and labeled together
with the type 1 dots before the classification is
performed. This labeling is done for convenience since
the variance of the proportion estimators can be
reduced by controlling the numbers $m_{1+}$ and $m_{2+}$.
This point will be addressed later.

Table I. Results of Comparing the Classification and
Analyst Labels for the Type 2 Dots

Classification        Analyst label        Total
label                 SG        Other
SG                    m11       m12        m1+
Other                 m21       m22        m2+
Total                 m+1       m+2        m

The classification bias, equation (2), can be put
into another form by defining two alternate
measures of classification error. Let $\lambda_{os}$ be the
probability that a pixel classified as small grain is, in fact,
an other pixel, and let $\lambda_{so}$ be the probability that a
pixel classified as other is, in fact, a small-grain pixel.
Then, since $p_s \theta_{os} = [1 - \pi_s(C)] \lambda_{so}$ and
$(1 - p_s) \theta_{so} = \pi_s(C) \lambda_{os}$, equation (2) leads to
the expression

$$\pi_s(C) - p_s = \pi_s(C) \lambda_{os} - [1 - \pi_s(C)] \lambda_{so} \tag{6a}$$

or alternately

$$p_s = \pi_s(C) (1 - \lambda_{os}) + [1 - \pi_s(C)] \lambda_{so} \tag{6b}$$

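If the error probabilities $\lambda_{os}$ and $\lambda_{so}$ were known, this
relation could be used to generate an unbiased proportion
estimator from $\pi_s(C)$. The error probabilities are not
known, but they can be estimated using the type 2 data
results as

$$\hat\lambda_{os} = m_{12} / m_{1+}$$

and

$$\hat\lambda_{so} = m_{21} / m_{2+}$$

Using these, the Procedure 1 proportion estimator is

$$\hat p_s = \pi_s(C) (1 - \hat\lambda_{os}) + [1 - \pi_s(C)] \hat\lambda_{so} \tag{7}$$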


It is relatively easy to show that the Procedure 1
proportion estimator is a maximum-likelihood
estimator of the segment's small-grain proportion
under the assumptions that (1) $\pi_s(C)$ is not a random
variable, (2) the type 2 dots are a multinomial
sample from the segment, and (3) the analyst labels
are error free. If the analyst labels have errors, $\hat p_s$ is
still a maximum-likelihood estimator, but it estimates
the proportion of the segment the analyst
would have labeled as small grains rather than the
true proportion of small grains in the segment. The
fact that $\hat p_s$ is a maximum-likelihood estimator under
the reasonable multinomial assumption implies that
this proportion estimator has the usual large-sample
optimality associated with maximum-likelihood
estimators and probably cannot be improved upon
easily as long as information is limited to the
classification results and the type 2 dot labels. Also,
the estimator is asymptotically normal, with
asymptotic mean and variance given by standard
maximum-likelihood theory.
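The bias correction can be illustrated with a short sketch. It assumes the estimator is obtained by substituting the estimated error rates m12/m1+ and m21/m2+ into equation (6b); the function name and the example counts are hypothetical.

```python
def procedure1_estimate(pi_c, m11, m12, m21, m22):
    """Bias-corrected small-grain proportion from the type 2 dot table.

    pi_c     : proportion of the segment classified as small grain
    m11, m12 : dots classified SG, labeled SG / other by the analyst
    m21, m22 : dots classified other, labeled SG / other by the analyst
    """
    m1_plus = m11 + m12          # type 2 dots classified as small grain
    m2_plus = m21 + m22          # type 2 dots classified as other
    if m1_plus == 0 or m2_plus == 0:
        raise ValueError("both classification rows need at least one dot")
    lam_os = m12 / m1_plus       # estimated P(other | classified SG)
    lam_so = m21 / m2_plus       # estimated P(SG | classified other)
    return pi_c * (1.0 - lam_os) + (1.0 - pi_c) * lam_so


# Hypothetical example: 30 percent of the segment classified as small grain.
estimate = procedure1_estimate(0.30, m11=24, m12=6, m21=7, m22=63)
```

The guard reflects the point made later in the text: the estimator is undefined when all type 2 dots fall in a single classification row.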

Since $E(m_{11}/m_{1+}) \approx 1 - A_{os}$ and $E(m_{21}/m_{2+}) \approx A_{so}$,
the bias in $\hat p_s$ is virtually the same as the bias in
$\pi_s(A)$ given by equation (4). That is,

$$b(\hat p_s) \approx (1 - p_s) A_{so} - p_s A_{os}$$

The variance of $\hat p_s$ can be derived by noting that it is a
standard form of a regression estimator used in sample
survey theory. That is, if one attaches random
variables $X_i$ and $Y_i$ to the ith type 2 dot where

$$X_i = \begin{cases} 1, & \text{if the dot is classified as small grain} \\ 0, & \text{otherwise} \end{cases}$$

and

$$Y_i = \begin{cases} 1, & \text{if the dot has a small-grain analyst label} \\ 0, & \text{otherwise} \end{cases}$$

then $\hat p_s$ is equivalent to a regression estimator of the
mean of the $Y_i$'s and can be expressed as

$$\hat p_s = a + b\,[\pi_s(C) - \bar{x}] \tag{8}$$

where a and b are least squares estimators calculated
from the $(X_i, Y_i)$ pairs. Using this and following
reference 1, it is seen that the variance of the Procedure 1
estimator is closely approximated by

$$V(\hat p_s) \approx (1 - \rho^2)\, V[\pi_s(A)] = (1 - \rho^2)\, \alpha_s (1 - \alpha_s) / m \tag{9}$$

where $V[\pi_s(A)]$ is the variance of the analyst-based
proportion estimator. The $\rho$ in equation (9) is the
correlation between the random variables $X_i$ and $Y_i$
and is equal to

$$\rho = \frac{P(S,S) - \pi_s \alpha_s}{\sqrt{\pi_s (1 - \pi_s)\, \alpha_s (1 - \alpha_s)}} \tag{10}$$

where P(S,S) is the probability that a randomly
chosen pixel is both classified and labeled by the
analyst as a small-grain pixel.3 If the type 2 dots were
selected after classification so that the number of
dots classified as small grains $m_{1+}$ were equal to the
product $m \pi_s(C)$, these would be exact equations for
the bias and variance. However, since the type 2 dots
are selected before classification, $m_{1+}$ is a random
variable. The error in the approximations given in
equations (4) and (9) for the bias and variance of $\hat p_s$ is
caused by the probabilities that $m_{1+}$ can take the
values 0 and m. If m and $p_s$ are moderately large,
these will be very small probabilities and therefore
equations (4) and (9) will give very good approximations.
The true bias can be either larger or smaller
than that given in equation (4), but the true variance
always exceeds that given by equation (9). In cases for
which both the true variance and $V(\hat p_s)$ have been
calculated, $V(\hat p_s)$ was found to underestimate by
about 2 to 4 percent, with a maximum error of about
10 percent of the true variance. Equation (9), then,
should be sufficient for the purpose of evaluating
Procedure 1.
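As a rough numerical companion to equations (9) and (10), the sketch below computes the correlation between classifier and analyst labels from the joint and marginal probabilities, then the approximate variance. The function names and input values are illustrative assumptions, not values from the paper.

```python
import math


def label_correlation(p_ss, pi_s, alpha_s):
    """Correlation of the 0/1 classifier and analyst indicators.

    p_ss    : P(pixel is classified SG and labeled SG by the analyst)
    pi_s    : P(pixel is classified SG)
    alpha_s : P(pixel is labeled SG by the analyst)
    """
    denom = math.sqrt(pi_s * (1.0 - pi_s) * alpha_s * (1.0 - alpha_s))
    if denom == 0.0:
        raise ValueError("marginal probabilities must lie strictly in (0, 1)")
    return (p_ss - pi_s * alpha_s) / denom


def approx_procedure1_variance(rho, alpha_s, m):
    """Large-sample variance: (1 - rho^2) times the analyst variance."""
    return (1.0 - rho ** 2) * alpha_s * (1.0 - alpha_s) / m


# Hypothetical segment: half small grain, classifier agrees on 80 percent.
rho = label_correlation(p_ss=0.4, pi_s=0.5, alpha_s=0.5)
variance = approx_procedure1_variance(rho, alpha_s=0.5, m=100)
```

A correlation of 0.6 removes 36 percent of the analyst-based variance, which shows how strongly the benefit of machine processing depends on classifier agreement with the labels.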

Since both the analyst-based and the Procedure 1
proportion estimators have the same expected value,
it is useful to consider the ratio of their variances as a
measure of their relative efficiencies. Remembering
that Procedure 1 uses two sets of dots, the variance of
an analyst-based estimator using these dots is

$$\frac{\alpha_s (1 - \alpha_s)}{m_1 + m_2}$$

whereas the variance of the Procedure 1 estimator is

$$(1 - \rho^2)\, \frac{\alpha_s (1 - \alpha_s)}{m_2}$$

where $m_1$ and $m_2$ are the numbers of type 1 and type
2 dots.4 The efficiency of Procedure 1 with respect to
the analyst estimate is

$$\frac{m_2}{m_1 + m_2} \cdot \frac{1}{1 - \rho^2} = \frac{m_2}{m_1 + m_2} \cdot \frac{1}{R} \tag{11}$$

If $R < m_2 / (m_1 + m_2)$, the variance of the Procedure
1 estimator is less than that of an analyst-based
estimator and there has been a gain in sampling
efficiency due to the clustering and classification
processing in Procedure 1. Otherwise, there is a loss of
efficiency and a better estimate would result by skipping
the machine processing. It should be noted that
the efficiency of Procedure 1 would traditionally be
defined as the reciprocal of that in equation (11).
This definition, however, is well established in the
LACIE community and will be adopted here. In the
sequel, R will be referred to interchangeably as
efficiency or variance ratio.

3 $1 - \rho^2$ is equivalent to the variance reduction coefficient R
discussed in the paper by Heydorn et al.

4 In the implementation of Procedure 1, boundary dots (i.e.,
pixels on or "near" the boundary of a field) were not used as type
1 dots. In this discussion, however, it is assumed that all pixels are
used and that the same selection and labeling rules apply to both
type 1 and type 2 dots.

Note that altering the clustering and classification
schemes of Procedure 1 will not affect the bias of its
proportion estimators since this depends only on the
analyst labeling errors. Only the variance would be
affected by changing the clustering or classification
and this only through $R = 1 - \rho^2$. The value R,
then, should provide the most sensitive measure of
the effect of these changes. In practice, one has the
classification estimator $\pi_s(C)$ and the type 2 dot
results as shown in table I from which to estimate R
or, equivalently, $\rho$. As seen in equation (10), $\rho$ is a
function of the proportion of the segment which the
analyst would have labeled as small grains and the
classification errors with respect to the analyst labels.
Two heuristic estimators of R have been proposed
and studied. One of these uses equation (10) with
P(S,S) and $\alpha_s$ replaced by their estimators based on
the type 2 dots; i.e.,

$$\hat P(S,S) = m_{11} / m$$

and

$$\hat\alpha_s = m_{+1} / m$$

The term $\pi_s$ comes from the classification results.
This first estimator of $\rho$ has the form

$$\hat\rho = \frac{m_{11} - \pi_s(C)\, m_{+1}}{\sqrt{\pi_s(C)\,[1 - \pi_s(C)]\, m_{+1} m_{+2}}} \tag{12}$$

The second estimator of $\rho$ was developed by assuming
that, since it is a correlation coefficient, there is
very little information to be gained by using the N - m
pixels which are not included as type 2 dots. Consequently,
the standard product moment estimator
of the correlation between the $X_i$'s and $Y_i$'s from the
type 2 dots is used to estimate $\rho$. Because of the
special 0,1 nature of these variables, this reduces to a
particularly simple function of the entries in table I;
i.e.,

$$r = \frac{m_{11} m_{22} - m_{21} m_{12}}{\sqrt{m_{1+} m_{2+} m_{+1} m_{+2}}} \tag{13}$$

In either case, the estimator of R is $1 - \hat\rho^2$.
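The two estimators of the correlation can be sketched directly from the type 2 dot counts. This is an illustrative reading of equations (12) and (13), with hypothetical function names and example counts; `pi_c` stands for the classified small-grain proportion of the whole segment.

```python
import math


def product_moment_r(m11, m12, m21, m22):
    """Product moment (phi) correlation of the 2x2 dot table."""
    m1p, m2p = m11 + m12, m21 + m22      # classification row totals
    mp1, mp2 = m11 + m21, m12 + m22      # analyst label column totals
    denom = math.sqrt(m1p * m2p * mp1 * mp2)
    if denom == 0:
        raise ValueError("every row and column total must be positive")
    return (m11 * m22 - m21 * m12) / denom


def mixed_rho(pi_c, m11, m12, m21, m22):
    """Estimator that mixes dot counts with the segment-wide pi_c."""
    mp1, mp2 = m11 + m21, m12 + m22
    denom = math.sqrt(pi_c * (1.0 - pi_c) * mp1 * mp2)
    if denom == 0:
        raise ValueError("pi_c must lie in (0, 1) and both columns be nonempty")
    return (m11 - pi_c * mp1) / denom


def variance_ratio(rho):
    """R = 1 - rho^2; smaller values mean more gain from processing."""
    return 1.0 - rho ** 2


r = product_moment_r(40, 10, 10, 40)
r_hat = variance_ratio(r)
```

When the dots happen to reproduce the segment-wide classified proportion exactly (here 50 of 100 dots with `pi_c = 0.5`), the two estimators coincide; they differ only through sampling fluctuation in the row totals.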

Table  II  includes  the  results  of  a small  Monte 
Carlo  experiment  to  compare  these  two  estimators. 
Table II. Comparison of the Sampling Properties of Two Estimators of the
Correlation Between Analyst Labels and Classification Labels

Proportion     Classification            Errors between true and estimated correlations
small grains   error rates               Product moment (r)        Mixed ($\hat\rho$)
               $\theta_{os}$  $\theta_{so}$   Av       SD      MSE      Av       SD      MSE
0.3            0.6      0.2              ?        ?       ?        -0.03    0.17    0.04
 .3             .5       .2              ?        ?       ?          .09     .21    ?
 .3             .4       .2               .02      .1?     .02       .03     .24     .05
 .3             .3       .2              -.03      .1?     .01      -.03     .16     .03
 .5             .6       .2              0         .13     .02       .02     .25     .06
 .5             .5       .2               .01      .16     .03       .05     .27     .07
 .5             .4       .2               .02      .11     .01       .08     .19     .04
 .05            .5       .2               .001     .19     .01      -.003    .21     .02

? = entry or digit not legible in the source.

For this experiment, sets of 22 932 labels were generated
for simulated segments using binomial distributions
appropriate to the segment wheat proportions
and classification errors given in the table. The
experiment was replicated 20 times for each set of
conditions; the table gives the resulting observed averages,
standard deviations (SD's), and mean squared errors
(MSE's) of the differences between the true and the
estimated correlations. In every case, the product
moment estimator performed better than the estimator
that uses the classified small-grain proportion.
The observed mean squared error was at least twice
as large in every case for the mixed as compared to
the product moment estimator. For this reason, all R
values reported in this paper are calculated as $1 - r^2$,
where r is the product moment correlation given by
equation (13). As a point of interest, r is a standard
measure of association for contingency tables. It is
asymptotically normal, is unbiased, and has
asymptotic variance (ref. 2)


$$\sigma_\infty^2(r) = \frac{1}{m} \Big\{ 1 - \rho^2 + (\rho + \cdots)\, (P_{1+} - P_{2+}) (P_{+1} - P_{+2}) \cdots \Big\}$$

[the remaining terms of this expression are not legible in the source; see ref. 2]

where $P_{ij}$ is the probability associated with cell ij in
table I. Since the expected value of $\hat R = 1 - r^2$ is

$$E(\hat R) = 1 - \{ [E(r)]^2 + V(r) \} = 1 - \rho^2 - V(r)$$

$\hat R$ underestimates R on the average. It has the
expected value

$$E(\hat R) \approx R\,[1 - (1/m)] - A(\rho, P)/m$$

where $A(\rho, P)$ is small with respect to m. Thus, at least
approximately, $\hat R$ underestimates R by a factor of
1 - (1/m). Nearly all the $\hat R$ values reported in this
paper were calculated using 60 to 140 dots; thus, the
attenuation factor can be ignored.

The remainder of this paper consists of results
from experimental studies of Procedure 1. These are
given in terms of proportion estimation errors,
probabilities of correct classification, and the variance
ratio R, which measures the return from machine
processing.
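A Monte Carlo comparison of the kind described for table II could be sketched as follows. This is not the authors' program: the number of type 2 dots (`m = 100`), the error-free analyst labels, the random seed, and all function names are assumptions made for illustration.

```python
import math
import random


def true_rho(p, theta_os, theta_so):
    """Correlation between classifier output and error-free labels."""
    p_ss = p * (1.0 - theta_os)            # P(classified SG and truly SG)
    pi_c = p_ss + (1.0 - p) * theta_so     # P(classified SG)
    return (p_ss - pi_c * p) / math.sqrt(pi_c * (1 - pi_c) * p * (1 - p))


def one_replicate(rng, n_pixels, m, p, theta_os, theta_so):
    """Simulate one segment; return (r, mixed) or None if degenerate."""
    truth = [rng.random() < p for _ in range(n_pixels)]
    classed = [(rng.random() >= theta_os) if t else (rng.random() < theta_so)
               for t in truth]
    pi_c = sum(classed) / n_pixels
    counts = [[0, 0], [0, 0]]              # counts[class][label], 0 = SG
    for k in rng.sample(range(n_pixels), m):
        counts[0 if classed[k] else 1][0 if truth[k] else 1] += 1
    (m11, m12), (m21, m22) = counts
    m1p, m2p, mp1, mp2 = m11 + m12, m21 + m22, m11 + m21, m12 + m22
    if 0 in (m1p, m2p, mp1, mp2) or pi_c in (0.0, 1.0):
        return None
    r = (m11 * m22 - m21 * m12) / math.sqrt(m1p * m2p * mp1 * mp2)
    mixed = (m11 - pi_c * mp1) / math.sqrt(pi_c * (1 - pi_c) * mp1 * mp2)
    return r, mixed


def compare_estimators(p, theta_os, theta_so, m=100, n_pixels=22932,
                       replicates=20, seed=0):
    """Mean squared error of each estimator about the true correlation."""
    rng = random.Random(seed)
    rho = true_rho(p, theta_os, theta_so)
    errors = []
    for _ in range(replicates):
        result = one_replicate(rng, n_pixels, m, p, theta_os, theta_so)
        if result is not None:
            errors.append((result[0] - rho, result[1] - rho))
    if not errors:
        raise ValueError("all replicates were degenerate")
    n = len(errors)
    return (sum(e[0] ** 2 for e in errors) / n,
            sum(e[1] ** 2 for e in errors) / n)
```

Calling `compare_estimators(0.3, 0.4, 0.2)` returns the pair (MSE of r, MSE of the mixed estimator) for one row of conditions. With only 20 replicates the estimates are noisy, so any single run should be read as indicative only.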


EXPERIMENTAL  RESULTS 

A number of experimental studies have been conducted
to evaluate the performance of Procedure 1.
This paper will be limited to a discussion of the
results from three of the experiments and an evaluation
of LACIE Procedure 1 proportion estimates for
some blind-site segments.

The type 1 and type 2 dot labels used in the first
two experiments were derived from ground-truth
information; therefore, the corresponding Procedure 1
proportion estimates will be unbiased. The purpose
of these experiments was to gain an understanding of
Procedure 1 by validating the analytical expressions
for means, variances, and sampling efficiencies and
by studying effects of modifying some of the Procedure 1
parameters. Ground-truth labels were used
in these studies because they removed a source of
variation (the analyst labeling errors) from the
experiments. Both analyst labels and ground-truth
labels were used in the remaining experiment and in
evaluating the LACIE estimates. Their results will
include comparisons between results using ground-truth
and analyst labels.


Experiment  1 

The first experiment was designed to validate the
analytical expressions and to measure effects of
modifying the numbers of cluster starting vectors,
cluster labeling vectors, and channels used in processing
the imagery. Each of these three factors was
used at two different levels in a $2^3$ factorial experiment
over the four segments identified in table III.
This table contains the segment numbers and locations,
the ground-truth small-grain (in this case,
wheat) proportions, and a variance factor which is N
times the variance of an analyst-based proportion
estimator calculated from N accurate pixel labels.
The factor levels used in this experiment are

1. Number of cluster starting vectors - (20,40)

2. Number of additional labeling vectors used
with the cluster starting vectors to label clusters -
(0,20)

3. Number of channels used in processing the imagery - (4,8)

Table III. Segments Used in Experiment 1

Segment                  Ground-truth        N x variance of
Number   Location        wheat proportion    analyst-based
                                             wheat proportion
1965     North Dakota    0.42                0.24
1884     Kansas           .37                 .23
1591     Nebraska         .05                 .05
1988     Kansas           .33                 .22

All clustering runs were performed using the ERIPS
Iterative Self-Organizing Clustering System (sometimes
called ISOCLS; herein called Iterative). The
cluster parameters were taken from the results of a
study on clustering made by the Mission Planning
and Analysis Division (MPAD) of the NASA
Johnson Space Center.5 Classifications were performed
using the standard LACIE mixture density
algorithm with cluster statistics used as subclass
parameters and prior probabilities set proportional to
cluster population sizes.
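The factorial layout described above can be enumerated mechanically. The sketch below assumes every factor combination was run on each of the four segments, which the text implies but does not state outright; the dictionary keys are illustrative names.

```python
from itertools import product

# Factor levels as listed in the text (two levels each).
FACTORS = {
    "starting_vectors": (20, 40),
    "additional_labeling_vectors": (0, 20),
    "channels": (4, 8),
}

# Segment numbers from table III.
SEGMENTS = (1965, 1884, 1591, 1988)


def factorial_runs(factors=FACTORS):
    """Return one dict per combination of factor levels (2^3 = 8 here)."""
    names = list(factors)
    return [dict(zip(names, levels))
            for levels in product(*(factors[name] for name in names))]


runs = factorial_runs()
# Full design under the stated assumption: every combination on every segment.
design = [(segment, run) for segment in SEGMENTS for run in runs]
```

Since each acquisition contributes four channels, the 4 versus 8 channel factor corresponds to one versus two acquisitions.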

The first step in evaluating the experimental data
was to validate the expressions for the mean values
and variances of the Procedure 1 proportion estimators.
As anticipated, test results showed that these
expressions were valid and useful. Other results from
this small experiment were rather disappointing. The
experiment produced evidence that, though Procedure 1
has a potential for generating estimates that
are very efficient when compared to hand counting
by an analyst, it did not do so for most of the segments
at most of the factor levels. Perhaps not
surprisingly, given the small number of segments,
none of the three experimental factors had a significant
effect on the Procedure 1 proportion estimators.
More surprising was an almost complete lack of consistency
in the effects from segment to segment. For
example, when processing segment 1591 using 20
starting and labeling vectors, the R value was increased
from 0.76 to 0.999 with the addition of a second
acquisition. Under the same conditions, the R
value decreased from 0.82 to 0.26 for segment 1988.

The distribution of the observed R values and
averages of the R values for combinations of the
experimental factors are shown in figure 1.6 The median
R is about 0.92, with 75 percent of the values
falling above 0.78. All of the small R values are from
segment 1988 with eight-channel imagery.

FIGURE 1. Procedure 1 sampling efficiencies (R) from experiment 1.

5 A. D. Wylie and W. C. Beam, "MPAD LACIE Clustering
Study." JSC Internal Note 76-FM-U6, NASA Johnson Space
Center, 1977.

6 The "box" plot in figure 1 is a stylized histogram showing,
from top to bottom, the maximum, 75th percentile, median, 25th
percentile, and minimum of the data. The box shape is merely to
aid the eye and its width has no meaning in this paper.
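The five values drawn in each box plot can be computed as in the sketch below. The percentile convention (Python's "inclusive" interpolation) is an assumption; the paper does not say which rule was used.

```python
from statistics import quantiles


def box_summary(values):
    """Five-number summary, listed from the top of the box plot down."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two observations")
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {
        "max": data[-1],
        "q3": q3,          # 75th percentile (top of the box)
        "median": median,
        "q1": q1,          # 25th percentile (bottom of the box)
        "min": data[0],
    }


summary = box_summary([5, 1, 4, 2, 3])
```

The distance between `q3` and `q1` is the interquartile range, which the paper later uses as its measure of variability.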

Figure 2 illustrates the relationships between the
measured sampling efficiencies and the variances of
the proportion estimators for segment 1988. This plot
shows the variances of two different analyst-based
proportion estimators and two proportion estimators
based on Procedure 1. The difference between the
analyst-based estimators is that one assumes that only
the type 2 dot labels are used to generate the estimator,
whereas the other assumes that an additional 20
labels are available which would have been used as
type 1 dots in Procedure 1. The two curves for Procedure 1
correspond to the best (0.259) and worst
(0.823) observed R values from this segment. Since
analyst-based estimates would be calculated using all
available pixel labels, only the best case for segment
1988 shows a gain in sampling efficiency due to
machine processing, but it is a very significant gain.
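The comparison drawn in figure 2 can be reproduced in outline. The sketch below uses the segment 1988 proportion (0.33) and the two observed R values from the text; the number of type 2 dots (`m2 = 60`) is a hypothetical choice, as the figure itself plots a range of sample sizes.

```python
def analyst_variance(alpha, n_labels):
    """Variance of a hand-count proportion from n_labels pixel labels."""
    return alpha * (1.0 - alpha) / n_labels


def machine_variance(alpha, m2, r_ratio):
    """Procedure 1 variance: variance ratio R applied to the type 2 dots."""
    return r_ratio * alpha * (1.0 - alpha) / m2


def machine_processing_gain(m1, m2, r_ratio):
    """True when Procedure 1 beats an analyst using all m1 + m2 labels."""
    return r_ratio < m2 / (m1 + m2)


ALPHA = 0.33          # segment 1988 ground-truth wheat proportion
M1, M2 = 20, 60       # 20 type 1 dots (from the text); 60 type 2 dots (assumed)

best = machine_variance(ALPHA, M2, 0.259)
worst = machine_variance(ALPHA, M2, 0.823)
hand_count = analyst_variance(ALPHA, M1 + M2)
```

Under these assumptions the hand count using all 80 labels falls between the two Procedure 1 cases, matching the statement that only the best case shows a gain.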

The results of this experiment indicate a weakness
in the Iterative clustering algorithm since the R
values should improve as more information is provided
through additional starting vectors and acquisitions.
The results may also imply that the acquisition
selection technique is faulty. This implication may
be true but it is not pertinent since the clustering
algorithm should be capable of extracting some information
from every acquisition or at least it should
not give a degraded performance with additional
acquisitions. Note in this regard that, excepting clouds,
etc., which were accounted for in this experiment, all
acquisitions contain some information; poorly
selected acquisitions do not contain poor data,
merely a suboptimal choice of data.

To better evaluate the clustering algorithm, the
cluster assignments of the labeled pixels were tabulated
for each clustering run. An example of such a
tabulation is given in table IV, which shows cluster
assignments for the labeled pixels in segment 1988
when clustered using 20 starting vectors and 4 channels
of data. Judging by the pixel assignments, there
were at most four pure clusters (i.e., clusters 7, 14, 17,
and 18) and none of these was pure wheat. Also, to
point out a worst case, cluster 12 was apparently half
wheat and half other. Problems with the cluster
labeling algorithm also were indicated since cluster 4,
which almost certainly contained between 79 percent
and 99 percent class other, changed label from other
to wheat when 20 additional labeling dots were provided.
Results from the remaining segments were
similar to those shown in table IV.
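The purity statements above can be checked against the table IV counts. In the sketch below the counts are transcribed from the table, with blank wheat entries read as zero (an assumption); the function names are illustrative.

```python
# cluster number: (labeled grid points, grid points labeled wheat)
TABLE_IV = {
    1: (9, 1), 2: (12, 7), 3: (29, 3), 4: (25, 3), 5: (26, 3), 6: (20, 1),
    7: (16, 0), 8: (2, 1), 9: (9, 6), 10: (11, 8), 11: (7, 4), 12: (6, 3),
    13: (6, 4), 14: (3, 0), 15: (12, 9), 16: (4, 1), 17: (9, 0), 18: (3, 0),
}


def wheat_fraction(cluster):
    """Fraction of a cluster's labeled grid points that are wheat."""
    labeled, wheat = TABLE_IV[cluster]
    return wheat / labeled


def pure_clusters():
    """Clusters whose labeled grid points all carry the same label."""
    return sorted(c for c, (labeled, wheat) in TABLE_IV.items()
                  if wheat in (0, labeled))


total_labeled = sum(labeled for labeled, _ in TABLE_IV.values())
```

Purity judged this way is only as reliable as the handful of labeled points per cluster; several clusters have fewer than five.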

Since other experiments were larger than this one,
discussion of the proportion estimates from this
experiment will be omitted. The main conclusion
drawn from this experiment was that Procedure 1
has potential to produce significant sampling
efficiencies and, consequently, good proportion
estimates, but it does require improvement if it is to
attain this goal. In particular, performance of the
ERIPS Iterative clustering algorithm appears poor
despite the years of use and of study devoted to its
development. One result of this finding was that a
nearest neighbor clustering algorithm was considered
and subsequently adopted for LACIE. The next
experiment to be discussed includes a comparison of
the performances of Procedure 1 when used with
these two clustering algorithms.

FIGURE 2. Effect of Procedure 1 efficiencies on variances of
proportion estimators for sample segment 1988.

Table IV. Representation of Grid Points in Clusters,
Segment 1988, 4 Channels, 40 Starting Dots

Cluster   Total number   Number of labeled   Number of grid points   Cluster
number    of pixels      grid points         labeled wheat           label
 1        1326            9                  1                       O
 2        1191           12                  7                       W
 3        3196           29                  3                       O
 4        2349           25                  3                       O(a), W(b)
 5        2285           26                  3                       O
 6        2159           20                  1                       O
 7        2525           16                  -                       O
 8         233            2                  1                       W
 9        1013            9                  6                       W
10        1759           11                  8                       O
11         781            7                  4                       W
12         756            6                  3                       W
13         690            6                  4                       O
14          58            3                  -                       O
15        1217           12                  9                       W
16         456            4                  1                       O
17         777            9                  -                       O
18         161            3                  -                       O

(a) 40 labeling pixels.
(b) [number not legible in the source] labeling pixels.


Experiment 2

The second experiment was designed to provide a
detailed evaluation of Procedure 1 when ground-truth
labels are available. A set of 13 Phase II segments
was processed using numerous variations on
the steps of Procedure 1. These segments are identified
in table V, which also gives their ground-truth
small-grain proportions, estimates of their small-grain
proportions based on counting labels from
(approximately) 209 pixels, and a run type. The seven
segments with run type "sequential" were processed
using, in turn, one, two, three, and four acquisitions
of data. The remaining six "T-4" segments were processed
using only four acquisitions. The acquisitions
were selected by an automated selection algorithm
developed for Procedure 1.

Table V. Segments Used in Experiment 2

Segment                    Percent SG                              Run type
Number   Location          Ground    Dots        Phase II
                           truth     estimate    estimate
1003     Colorado          19.8      25.82       25.5              Sequential
1090     Colorado          32.8      37.50       29.1              T-4
1961     Kansas             8.2      10.0         8.0              Sequential
1988     Kansas            33.0      32.73       28.5              Sequential
1865     Kansas            23.4      24.39       12.0              Sequential
1178     Kansas            15.5      18.18       16.0              Sequential
1574     Nebraska           8.2      16.32       15.9              T-4
1624     North Dakota      53.89     57.89       46.8              T-4
1967     North Dakota      34.5      36.76       30.0              T-4
1046     Oklahoma          23.1      20.00       14.6              Sequential
1238     Oklahoma          11.99     10.05        0                T-4
1978     Texas             48.4      44.44       17.0              Sequential
1084     Texas             16.09     22.08         .4              T-4

All segments were processed using both the Iterative
and nearest neighbor (NN) clustering
algorithms, and proportion estimates were generated
both with and without the classification step of
Procedure 1. The latter without-classification process
used the cluster label for each pixel in a cluster as
that pixel's classification. The with-classification
process used the standard Procedure 1 mixture density
classification algorithm.

Proportion estimates were also produced with and
without the "bias correction" step of Procedure 1.
The without-bias-correction estimate is merely the
proportion of the segment assigned a small-grain
label by the classification or, as appropriate, clustering
algorithm. Bias correction was done using ALL,
100, 75, 50, 25, and 10 type 2 dots, where "ALL" is
interpreted as all labeled pixels not used as type 1
dots. This variation of the number of type 2 dots is
not particularly interesting since its effects are completely
explained by the analytical expressions given
earlier. It did, however, provide an added opportunity
to validate these expressions.

Three different schemes were used for selecting
the pixels to be labeled and used as type 1 or type 2
dots. In each, the pixels were selected from the 209
grid dots situated at the intersections of the grid lines
found on photographic products of LACIE imagery.
Under the first scheme, all grid pixels not covered by
clouds or heavy haze and not located within "designated
other" areas were available for labeling. Under
the second scheme, only field-center pixels and pure
boundary pixels (i.e., pixels lying on boundaries between
fields of the same class) were available for
labeling. Under the third scheme, only field-center
pixels were labeled.

The data analysis discussion starts with figure 3,
which contains box plots of observed errors in the
Procedure 1 estimates of the small-grain proportions.
These estimates used either Iterative or nearest
neighbor clustering with 4, 8, 12, or 16 channels of
data. The 4-, 8-, and 12-channel data consisted of
observations from the 7 sequential segments and the
16-channel data consisted of observations from all 13
of the segments. Using the interquartile ranges (i.e.,
the lengths of the box portions of the plots) as a measure
of variability,7 the following can be seen.

1.  Excepting  the  surprisingly  small  variance  of 
the  12-channel  Iterative  clustering  data,  the  variances 
decrease  gradually  as  the  number  of  channels  in- 
creases, with  a large  decrease  at  16  channels. 

2.  With  four  and  eight  channels,  the  observed 
variances  are  slightly  smaller  for  nearest  neighbor 
than  for  Iterative  clustering.  With  16  channels,  the 
variances  are  virtually  identical.  The  12-channel 
Iterative  clustering  case  again  stands  out  as 
anomalous. 

Also,  given  the  number  of  observations  in  each  box, 
there  is  no  indication  that  the  true  medians  of  these 
distributions  differ  from  zero.  This  result  was  antic- 
ipated since  ground-truth  labels  were  used 
throughout. 

Figure 4 shows the observed Procedure 1 sampling
efficiencies (R values) associated with the proportion
estimators used in figure 3. The R values
have a skewed distribution with most of the observations
falling well above 0.9. This, of course, implies a
very small gain in sampling efficiency for the
machine processing as opposed to an analyst-based
hand counting estimator. In fact, there is a loss of
efficiency if the total number of type 1 and type 2
dots is taken into account. The situation improves
somewhat, though not sufficiently, with 16 channels,
where the median of the R values drops to about
0.75. Procedure 1 again shows some potential for producing
good sampling efficiencies since, excepting
the four-channel cases, there are some low values of
R.


7 The distribution of the interquartile range depends on both
the number of observations and the true distribution of the data.
With normally distributed data, its expected value tends to about
$(4/3)\sigma$ as the number of observations increases.
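The "about 4/3 sigma" figure in the footnote can be checked from the normal quantile function. This is a small sketch using the standard library; the function name is illustrative.

```python
from statistics import NormalDist


def normal_iqr(sigma=1.0):
    """Population interquartile range of a normal distribution."""
    dist = NormalDist(mu=0.0, sigma=sigma)
    return dist.inv_cdf(0.75) - dist.inv_cdf(0.25)


iqr_unit = normal_iqr()            # close to 1.349, a little above 4/3
ratio_to_four_thirds = iqr_unit / (4.0 / 3.0)
```

The limiting value is roughly 1.35 standard deviations, so 4/3 is a convenient approximation that is within about 1 percent.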


FIGURE 3. Comparison of Procedure 1 proportion errors using
Iterative and nearest neighbor clustering (All-pixel labeling).


FIGURE 4. Observed Procedure 1 sampling efficiencies (R) for
Iterative and nearest neighbor clustering (All-pixel labeling).


Figure 5 gives a scatter plot of the sampling
efficiencies from Iterative and nearest neighbor clustering
with 16-channel data. This plot includes data
from both the All-pixel and field-center-pixel labeling
schemes. As expected from the box plots, the
All-pixel labeling data fall very close to a 45° line.
The field-center-pixel labeling values do also, except
for three values that are much higher for nearest
neighbor than for Iterative clustering.

Although not explicitly shown to be so in the
equations in the section entitled "Analytical
Results," the sampling efficiencies are functions of
the accuracy with which pixels are classified. Figures
6 and 7 show, respectively, the proportions of the
labeled small-grain and other pixels that were correctly
classified. In every case, the labeled other pixels
were classified very accurately and the median accuracy
increased slightly with the number of channels.
The median classification accuracies of the
small-grain pixels are low, starting between 0.5 and
0.6 for 4-channel data and increasing to about 0.7 to
0.75 with 16-channel data. This would seem to be the
prime weakness of Procedure 1 as presently implemented:
it does not classify small-grain pixels very
accurately. It will be demonstrated subsequently that
this weakness can be traced to the clustering
algorithm.

FIGURE 5. Scatter plot of sampling efficiency (R) for nearest
neighbor versus Iterative clustering (16-channel data).

FIGURE 6. Observed grid dot classification accuracies for all
labeled small-grain pixels.

FIGURE 7. Observed grid dot classification accuracies for all
labeled other pixels.

Figures 8 and 9 show, respectively, plots of the
estimated sampling efficiency $\hat R$ and probability of
correct classification of wheat $\hat P(W|W)$ versus the
true wheat proportion in these segments. The strong
relationship of $\hat R$ and $\hat P(W|W)$ to the segment's
wheat, or more generally small-grain, proportion is
readily seen in these plots. As noted, the sampling
efficiency $\hat R$ can be approximated by the least
squares line

$$R^* = 0.84 - 1.18\, p_w$$

and the classification probability by the relationship

$$P^*(W|W) = \frac{e^{-0.83 + 7.1 p_w}}{1 + e^{-0.83 + 7.1 p_w}}$$

FIGURE 8. Plot of the estimated sampling efficiencies ($\hat R$) versus
ground-truth wheat proportions.

FIGURE 9. Plot of $\hat P(W|W)$ versus ground-truth wheat proportion.

The main import of these figures is that the performance of
Procedure 1 depends on the true proportion of small grains in a
segment; there is improved performance with larger small-grain
proportions. Experiments to detect the effects of changing the
parameters of Procedure 1 should, therefore, use segments with
a full range of small-grain proportions.
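The dependence on the wheat proportion can be made concrete by evaluating the two fitted relationships. The sketch below is illustrative only. It assumes the coefficients as they can be read from this copy (a line R* = 0.84 - 1.18 p_w for the sampling efficiency and a logistic curve with exponent -0.83 + 7.1 p_w for P(W|W)); the function names and the sample proportions are hypothetical.

```python
import math


def fitted_sampling_efficiency(p_w: float) -> float:
    """Least squares line R* = 0.84 - 1.18 * p_w (a smaller R is better)."""
    return 0.84 - 1.18 * p_w


def fitted_prob_correct_wheat(p_w: float) -> float:
    """Logistic curve P*(W|W) = e^z / (1 + e^z) with z = -0.83 + 7.1 * p_w."""
    z = -0.83 + 7.1 * p_w
    return math.exp(z) / (1.0 + math.exp(z))


if __name__ == "__main__":
    # Hypothetical wheat proportions, chosen only to span a range of segments.
    for p_w in (0.05, 0.10, 0.20, 0.30, 0.40):
        print(
            f"p_w={p_w:.2f}  "
            f"R*={fitted_sampling_efficiency(p_w):.3f}  "
            f"P*(W|W)={fitted_prob_correct_wheat(p_w):.3f}"
        )
```

Under these fits, R falls and P(W|W) rises as the wheat proportion grows, which is the pattern the text describes.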

Figures 10 and 11 show two strong points of Procedure 1.
Figure 10 contains box plots comparing the distributions of
observed proportion errors with and without the bias correction
step of Procedure 1. For each set of channels, the bias
correction significantly decreases the incidence of large
proportion errors and, as a result, the bias-corrected
proportion estimates have a much smaller error variance than
the others. The bias correction step of Procedure 1, then,
contributes to the efficiency of the Procedure 1 estimators.

FIGURE 10.— Comparison of classification-based (CLASS) and
bias-corrected (BIAS) proportion errors (nearest neighbor
clustering, All-pixel labeling).

FIGURE 11.— Comparison of Phase II and Procedure 1 proportion
estimation errors (nearest neighbor clustering, All-pixel
labeling).

Figure 11 compares errors in proportion estimates produced by
the Classification and Mensuration Subsystem (CAMS) during
Phase II of LACIE with similar errors in Procedure 1 estimates
made using 4, 8, 12, and 16 channels of data. It is readily
apparent that Procedure 1 represents a significant improvement
even in the four-channel case, where sampling efficiencies were
shown to be poor. Note that, since the Procedure 1 estimates
are based on ground-truth labels whereas the CAMS estimates are
based on analyst labels, the comparison in figure 11 is biased
in favor of Procedure 1. It will be shown later that this bias
does not appear to be great enough to invalidate the conclusion
that Procedure 1 is an improvement over the procedure used in
Phase II.

Figure 12 allows a comparison of the sampling efficiencies
resulting from two pixel-labeling schemes. This figure gives
box plots of the efficiencies resulting from the All-pixel and
field-center-pixel labeling schemes for labeling the type 1 and
type 2 dots. No large differences between the two schemes are
seen, though the medians of the field-center-pixel efficiencies
tend to be slightly lower (better) than those of the All-pixel
efficiencies. As a result, LACIE has adopted a procedure in
which only pure pixels are labeled for use as type 1 dots.
Because a bias might result if boundary pixels were not
considered in the bias correction step, all available type 2
pixels are labeled.

FIGURE 12.— Comparison of distributions of sampling
efficiencies (R) with All-pixel and field-center-pixel labeling
(nearest neighbor clustering).

The final factor to be considered in the discussion of this
experiment is the effect of the classification step in
Procedure 1. The steps of Procedure 1 include a clustering run
followed by classification to attach a small-grain or other
label to each pixel in the segment. Since the clusters are also
labeled as small grains or other, an alternate classification
could easily be accomplished by attaching the cluster label to
each pixel in a cluster. The number of pixels clustered into
the small-grain class could then be counted and used to
calculate a cluster-based proportion estimate. Bias correction
would then be based on comparison of analyst and cluster labels
for type 2 dots. The result would be a cluster-based
Procedure 1 proportion estimator. Because of the use of the
type 2 dots for bias correction, this estimator would have the
same expected value as does the standard Procedure 1 estimator.
The classification process requires computer resources and can
be justified only if it produces improved sampling efficiencies
with attendant decreases in the variances of the proportion
estimators. The remainder of this discussion of experiment 2
will be devoted to a comparison of the classification-based and
cluster-based Procedure 1 sampling efficiencies.
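The relationship between the raw machine count and the bias-corrected estimate can be sketched in a few lines. This is a sketch under a stated assumption, not the LACIE implementation. It assumes the bias correction takes the usual stratified form, in which the machine labels (cluster-based or classification-based) define two strata and the analyst labels of the type 2 dots estimate the small-grain fraction within each stratum. The function names, the label codes "S" and "O", and the sample counts are hypothetical.

```python
from collections import Counter


def raw_proportion(machine_labels):
    """Fraction of pixels whose machine label is small grain ("S")."""
    return sum(1 for m in machine_labels if m == "S") / len(machine_labels)


def bias_corrected_proportion(machine_labels, type2_dots):
    """Stratified estimate of the small-grain proportion.

    machine_labels: machine label ("S" or "O") of every pixel in the segment.
    type2_dots: (machine_label, analyst_label) pairs for the type 2 dots.
    """
    n_pixels = len(machine_labels)
    stratum_sizes = Counter(machine_labels)
    estimate = 0.0
    for stratum, size in stratum_sizes.items():
        analyst = [a for m, a in type2_dots if m == stratum]
        if analyst:
            fraction = sum(1 for a in analyst if a == "S") / len(analyst)
        else:
            # No dots fell in this stratum: fall back on the machine label.
            fraction = 1.0 if stratum == "S" else 0.0
        estimate += (size / n_pixels) * fraction
    return estimate


if __name__ == "__main__":
    # Hypothetical segment: 30 percent of the pixels machine-labeled small grain.
    machine = ["S"] * 30 + ["O"] * 70
    dots = ([("S", "S")] * 8 + [("S", "O")] * 2
            + [("O", "S")] * 2 + [("O", "O")] * 18)
    print(raw_proportion(machine))
    print(bias_corrected_proportion(machine, dots))
```

Because the correction depends only on how the machine labels partition the pixels, the same function serves for either labeling scheme, which is why the two estimators can share an expected value while differing in variance.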

First, a small diversion. All R values presented thus far in
the discussion of experiment 2 were calculated without use of
classification results from the type 1 dots. All other labeled
pixels were used in the calculation. The type 1 dots were
excluded because they are used in labeling the clusters and,
consequently, will be more accurately classified than the
remaining pixels. Their inclusion would cause a biased,
optimistic estimate of the R values. Figure 13 illustrates this
bias using 16-channel classification results. Every R
calculated using the type 1 dots is smaller than the
corresponding R calculated from all labeled pixels. It would be
erroneous to accept an estimate of R based on all labeled
pixels. On the other hand, there is a high correlation between
the two estimators of R. If the purpose is to compare alternate
approaches to Procedure 1, either estimator of R should provide
a valid comparison.

The point of the preceding discussion is that the computer
listings that contain the results of this experiment do not
isolate the type 1 dots when tabulating cluster results.
Consequently, all R values calculated for a cluster-based
Procedure 1 proportion estimator used results from both the
type 1 and the type 2 dots. The remaining figures will show
these biased estimates of the sampling efficiencies. Even the
classification-based results, then, will not be identical to
those in previous figures.

Figures 14 and 15 provide a comparison of the sampling
efficiencies from the cluster-based and classification-based
schemes using the Iterative and nearest neighbor clustering
algorithms. There do not appear to be any significant
differences between the R values from the two schemes, although
the cluster-based values may tend to be slightly lower than the
others. Figures 16, 17, and 18 are scatter plots of R values
for a cluster-based procedure versus a classification-based
procedure using 16-channel data and the 3 pixel-labeling
schemes. Again, there are no significant differences between
the two sets of values and they appear very highly correlated
in all plots. Slightly more than half of the cluster-based R
values are higher than the corresponding classification-based
values. This implies a slight, though hardly significant,
advantage for the classification-based procedure. Finally,
figures 19 and 20 show observed proportions of the labeled
pixels that are correctly classified if the cluster labels are
used as classification labels for the pixels in the clusters. A
comparison of these figures with figures 6 and 7 will show that
the two classification schemes produce virtually identical
classification accuracies for the non-small-grain pixels. The
medians of the percentage of the small-grain pixels that are
correctly classified tend to be slightly higher for the
classification results than for the cluster results. The
differences between the two, however, are certainly not
statistically significant. Thus, there does not appear to be a
real gain in sampling efficiency due to the classification step
in Procedure 1. Also, there is no degradation of results due to
this step and, in particular, the poor classification accuracy
results for the small-grain pixels are seen to stem directly
from the cluster assignments.


Conclusions From Experiment 2

Results  from  experiment  2 reinforce  and  add  to 
the  conclusion  from  experiment  1.  The  main  conclu- 
sions are  as  follows. 

1.  Procedure  1 produced  much  better  proportion 
estimates  than  did  the  CAMS  Phase  II  procedure. 
This  observation  must  temporarily  be  tempered  by 
the  fact  that  the  CAMS  results  are  based  on  analyst 
labels,  whereas  the  Procedure  1 results  are  based  on 
ground-truth  labels.  Later  results  will  demonstrate 
that  use  of  the  different  labels  does  not  invalidate 
this  conclusion. 

2.  Procedure  1 was  shown  to  produce  very  good 
sampling  efficiencies  for  some  segments  under  some 
conditions  but,  in  general,  is  not  more  efficient  than 
a hand  counting  procedure  using  both  the  type  1 and 
the  type  2 dots. 


FIGURE  19.— Observed  grid  dot  cluster  classification  accuracies 
for  labeled  small-grain  pixels. 


FIGURE 20.— Observed grid dot cluster classification accuracies
for labeled other pixels.


3.  The  main  problem  with  Procedure  1 is  the  high 
incidence  of  misclassification  of  small-grain  pixels. 
This  problem  is  caused  by  the  clustering  procedures, 
which  do  not  adequately  separate  the  small-grain  pix- 
els from  the  other  pixels. 

4. The nearest neighbor clustering algorithm appears to be
slightly better than the Iterative algorithm in terms of its
effectiveness in Procedure 1. This result is somewhat
surprising since Iterative is a complicated algorithm whereas
nearest neighbor uses an extremely simple, almost naive
approach. The simplicity of the nearest neighbor algorithm
holds hope for improvement in clustering.

5. For the present implementation of Procedure 1, the
classification step following clustering does not appear to
affect the sampling properties of the proportion estimators.
This conclusion may bear further study with additional segments
and, in particular, must be reconsidered if new clustering
algorithms are proposed for Procedure 1.

6.  Results  improved  as  the  number  of  acquisi- 
tions increased  from  one  to  four. 

7.  Finally,  the  bias  correction  step  of  Procedure  1 
was  shown  to  significantly  improve  the  variances  of 
the  proportion  estimators.  Both  the  bias-corrected 
and  uncorrected  proportion  estimators  appear  to  be 
unbiased  when  ground-truth  labels  are  used. 

Experiment  3 

The third experiment employed the nine segments identified in
table VI to evaluate the effects on Procedure 1's performance
of using analyst labels. This experiment used both the nearest
neighbor and the Iterative clustering algorithms, and all
segments were processed using one, two, three, and four
acquisitions. As in the earlier study, the nearest neighbor
algorithm produced slightly better results than did the
Iterative. Only nearest neighbor clustering results will be
discussed here.

Figure  21  shows  the  distributions  of  observed 
differences  between  the  Procedure  1 proportion  esti- 
mates and  the  ground-truth  small-grain  proportions 
for  these  segments.  It  is  readily  seen  that  the  esti- 
mates based  on  analyst  labels  have  a significant  nega- 
tive bias  and  a very  large  variance.  The  ground-truth 
label  results  are  unbiased,  but  they  also  have  a large 
variance  when  compared  with  the  ground-truth 
results  from  experiment  2. 

Figure  22  shows  the  distributions  of  the  sampling 
efficiency  measures  for  these  segments.  Once  again, 
there  are  some  very  good  efficiencies;  i.e.,  R values 
less  than  0.6.  Most  of  the  efficiencies  are  poor,  with 
the  majority  of  the  R values  falling  well  above  0.8. 
The  improvement  in  efficiency  resulting  from  the 
use  of  more  acquisitions  seen  in  experiment  2 is  not 
seen  in  these  results. 

One  possible  explanation  for  the  poor  results  from 
this  experiment  as  compared  with  those  from  experi- 
ment 2 is  the  presence  of  the  four  small-field  seg- 
ments in  North  Dakota.  The  locations  of  their  pro- 
portion errors  in  the  box  plots  in  figure  21  are  shown 
by  the  small  circles  in  the  boxes.  The  spread  of  these 
circles  indicates  that  much  of  the  variability  is  due  to 
these  segments.  Unfortunately,  there  are  not  enough 
data  to  allow  the  questions  raised  by  this  experiment 
to  be  addressed. 

The main conclusion from this study is that a larger study
should be conducted using small-field segments with both
ground-truth and analyst labels. Another conclusion of this
study is that the analyst labeling errors caused a large
negative bias in the proportion estimates and introduced a
large source of variability. Also, the analyst-based proportion
estimates do not appear to change in distribution as the number
of channels is increased. Because of the small number of
segments and the mixture of small- and large-field segments,
these conclusions must be tentative, pending a more complete
study.


TABLE VI.— Segments Used in Experiments

      Segment                Ground-truth percentage
Number    Location           Wheat     Small grains

1660      North Dakota       26.6      34.6
1651      North Dakota       20.86     27.93
1642      North Dakota       39.2      57.6
1614      North Dakota       26.96     42.76
1003      Colorado           16.6      19.8
1046      Oklahoma           23.1      23.1
1961      Kansas              8.2       8.2
1988      Kansas             33.0      33.0
1865      Kansas             23.2      23.4


LACIE Operations Data^8

The last set of data to be discussed contains actual LACIE
proportion estimates for a set of segments with associated
ground-truth labels. To provide a basis for comparison, these
segments were processed by Procedure 1 using both ground-truth
and analyst labels for the type 1 and type 2 dots. The data set
is identified in table VII. Note that the segments were
processed using two, three, or four acquisitions of imagery.

FIGURE 21.— Proportion errors from ground-truth and analyst
labels.

FIGURE 22.— Comparison of sampling efficiencies (R) for analyst
and ground-truth labels with Procedure 1.

^8 The data evaluated in this subsection were provided by K.
Havens of Lockheed Electronics Company. Much insight was gained
from her evaluations of these data.

Figure 23 shows the observed errors between the estimated and
the ground-truth small-grain proportions for these segments. As
in experiment 3, there is a serious negative bias in the
proportion estimates based on the analyst labels. This bias is
caused by analyst labeling errors and indicates that the
analysts tend to underestimate the small-grain proportions.
This observation has been substantiated in studies of the
analyst labeling errors. The estimates based on ground-truth
labels are unbiased and have a slightly smaller variance than
do the analyst-based estimates. There is a tendency for the
variances from both label sets to decrease as the number of
acquisitions is increased. This result is particularly
encouraging in view of the fact that different segments were
processed using the differing numbers of acquisitions.

Figures 24 and 25 show the distributions of the observed
variance ratios and a comparison between the variance ratios
from the analyst and ground-truth label processing. Only a few
of these R values are lower than 0.6, indicating a general loss
of sampling efficiency for Procedure 1 compared with analyst
hand counting. (The value 0.6 is used since most of the
segments were processed with about 40 type 1 and 60 type 2 dots
leading to a break-even value of 0.6 for the R values.) There
is no strong indication of an advantage in the R values from
the ground-truth labeling over the analyst labeling results.
This result implies that the increased variances noted in the
proportion estimates based on analyst labels are caused mainly
by segment-to-segment variation in the biases due to analyst
labeling errors.
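The break-even value of 0.6 follows from a short calculation, sketched below. The sketch rests on an assumption that this section implies but does not spell out: R is taken to be the ratio of the variance of the Procedure 1 estimator to the variance of a simple count estimator that uses the type 2 dots alone. Hand counting with all labeled dots then has variance p(1 - p)/(n1 + n2), and Procedure 1 has variance R p(1 - p)/n2. The function names and the proportion 0.25 are hypothetical.

```python
def break_even_ratio(n_type1: int, n_type2: int) -> float:
    """R at which Procedure 1 matches a hand count that uses every dot."""
    return n_type2 / (n_type1 + n_type2)


def hand_count_variance(p: float, n_type1: int, n_type2: int) -> float:
    """Binomial variance of a proportion counted from all labeled dots."""
    return p * (1.0 - p) / (n_type1 + n_type2)


def procedure1_variance(r: float, p: float, n_type2: int) -> float:
    """Variance of the Procedure 1 estimator under the assumed meaning of R."""
    return r * p * (1.0 - p) / n_type2


if __name__ == "__main__":
    n1, n2 = 40, 60  # dot counts quoted in the text
    p = 0.25         # hypothetical small-grain proportion
    r_star = break_even_ratio(n1, n2)
    print(r_star)
    print(hand_count_variance(p, n1, n2))
    print(procedure1_variance(r_star, p, n2))
```

Any R above the break-even ratio means that simply counting all 100 labeled dots would have given a smaller variance than Procedure 1 did.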

The main conclusion from this study is that analyst labels are
biased and tend to underestimate the amount of small grain in
the segments. If this bias source could be corrected (e.g., by
providing additional information or training to the AI's), the
Procedure 1 proportion estimators based on analyst labels would
be very competitive with estimators based on ground-truth
labels. This experiment again shows that Procedure 1 is not yet
attaining the sampling efficiency required to be competitive
with an analyst-based count estimator. This conclusion again
indicates that Procedure 1 requires an improved clustering
algorithm since problems can be traced to misclassification of
small-grain pixels.


TABLE VII.— Segments Used in Evaluation of Procedure 1 Using
Operations Data

Segment     Location                Number of
  (a)       (county, state)         acquisitions

1005 (W)    Cheyenne, Colo.         4
1032 (W)    Wichita, Kans.          4
1033 (W)    Clark, Kans.            2
1853 (W)    Ness, Kans.             3
1861 (W)    Kearny, Kans.           4
1512 (S)    Clay, Minn.             2
1520 (S)    Big Stone, Minn.        3
1544 (S)    Sheridan, Mont.         2
1739 (M)    Teton, Mont.            4
1582 (W)    Hayes, Nebr.            4
1604 (S)    Renville, N. Dak.       2
1606 (S)    Ward, N. Dak.           2
1648 (S)    Bowman, N. Dak.         2
1661 (S)    McIntosh, N. Dak.       2
1902 (S)    McKenzie, N. Dak.       2
1231 (W)    Jackson, Okla.          3
1242 (W)    Canadian, Okla.         4
1367 (W)    Major, Okla.            3
1677 (S)    Spink, S. Dak.          4
1690 (S)    Kingsbury, S. Dak.      3
1803 (W)    Shannon, S. Dak.        4
1805 (M)    Gregory, S. Dak.        4
1056 (W)    Moore, Tex.             3
1059 (W)    Ochiltree, Tex.         4
1060 (W)    Sherman, Tex.           2

a W = winter wheat; S = spring wheat; M = mixed wheat.


A Postmortem on Figure 11

Figure 11 shows that the Phase II CAMS estimates of the
small-grain proportions in the segments used in experiment 2
are much more variable than similar estimates produced by
Procedure 1. In the discussion of Procedure 1, the fact that
analyst labels were used in producing the CAMS results and
ground-truth labels were used in producing the Procedure 1
results was identified as a possible source of this large
difference in variability. However, the results from the
operational data and experiment 3 indicate that the errors in
analyst-based Procedure 1 proportion estimators have a standard
deviation only about 1.4 times as large as the standard
deviation of errors resulting from ground-truth labeling. If
this is true, the results in figure 11 show a very real
improvement for Procedure 1 over Phase II CAMS, even in the
worst case in which a single acquisition is processed by
Procedure 1.

FIGURE 23.— Observed proportion errors using the operations
data.

FIGURE 24.— Observed variance ratios using the operations data.


The  main  conclusion  reached  from  these  studies 
is  that  Procedure  1 needs  an  improved  clustering 
algorithm  if  it  is  to  attain  a sampling  efficiency  com- 
petitive with  simple  analyst-based  count  estimates  of 
proportions.  There  are  new  clustering  routines 
available  (e.g.,  AMOEBA,  BCLUST,  and  ECHO) 
which  use  spatial  as  well  as  spectral  information  and, 
as  a consequence,  may  be  substantially  better  than 
the  ERIPS  algorithms.  There  is  considerable  room 
for  improvement  since  the  nearest  neighbor 
algorithm,  which  produces  better  proportion  estima- 
tors than  does  the  Iterative  algorithm,  is  too  simple 
to  be  optimal. 

The results show that Procedure 1 has a potential for greatly
increasing the sampling efficiencies since it did so for some
segments. Also, Procedure 1 is a substantial improvement over
the Phase II technology.

Currently, the primary effect of using analyst labels is that
they introduce a negative bias in the proportion estimators.
The Procedure 1 proportion estimators based on analyst labels
have about a 40-percent larger standard deviation than do the
estimators based on ground-truth labels. This 40-percent
increase is due mainly to variations in the bias caused by
analyst labeling errors rather than to increased R values.
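The 40-percent figure can be related to the size of the bias variation by a short calculation. This is only a sketch. It assumes, which the text does not state, that the segment-to-segment variation in the analyst bias is independent of the sampling error, so that the two variances add. The symbols sigma_A (analyst-label errors), sigma_G (ground-truth-label errors), and sigma_b (bias variation) are introduced here for illustration.

```latex
\sigma_A^2 = \sigma_G^2 + \sigma_b^2, \qquad \sigma_A \approx 1.4\,\sigma_G
\;\Longrightarrow\;
\sigma_b^2 \approx (1.4^2 - 1)\,\sigma_G^2 = 0.96\,\sigma_G^2,
\qquad \sigma_b \approx 0.98\,\sigma_G
```

Under that assumption, the variation in the analyst bias is roughly as large as the sampling error itself.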

Finally, the results of experiment 2, and other results not
included in this study, show that the bias correction step in
Procedure 1 produces a substantial decrease in the variance of
the proportion estimator though it does not greatly affect the
bias. Also, the classification following clustering does not
appear to affect the properties of the estimators and could be
eliminated.

FIGURE 25.— Scatter plot of variance ratios from analyst labels
versus variance ratios from ground-truth labels.


The Vegetative Index Number and Crop Identification

INTRODUCTION

Considerable  research  on  the  use  of  the  vegetation 
index  number  (green  number)  (ref.  1)  conducted  by 
LACIE  and  other  government  agencies  (ref.  2) 
revealed  that  the  green-number  approach  to  drought 
and  yield  monitoring  has  been  successful.  However, 
only  a few  studies  (ref.  3)  have  examined  the  green- 
number  approach  for  crop  identification. 

In this study, a vegetative index number or numerical value
was calculated from the digital values of the Landsat system.
The objective was to provide some measure of green growing
vegetation. The purpose of this paper, as a pilot project, is
to investigate the usefulness of the green numbers for schemes
in crop identification and acreage estimation and to compare a
new vegetation index number, the Ashburn Vegetation Index
(AVI), with the Kauth-Thomas Vegetation Index (KVI) for crop
identification schemes. Comparisons between the AVI and the KVI
are given in table I. Table II shows the results of wheat
acreage estimation using LACIE Procedure 1 (P-1) and the AVI
for the eight LACIE sample segments used in this study. The
process used by LACIE for crop acreage estimates for the
1976-77 crop season, P-1, will be described in another part of
this paper. Visual results of the AVI may be found in the
figures.


aUSDA Foreign Agricultural Service, Houston, Texas.


DESCRIPTION OF THE ALGORITHMS

Ashburn Vegetation Index

The AVI algorithm is 2 times band 7 minus band 5 with all the
resulting negative values mapped to zero.

AVI = 2(MSS7) - MSS5; if AVI < 0, then set AVI = 0          (1)

The AVI is a straight linear equation (2 times band 7 minus
band 5) of the Landsat data. This algorithm operates on the
principle that as growing plants turn green, the chlorophyll in
the leaves absorbs the red, thereby lowering the digital value
in band 5; the green leaves increase the infrared reflectance,
thereby increasing the digital value in band 7. Therefore, the
greater the spread the higher the green number. When the
resulting value is equal to or less than zero, the AVI
indicates no growing vegetation for that pixel. The negative
values are mapped to zero. Therefore, zero equals no vegetation
and any positive value equals growing vegetation.

Kauth-Thomas Vegetation Index

The Kauth-Thomas (1976) Vegetation Index (KVI) is based on
transformation of the four multispectral scanner (MSS) bands of
the Landsat. They include a green vegetation index which is
equal to -0.290MSS4 - 0.562MSS5 + 0.600MSS6 + 0.491MSS7 and a
soil brightness index which is equal to 0.433MSS4 + 0.632MSS5 +
0.586MSS6 + 0.264MSS7. The green number, however, is derived
from the following transformation.

y' = Ax'                                                    (2)

where

y' = a vector representing the Landsat-1 version of the
     Kauth-Thomas transformation of x'(6); the superscript is
     the pixel number and the subscript is the Landsat channel.

A  =   0.4326    0.6325    0.5837    0.2641    Soil brightness
      -0.2897   -0.5620    0.5995    0.4907    Greenness
      -0.8242    0.5329   -0.0502    0.1850    Yellowness
       0.2229    0.0125   -0.5431    0.8094    None such
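The two indices described in this section can be sketched for a single pixel as follows. The code is illustrative only. It assumes that the four columns of the matrix apply to MSS4 through MSS7 in that order, which is consistent with the rounded coefficients quoted in the text, and it applies no offset or rescaling to the result. The function names and the pixel values are hypothetical.

```python
# Rows of the transformation matrix as printed in this paper; the columns are
# assumed to apply to MSS4, MSS5, MSS6, and MSS7, in that order.
KT_MATRIX = (
    (0.4326, 0.6325, 0.5837, 0.2641),    # soil brightness
    (-0.2897, -0.5620, 0.5995, 0.4907),  # greenness
    (-0.8242, 0.5329, -0.0502, 0.1850),  # yellowness
    (0.2229, 0.0125, -0.5431, 0.8094),   # none such
)


def avi(mss5: float, mss7: float) -> float:
    """Ashburn Vegetation Index: 2 * band 7 - band 5, negatives mapped to 0."""
    return max(0, 2 * mss7 - mss5)


def kauth_thomas(mss4: float, mss5: float, mss6: float, mss7: float):
    """Apply y = A x to one pixel; returns (brightness, greenness, yellowness, none such)."""
    x = (mss4, mss5, mss6, mss7)
    return tuple(sum(a * v for a, v in zip(row, x)) for row in KT_MATRIX)


def kt_greenness(mss4: float, mss5: float, mss6: float, mss7: float) -> float:
    """Greenness component, the second row of the transformation."""
    return kauth_thomas(mss4, mss5, mss6, mss7)[1]


if __name__ == "__main__":
    # Hypothetical digital counts for one vegetated pixel.
    mss4, mss5, mss6, mss7 = 20, 30, 40, 25
    print(avi(mss5, mss7))
    print(kt_greenness(mss4, mss5, mss6, mss7))
```

The AVI needs only two bands and one subtraction per pixel, whereas the Kauth-Thomas greenness uses all four bands.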


TABLE I.— A Comparison of the AVI and the KVI for Four Dates Over
the Randall County Intensive Test Site

Pixel   Nov. 2, 1976    Feb. 19, 1977   June 6, 1977    July 12, 1977   Labels
           (76307)         (77050)         (77158)         (77194)
           AVI    KVI      AVI    KVI      AVI    KVI      AVI    KVI   AL(a) CL(b) GTL(c) (d)

Line 10
   10       24     23        8     10       14     16        0      4   W
   20        0      6        2      3        7     10        2      9   N  N
   30        0      4        3      5        0      4        0      3   N
   40        0      4        0      0        1     10        0      5   N  N  N
   50        0      3        0      1        1      9        0      7   DO  N  N
   60        0      4        0      1        0      8        0      4   DO  N  N
   70       10     11        0      0        5     14        3     11   N  N  N
   80       30     26        8      8        6     12        0      3   W  W
   90        0      5        0      3        0      5       18     22   N  N
  100        2      5        1      2        0      2       35     33   N
  110        5      9        0      4        0      1        0      3   N  N  N
  120        0      5        0      5        0      2        0      1   N
  130        0      2        0      2        0      2        0      4   N
  140        0      5        1      4        0      4       27     27   W  N
  150        2      5        0      3       18     18        8     12   N
  160        0      2        0      2        0      5        0      3   N  N
  170        0      3        4      7        0      3        3     11   W  W  W/B
  180   0  2  3  15  23  24  23 (one value missing)                     N  N  N
  190        5      4        4      5       14     17       20     19   N  N

Line 20
   10        0      1        0      5       16     21       17     23   N  N  N
   20        8      4        0      4        0     11        2      8   W  N
   30        0      3        0      1        1      8        0      5   N  N
   40        2      5        0      0        1     11        0      7   DO  N  N
   50        0      2        0      1        0      3        1     12   DO  W  N
   60        8      8        0      1        0     10       54     44   N  N  N
   70       11     12        4      6       10     14        0      4   W  N  W
   80       11      9        2      3       12     17        0     10   N  W
   90        0      3        0      0        0      8       43     34   N  N
  100        2      6        0      4        0      3       21     24   N  N
  110        4      6        0      3        0      6        5     11   N  N  N
  120        0      3        0      3       14     20        5      9   N  W
  130        3      6        0      4        1     11       45     38
  140        9     12        8      8        5     15        0      6   N  N
  150        3      8        0      3       28     24        1      9   W  W
  160        0      1        0      2        0      8        1      7   N  N  N
  170        1      5        3      6       16     21       19     22   N
  180        0      1        0      2       21     26       22     23   N  N
  190        0      2        0      3        1     12        9     16   N

a AL = analyst label from P-1.
b CL = classification label from P-1.
c GTL = ground-truth label.
d W = wheat, N = nonwheat, DO = designated other, W/B = wheat
  boundary, ITS = Intensive Test Site, N/B = nonwheat boundary,
  and TH = threshold.


TABLE I.— Continued

Pixel   Nov. 2, 1976    Feb. 19, 1977   June 6, 1977    July 12, 1977   Labels
           (76307)         (77050)         (77158)         (77194)
           AVI    KVI      AVI    KVI      AVI    KVI      AVI    KVI   AL CL GTL

Line 30
   10        2      7        0      2        4     12        2     10   N  N  N
   20        1      6        0      5        8     15        3     11   N  N
   30        2      5        0      4        0      3        0      0   N  N  N
   40        0      3        0      1        0      7       13     18   N  N/B
   50        0      3        0      2        0     11        0      4   DO  N  N
   60        2      5        4      5        0      2        0      6   W  W  W
   70        0      3        3      4        0      3        0      3   W  N
   80        6     10        0      1        0      3        0      8   TH  W
   90       10     10        0      4        0      8       17     18   N  N  N   ITS
  100        2      5        0      1        6     12        0      2   N  N  W/B   ITS
  110        0      0        0      2       15     19        0      4   N  W   ITS
  120       20     19        2      7        2      8        0      0   N  W
  130       35     29        4      6        4     12        0      3   W  W
  140        1      3        0      6        8     11       10     11   TH  N
  150        0      0        0      0        0      6        0      5   DO  N  N
  160        0      2        0      1        0     10        2      8   DO  N  N
  170        1      5        0      2        0      6       31     38   N  N  N
  180        0      0        0      3       24     29       39     38   N  N  N
  190        0      4        0      3       24     26       29     27   N  N  N

Line 40
   10        0      6        0      5        0     11        0      8   N  N  N
   20        6      7        0      1        0      7       25     24   N  N
   30        4      8        8      7       17     17        0      8   W  W  W
   40        4      9        0      5        0      4        0      3   N  N  N   ITS
   50        0      6        0      3        0      2        0      1   N  N   ITS
   60       12     12       11     10       13     15        2      9   W  W  W   ITS
   70       12     12        4      4        0      7        3      8   W  W   ITS
   80        2      8        0      8        0      1        0      2   N  N   ITS
   90        8     10        0      3        0      2       31     26   N  N  N   ITS
  100        4      8        1      3        0     -1        0      2   N  N  N   ITS
  110        0      4        0      2       17     17        0      2   N  W   ITS
  120        0      4        0      1        0      7       51     41   N  N
  130        0      2        0      4        4     11       51     41   N
  140        3      8        3      8        0      7        0      4   N  N
  150        0      2        0      1        0     10        6     10   DO  TH  N
  160        3      2        0      0        0      8        0     11   DO  N
  170        0      6        0      1       33     34       43     36   N  N  N
  180        0      3        0      2        3     12        1     10   N
  190        0      1        0      0        3     14        6     14   N


Table I.— Continued

(76307 = Nov. 2, 1976; 77050 = Feb. 19, 1977; 77158 = June 6, 1977; 77194 = July 12, 1977)

           76307      77050      77158      77194
Pixel   AVI  KVI   AVI  KVI   AVI  KVI   AVI  KVI   AL  CL  GTL

Line 50

   10     0    3     0    1     4   13    15   17
   20    14   14     1    3     4   14     0    2   W  W
   30     3    6     0    1     0   10    33   30   N  N
   40     3    6     0    0     0    6     2    9   N  W  N  |
   50     3    6     0    3     0    5    46   40   N  N  |
   60    14   12    10    9     5   12     2   10   W  W  W  |
   70    20   18     8   10    12   14     2    9   W  W  W  |
   80     0    4     0    2     0    3     3    8   N  N  | ITS
   90     7   10     0    0     0    5    39   37   N  N  N  |
  100     0    5     0    3     0    4     0    2   N  N  N  |
  110     7    6     0    6     0    3     0    0   N  N  |
  120     5    7     0    1     0    3     0    2   N  N  |
  130     5    7     0    2     0    6    31   28   W  N/B
  140     5    5     0    4     0    1     0    7   N  N  N
  150     0    7     0    5     0    1    28   26   N  N
  160     0    1     0    2     0    1    39   36   N
  170     0    2     0    1     0   10     0    1   N  N
  180     9    9     8   10     8   18     0    3   W  W  W
  190     0    1     0    0     0    7    28   26   N  N  N

Line 60

   10     0    4     0    2     0   10     5   10   DO  TH  N
   20     0    2     0    3     6   14     2   10   N  W
   30     0    4     0    2     0    9    14   15   N  N  W/B  |
   40     0    2     0    2     6   15    37   33   N  N  |
   50     0    0     0   -2     2   14    22   26   N  N  N  |
   60     2    4     0    1     0    7     1    6   N  N  W/B  |
   70     0    1     0    2     X   12     2    6   N  W  | ITS
   80     0    5     0    4     X    4     0    4   N  N  N  |
   90    14   14     0   11    16   17     0    6   W  W  |
  100     8   12     X    4     2   12     0    4   N  W/B  |
  110     0    4     X    3     0   13     0    6   N  W/B  |
  120     0    4     0    4    14   18    18   19   DO  N  N
  130     t    6     0    3     0    3     0    5   DO  N  N
  140     0    3     3    4     0    4    21   19   N  N
  150     0    4     0    3     9   14     0    6   W  W
  160     0    5     2    4     4   12     0   10   W  N  W
  170     0    1     0    0     0    8     5   12   N  N/B
  180     0    5     0    2     0    7     0    5   N
  190     0    1     0    0     0    9    37   34   N  N



Table I.— Continued

(76307 = Nov. 2, 1976; 77050 = Feb. 19, 1977; 77158 = June 6, 1977; 77194 = July 12, 1977)

           76307      77050      77158      77194
Pixel   AVI  KVI   AVI  KVI   AVI  KVI   AVI  KVI   AL  CL  GTL

Line 70

   10     1    5     0    4     2   10     4    9   DO  N  N
   20     0    1     2    3     0    7     1   11   N  N  W
   30     0    5     0    4     8   11     9   15   W  W/B
   40    18   19     1    2     0    4     1   12   N  N  N
   50    14   15     1    6     0    8    11   14   W  W/B
   60     2    2     0    1     0    3     0    2   N  W
   70     0    2     0    2     0    5     2    2   N  N  W/B
   80     0    6     0    5     0    4     0    6   N  N  N
   90     6    8     0    2     0    2    56   44   N  N  N
  100     2    7     0    2     0   10     0    4   N  W/B
  110     0    5     3    5     0    7     0    3   W  W  W
  120     2    3     2    3     0    8     0    9   N  N  W
  130     0    2     0    2     0    4    31   27   N  N  N
  140     6    8     0    2     0   11    64   50   N  N
  150     2    7     0    3     3    9     0    1   N  N
  160     1    5     0    3     0    3     0    2   N
  170     0    5     0    2     0    5     0    8   DO  N
  180     0    7     0    3     0    4     0    6   W  W  N/B
  190    12   13     8    8     8   12     0    6   N  N  W

Line 80

   10     2    5     0    4     5   13     »   11   DO  N  N
   20     6    9     0    4     7   16    15   16   DO  N  N
   30     0    4     1    3     0   11     3   14   W  W
   40     6    9     0    2     0    9     0    8   N  W
   50     4    7     0    1     0    4    33   28   N  W
   60     0    6     0    4     0    3    18   22   N  N  N
   70     3    8     0    4     0    5     9   15   N  N  N
   80     0    3     0    0     0    4    67   53   N  N  N
   90     0    2     0    3     0    1    57   48   N  N  N
  100     3    6     0    3     0    1     0    5   N  N
  110     0    3     1    4     0    8     0    1   W  W
  120     0    3     1    4     3   14     2   14   W  W
  130     0    5     2    4     0   10     0   10   W  W
  140     2    5     0    2    17   21    18   19   N  N
  150     0    3     0    4     8   15    16   20   N  N  N
  160     0    4     0    1     0    6     2   12   N  N  N
  170     0    3     0    1     0   11     4   12   N  N
  180     0    2     0    0     0   10     6   13   DO  N
  190     0    2     0    0     0    9     9   13   DO  N



Table I.— Continued

(76307 = Nov. 2, 1976; 77050 = Feb. 19, 1977; 77158 = June 6, 1977; 77194 = July 12, 1977)

           76307      77050      77158      77194
Pixel   AVI  KVI   AVI  KVI   AVI  KVI   AVI  KVI   AL  CL  GTL

Line 90

   10     2    6     6    8    14   16     5   13   W  W
   20     0    2     4    6    16   18     0    0   W  N  W
   30     0    3     2    5     n   MM     0    4   W  W
   40     5    7     0    4    MM  MtM     0   10   N  N  W
   50     4    9     4    5    10   11     5   16   W  W  W  |
   60    21   20     4    5    10   10     0    3   W  W  W
   70     9   10     5    8    10   It     0    3   W  W  W
   80     4    8     0    3     n    0     0        N  N  N  | ITS
   90     0    5     0    3    MM   54    45        N  N
  100     2    7     7    7    12   15     6    7   W  W
  110     0    3     0    5     0    8     0    5   W  W
  120     0    4     0    1     0   10     0    5   N  W
  130     2    7     6    8     0    8     1    9   W  W  W
  140     5    9     3    5     0    7     0    1   N  W
  150     0    6     0    5     0    4     0    9   N
  160     2    0     0    0     8    5     9        DO  N  N
  170     0    2     0    1     0   10     4   11   DO  N  N
  180 %MfWk   -1     0   -1     0    9     4    6   DO  N  N
  190     0    0     0    2     0    9     0    5   DO  N

Line 100

   10     0    3     0    3     0    4     0    5   N  N  N
   20    11   13     0    2     1    7    48   37   N  N
   30     3    5     0    2     4   10    44   44   N  N  N
   40     0    3     0    4     0    8     3   13   DO  N  W
   50     0    3     0    1     0    7     e    9   DO  W  W/B
   60     8   11     4    6     4   10     0    5   W  W
   70     4    7     5    7     8   11     0    8   W  W  W
   80     6    8     3    6     6   11     0    6   W  W  W
   90     6   11     5    9     0    1     0    6   W  N
  100     3    a     3    5     0    3     0    4   W  W  W/B
  110     0    4     0    0     4   13     0    8   N  W
  120     0    0     0    1     0    6     0    2   N  N
  130     0    1     1    2     7   12    12   16   N  N  N
  140     2    8     6    6     0    7     0    1   W  W  W
  150     0    2     3    4     0    6     0    2   W  W
  160     0    6     0    1     0    3     0    7   N  N
  170     0   -1     0    0     0    4    18   19   DO  N  N
  180     0   -1     0    1     0    6     5    9   DO  N  N
  190     0    1     0    2     0   11     5   12   DO  W  N



Table I.— Concluded

(76307 = Nov. 2, 1976; 77050 = Feb. 19, 1977; 77158 = June 6, 1977; 77194 = July 12, 1977)

           76307      77050      77158      77194
Pixel   AVI  KVI   AVI  KVI   AVI  KVI   AVI  KVI   AL  CL  GTL

Line 110

   10    16   13     8    9    11   12     0    5   W  W  W
   20     0    t     0    2     0    0     0   11   N  N  N
   30     0    4     0    5     0    6     2    9   N
   40     0    3     0    2     7   12     3   12   DO  N  N
   50     0    0     0    1     0    5     0    1   DO  N  N
   60     2    4     0    2     0    1    44   38   N  N  N
   70     2    9     0    5     0    1     0    4   N  N  N
   80     0    5     0    5     0    1     0    7   N  N  N
   90     0    6     2    6     0    5     1   11   W  W  W
  100     0    2     2    4     0    5     0    4   W  W  W
  110     0    3     0    3     0    7     0    3   W  W
  120    03    8     0    }     0    3     3    n   W  N
  130     0    4     0    1     0   -1     0   12   N  N
  140     0    )     0    1     0    3     0    9   N  N
  150     0    7     0    3     0    1    12   15   N  N  N
  160     0    3     0    5     0   10     0    7   W  W  W
  170     0    0     p    2     5   12     3   14   N  W
  180     7    5    16   15     2    g     0    8   N  N  W
  190

Each vector is inspected automatically, and any vector having
values unreasonable for agricultural data is discarded using
the following procedure.

A pixel  y is  accepted  as  good  only  if 

1.  xt  is  less  than  12  and  12x,  - 34 x,  is  more  than 
108. 

2.  >ii  is  less  than  IS  or  more  than  120. 

3.  >}  is  less  than  8 or  more  than  30. 

4.  >•,  is  less  than  6. 

5.  y,  is  less  than  10  or  more  than  3S. 

The greenness level m of the soil line then is estimated by
1 percent of the minimum greenness value y2 for acceptable pixels.

Then  the  green  number  y is  computed  for  each 
pixel  by 


Table II.— Crop Identification Results


PROCEDURES


Procedure  1 

Very briefly, P-1 is the technique used by the LACIE for crop
identification and area measurement during the 1976-77 crop
season. This procedure was duplicated on the General Electric
IMAGE-100 (I-100) interactive computer system. It includes the
selection of a number of picture elements (pixels) from a LACIE
5- by 6-nautical-mile sample segment. These pixels are randomly
selected from a preset sequence of 209 pixels located at the
intersections of a grid that is placed over the sample segments
given in the figures. The pixels chosen are adjacent to the
upper left of each intersection of the grid. From a random
selection of these 209 pixels, two groups of pixels are
identified and labeled. Type 1 includes a selection and labeling
as spring wheat (SW), winter wheat (WW), nonwheat (N), or
designated other (DO) of 40 or more pixels that are used to
cluster and classify the rest of the sample segment.

The label DO in P-1 represents any area that has been removed
from the scene. It could be alfalfa, barley, corn, water, roads,
bare soil, etc. Each of the 40 single-pixel fields is labeled by
the analyst. The statistics from these fields are then used for
starting the clustering and classification procedures for the
segment. Type 2 includes 40 or more additional pixels that are
labeled and used to provide a bias correction for the Type 1
classification results. This procedure provided good
classification results but required approximately 3.5 hours per
segment. These results are shown in table II. One of the pixel
identification aids is the KVI green number. This number is
shown for each of the 209 pixels in table I. One use of this
number is to determine whether a field has growing vegetation.
This becomes highly significant during the very early stages of
crop growth. For additional details on P-1, see reference 4.
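
The dot-selection step described above can be sketched as follows. This is only an illustration: the grid coordinates (lines 10 to 110 and pixels 10 to 190, in steps of 10) are inferred from the layout of table I, and the seed, the function names, and the assumption that the Type 1 and Type 2 dots do not overlap are hypothetical choices, not details given in the paper.

```python
import random

# Grid intersections as they appear in table I (an inference, not stated
# explicitly in the text): lines 10..110 and pixels 10..190, step 10.
GRID_LINES = range(10, 111, 10)    # 11 lines
GRID_PIXELS = range(10, 191, 10)   # 19 pixels per line


def grid_dots():
    """Return the 209 (line, pixel) grid intersections in row-major order."""
    return [(line, pixel) for line in GRID_LINES for pixel in GRID_PIXELS]


def draw_dot_sets(n_type1=40, n_type2=40, seed=0):
    """Randomly draw Type 1 and Type 2 dots from the grid.

    Assumption: the two groups are disjoint. The paper only says that
    both groups come from a random selection of the 209 pixels.
    """
    rng = random.Random(seed)
    shuffled = grid_dots()
    rng.shuffle(shuffled)
    type1 = shuffled[:n_type1]                    # start clustering/classification
    type2 = shuffled[n_type1:n_type1 + n_type2]   # bias correction
    return type1, type2


type1_dots, type2_dots = draw_dot_sets()
print(len(grid_dots()), len(type1_dots), len(type2_dots))
```

In practice the analyst would then attach a label (SW, WW, N, or DO) to each drawn dot; that interactive step is not modeled here.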


AVI Change Detection

The AVI is run on segment data that are housed on the I-100
disk. It is calculated by using a tape load procedure housed in
the consolidated tape read program. This program loads, in 5
minutes and 10 seconds, the AVI data to channel 3, band 4 to
channel 1, band 5 to channel 2, and band 7 to channel 4 of the
I-100 system. Channels 1, 2, and 4 provide a regular
color-infrared image. Three hardwired programs in the I-100 are
then used. The first sets the I-100 parameters so that all 256
gray-level values are used to develop AVI histograms. The second
is a single-cell extension program that alarms all the pixels in
the segment. The third is the multichannel histogram program.
This task takes approximately 30 seconds. The histogram of the
AVI is isolated on channel 3 and all values above zero are
alarmed on the screen. The results are then assigned to one of
the eight theme tracks of the I-100. That theme is measured and
the percent of scene calculated for the final area measurement.
This process requires approximately 7 minutes. This same process
is done for each of the acquisition dates with each result
assigned to a different theme in the I-100. These different
themes can then be added, subtracted, or combined with a logical
AND/OR.
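
As a rough software analogue of the alarm-and-measure step, the sketch below treats a theme as the set of pixel positions whose AVI value is above zero and reports it as a percent of the scene. The function names and the toy values are hypothetical; the I-100 performed this with hardwired programs, not with code like this.

```python
def alarm_theme(avi_rows):
    """Return the set of (row, col) positions whose AVI value is above zero."""
    return {
        (r, c)
        for r, row in enumerate(avi_rows)
        for c, value in enumerate(row)
        if value > 0
    }


def percent_of_scene(theme, avi_rows):
    """Percentage of the scene's pixels that belong to the theme."""
    total = sum(len(row) for row in avi_rows)
    return 100.0 * len(theme) / total


# Toy 3 x 4 "segment" of AVI values (made-up numbers, not from table I).
toy_avi = [
    [0, 4, 12, 0],
    [0, 0, 11, 2],
    [0, 0, 0, 0],
]
theme = alarm_theme(toy_avi)
print(len(theme), "of 12 pixels alarmed")
print(round(percent_of_scene(theme, toy_avi), 1), "percent of scene")
```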

This allows the analyst to subtract the AVI mask, which
represents the native vegetation, from a later AVI, which
includes areas of native vegetation. The analyst can also see
areas where native vegetation has been removed by using a
logical AND/OR program in the I-100. Level thresholding the
histogram can also separate low, high, or other densities within
the AVI. The AVI levels range from 0 to 67 for this study. In
theory, the upper band limit could be 128 or 2 times the value
of band 7.
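
The theme arithmetic can be illustrated with plain set operations. This is a minimal sketch under the assumption that each theme is stored as a set of pixel positions; the names and the toy positions are invented for the example.

```python
def subtract_theme(theme, mask):
    """Pixels in `theme` that are not in `mask`."""
    return theme - mask


def and_theme(a, b):
    """Pixels alarmed in both themes (logical AND)."""
    return a & b


def or_theme(a, b):
    """Pixels alarmed in either theme (logical OR)."""
    return a | b


# Toy themes (hypothetical positions).
native_mask = {(0, 0), (0, 1), (1, 0)}          # AVI > 0 on the early date
later_theme = {(0, 0), (0, 1), (2, 2), (2, 3)}  # AVI > 0 on the later date

crop = subtract_theme(later_theme, native_mask)     # new growth only
removed = subtract_theme(native_mask, later_theme)  # native vegetation now gone
persisting = and_theme(native_mask, later_theme)    # green on both dates

print(sorted(crop))        # [(2, 2), (2, 3)]
print(sorted(removed))     # [(1, 0)]
print(sorted(persisting))  # [(0, 0), (0, 1)]
```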


PILOT  TEST 

The LACIE sample segments used in this test were taken from
operational segments used in LACIE Phase III. These segments
were randomly distributed among the four U.S. Department of
Agriculture (USDA) commodity analysts and processed on the
I-100. The segments were worked using P-1 to provide an
operational wheat estimate. They were taken from intensive test
sites (ITS's) in the United States, from ITS and blind sites in
Canada, and from 50 segments from Kokchetav Oblast, U.S.S.R.
Blind sites are LACIE sample segments that have total
ground-truth identification. They are called blind sites because
the analyst does not know that ground truth is being taken over
the site.

At the time of this writing, ground truth was available only for
portions of the U.S. ITS's. Consequently, the results were
compared to the ground truth where available and to P-1 results
where ground truth was not available before the AVI was tested;
however, the results were not compared until all testing of the
AVI was completed. These results, shown in table II, will be
discussed later in this paper.

By happenstance, these segments provided differing conditions
which helped identify various ways the AVI could be used for
crop identification. Among these were a straight acceptance of
the results of the algorithm, segments where the AVI results
were identified as native vegetation on one date and subtracted
from the AVI results of a second date, segments where growing
vegetation was identified by the AVI and a histogram produced to
isolate the crop in question, segments where the crops were
identified and P-1 used to separate one crop from the other, and
instances where limited information was obtained. These results
are described in a subsequent section of this paper.


Randall  County,  Texas,  ITS  1978,  has  been  used 
to  show  the  digital  values  from  the  AVI/KVI  (table 
I).  This  site  has  both  irrigated  and  dryland  winter 
wheat  fields.  It  experienced  a severe  drought  during 
the  winter  months  of  1976-77. 

Landsat  imagery  of  the  ITS  for  Julian  dates  76290, 
76307,  77032,  77050,  77158,  and  77194  during  the 
1976-77  crop  season  was  available.  Of  these,  76307, 
77050,  77158,  and  77194  (fig.  1)  are  used  to  compare 
the  AVI  with  the  KVI.  The  results  are  shown  in  table 
I.  The  first  two  numbers  of  the  Julian  dates  listed 
above  are  for  the  year,  and  the  last  three  numbers  are 
for  the  day  of  the  year. 
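
The acquisition-code convention just described (two digits of year followed by three digits of day of year) can be decoded mechanically. The helper below is a sketch; its name is hypothetical, and it assumes every code refers to a year in the 1900s.

```python
from datetime import date, timedelta


def parse_acquisition(code):
    """Convert a five-digit YYDDD acquisition code to a calendar date.

    Assumption: the two-digit year falls in the 1900s.
    """
    year = 1900 + int(code[:2])
    day_of_year = int(code[2:])
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


print(parse_acquisition("76307"))  # 1976-11-02
print(parse_acquisition("77050"))  # 1977-02-19
```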

Table I is set up so that each part of the table represents one
line of the same segment. Every tenth pixel was sampled. There
are 209 samples for each of the four dates used. These are the
same 209 single-pixel fields used in P-1. The last three columns
are labels from P-1 and the ground truth for the ITS. The
analyst labels (AL) are used as labels for starting statistics
in the clustering and classification of the scene. The
classification label (CL) is the result of the P-1 interactive
cluster and a mixture density classification. The ground-truth
label (GTL) is true only for the pixels within the brackets; the
other labels in this column are interpreted from signatures in
the ITS. Some of the GTL's are border pixels which, because of
registration problems, may be in a field on one acquisition and
out of the field on another acquisition. These pixel labels are
followed by a "B."


FIGURE 1.— Randall County, Texas; LACIE sample segment 1978.
(a) Day 76307, November 2, 1976. (b) Day 77050, February 19,
1977. (c) Day 77158, June 6, 1977. (d) Day 77194, July 12, 1977.

The dryland wheat, because of dry weather and hot temperatures,
was ripe and ready for harvest by June 6; however, the irrigated
wheat was just getting ripe by July 12. A separation of time
between dryland and irrigated wheat harvest is not unusual, but
3 weeks to a month separation is not common. The green numbers,
however, reflect this separation.

Photographs of segment 1978 for the four dates used in table I
are shown in figure 1. These photographs can be used to follow
any of the single-pixel fields used to show what the AVI/KVI
represent and are especially useful for comparing differences in
the AVI/KVI. Line 40/pixel 40 is a good example of these
differences. On November 2, 1976, this milo field was ripe and
harvest had just begun. It is possible that this field was still
partially green. Both the AVI at 4 and the KVI at 9 indicate
some green vegetation. On February 19, 1977, it is obvious that
the field has been harvested and only milo stubble remains. The
AVI at zero indicates no green in the field, but the KVI at 5 is
recording green. In this case, as well as in others in the
series, the KVI is recording a number over white/yellow colors
of stubble fields. The KVI drops from 9 to 3 in this series. The
AVI drops from 4 to 0 and remains at 0 throughout the series.

The AVI/KVI, however, are highly consistent in the low to middle
ranges where green vegetation is evident. This is shown on line
40/pixel 60. An irrigated wheat field has an AVI series of 12,
11, 13, and 2; the KVI has 12, 10, 15, and 9 for the same pixel.
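
The two line-40 examples can be compared side by side with a few lines of code. The values below are the table I entries for line 40, pixels 40 and 60; treating any value above zero as "green" for both indices is an assumption made only to show where the two indices disagree.

```python
DATES = ["76307", "77050", "77158", "77194"]

# AVI and KVI series from table I, line 40.
SERIES = {
    40: {"AVI": [4, 0, 0, 0], "KVI": [9, 5, 4, 3]},         # milo field
    60: {"AVI": [12, 11, 13, 2], "KVI": [12, 10, 15, 9]},   # irrigated wheat
}


def green_dates(values, threshold=0):
    """Dates on which the index exceeds the threshold."""
    return [d for d, v in zip(DATES, values) if v > threshold]


for pixel, series in SERIES.items():
    print(pixel, "AVI green on", green_dates(series["AVI"]))
    print(pixel, "KVI green on", green_dates(series["KVI"]))
```

For pixel 40 the AVI is positive only on 76307, while the KVI stays positive on all four dates, which is the stubble effect the text describes; for pixel 60 both indices are positive on every date.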

Photographs (from a 35-millimeter camera) of the I-100
television screen of the results of the AVI for segment 1978,
February 19, 1977, are shown in figure 2. This acquisition date
was chosen because confusion from native grasses, weeds, or
other crops is at a minimum. The orange color is the AVI
identification of wheat. The theme tracks of the I-100 are used
to hold successive dates of AVI results. The P-1 results are
shown in figure 3.


FIGURE 2.— Randall County, Texas; LACIE sample segment 1978
with AVI results, February 19, 1977 (orange = AVI/wheat).

No ground truth was available for the February 19 date. Segment
ground truth was first calculated by Accuracy Assessment for the
June 6, 1977, date. Accuracy Assessment is a group within LACIE
that takes the ground-truth information and assigns the
ground-truth label to each of the single-pixel fields. In ITS
segments, where only partial ground truth is collected, they use
the imagery and available ground truth and expand these
signatures to the single-pixel fields that do not have
ground-truth labels.

The  AVI  results  for  the  77158  date  identified  both 
spring  and  summer  crops.  These  crops  were 
thresholded  out  of  the  scene  by  subtracting  the  AVI 
theme  of  77050  from  77158.  The  results  are  shown  in 
table  II. 

ITS 1964, Ellis County, Kansas, provides a case where only one
winter date, 77084, was required to obtain an estimate. There
was no attempt to remove native vegetation from this date or to
run additional dates. An estimate for this date was obtained in
6 minutes. Also, no effort was made to determine whether the
fields identified were harvested for grain.


FIGURE 3.— Randall County, Texas; LACIE sample segment 1978
with P-1 results, February 19, 1977 (pink = P-1/wheat).

ITS 1971, Hill County, Montana, was used to identify a segment
that contained both winter and spring wheat. The AVI for day
77149 was run and a histogram was produced. It was obvious, by
viewing the results on the television screen, that some spring
crops were just becoming visible. By taking out the lower five
levels, the winter wheat and a small amount of native vegetation
were separated from the spring crops. The results of the AVI for
77149 were subtracted from the AVI of 77184 and 77202. The final
results are shown in table II.

ITS 1978, Finney County, Kansas, is a case where minimal results
were obtained by using the AVI. Drought conditions during the
winter and a lack of adequate segment coverage were the prime
reasons for a failure to adequately identify winter wheat. By
77109, the spring rains had not come to this area, and there was
no coverage between 77104 and 77175. Therefore, no good green
period was available for the AVI to estimate. By 77175, the crop
was already ripe and being harvested. An attempt was made to add
the theme tracks from 77104 to a thresholded theme from 77175.
The results of this attempt are shown in table II.

Ground truth for the Canadian blind sites had not been received
at this writing, so the results of the AVI are compared to P-1
results. These segments show special uses of the AVI for
obtaining an estimate.

Saskatchewan blind site 3159 provided a case where the AVI
histogram provided a good spring wheat estimate. The spring
grains were first identified by using an upper threshold to
remove the native vegetation from the scene. A lower threshold
was then used to separate the older spring wheat from the
younger small grains in the segment. These threshold levels were
identified by the analyst while viewing the color monitor. He
systematically removed AVI levels until the native vegetation
area and the younger small grains area were identified. These
results are shown in table II.
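
Level thresholding of this kind amounts to keeping only the pixels whose AVI falls between a lower and an upper level. The sketch below shows the idea; the threshold values and the toy AVI grid are invented for illustration, since the paper's thresholds were chosen interactively by the analyst and are not reported.

```python
def threshold_theme(avi_rows, lower, upper):
    """Positions whose AVI value lies in the closed range [lower, upper]."""
    return {
        (r, c)
        for r, row in enumerate(avi_rows)
        for c, value in enumerate(row)
        if lower <= value <= upper
    }


# Toy AVI values (hypothetical): low = young small grains, middle = older
# spring wheat, high = dense native vegetation.
toy_avi = [
    [0, 3, 9, 30],
    [2, 14, 18, 45],
    [0, 6, 22, 0],
]

# Hypothetical levels an analyst might settle on while viewing the monitor.
wheat = threshold_theme(toy_avi, lower=6, upper=25)
print(len(wheat))  # 5
```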

Saskatchewan blind site 3186 provided a case where a
multitemporal AVI was insufficient. In this case, the native
vegetation was identified with the AVI on 77150. This theme was
saved and subtracted from the AVI results from 77185. No
histogram separation could be found that would isolate the
spring wheat. So P-1 was used over the results of the AVI to
separate the spring grains. The results are shown in table II
and figure 4.


FIGURE 4.— Saskatchewan, Canada; LACIE sample segment 3186 with
AVI results. Yellow = AVI of 77150; red plus yellow plus purple
= AVI of 77185; red plus purple = difference of 77150 and 77185;
and red = P-1 classification of wheat.

Saskatchewan blind site 3192 is shown in figure 5. The yellow in
this scene is the result of the AVI for 77145. This native
vegetation mask was subtracted from the AVI on 77182. The
results are shown in table II.

Kokchetav, U.S.S.R., sample segment 8402, is shown in figure 6.
The AVI for this segment was run for acquisitions 77150 and
77187 and the results are presented in figure 7. The orange
color was from 77150 and the orange plus the yellow was from
77187. All the orange was identified as native vegetation and
all the yellow was identified as low-density wheat. The blue is
the result of the P-1 classification for wheat.


FIGURE 5.— Saskatchewan, Canada; LACIE sample segment 3192 with
AVI results from day 77145 (yellow = native vegetation (DO
mask)).

FIGURE 6.— Kokchetav, U.S.S.R.; LACIE sample segment 8402.
(a) Day 77150, May 29, 1977. (b) Day 77187, July 5, 1977.


DISCUSSION 

FIGURE 7.— Kokchetav, U.S.S.R.; LACIE sample segment 8402 with
AVI and P-1 results. Orange = AVI of 77150 = native vegetation
(DO mask); orange plus yellow = AVI of 77187; yellow =
difference = spring wheat; blue = P-1 spring wheat.


This study has demonstrated, on the basis of a limited number of
ground-truth cases, that a change-detection green-number
approach to crop identification and acreage measurement could be
successful. This has been demonstrated over sample segments in
three countries (the United States, Canada, and the U.S.S.R.).
Success has been demonstrated in the extremely fast calculation
of the AVI; whereas P-1 requires 210 minutes on the average, the
AVI requires 6 to 12 minutes to obtain a comparable wheat
estimate. Time, accuracy, and ease of use are considerations in
the selection of crop identification methods.
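
The size of the time differential follows directly from the figures quoted above (210 minutes for P-1 versus 6 to 12 minutes for the AVI). A short calculation, with variable names chosen only for this example:

```python
P1_MINUTES = 210            # average P-1 time per segment (3.5 hours)
AVI_MINUTES = (6, 12)       # reported range for the AVI

# Ratio of P-1 time to AVI time at each end of the reported range.
speedups = [P1_MINUTES / avi for avi in AVI_MINUTES]
# Minutes saved per segment at each end of the range.
savings = [P1_MINUTES - avi for avi in AVI_MINUTES]

print(speedups)  # [35.0, 17.5]
print(savings)   # [204, 198]
```

On these numbers the AVI is roughly 17 to 35 times faster per segment, before any time needed to review the result is counted.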

Two significant results show that the AVI identifies growing
vegetation very near the soil line and that all positive values
record growing vegetation. A project now underway will determine
how much vegetation is required to first produce a positive
value for the AVI and the KVI. Preliminary results of this
project indicate AVI identification at the two- to three-leaf
stage.

The AVI was preferable to the KVI for very early season
identification of growing vegetation. Since the KVI records a
positive green number over wheat straw and milo stubble, the
soil line is difficult to obtain. The AVI, however, registers a
positive value only when green vegetation is present. The AVI
and the KVI, however, are remarkably close when growing
vegetation is evident.

Since the AVI change-detection system is based on identification
and measurement of growing vegetation, it is reasonable to
assume that major crop types that have different growing seasons
can be identified and measured. This is shown in the
identification of native vegetation that was subtracted from
later AVI results, and in the separation of winter and spring
grains by using the AVI change-detection system.

Since there are large areas of interest where native vegetation
is identifiable by the AVI before any crop is detectable, the
AVI can be used to locate native vegetation areas (DO areas in
P-1). The stored areas (masks) can then be subtracted from later
acquisitions. Since the AVI for any acquisition only detects
growing vegetation which also includes the crop or crops of
interest, the subtraction of a vegetation DO mask leaves only
the crop or crops of interest. These can then be identified
and/or measured. All other DO areas such as roads, water bodies,
cities, and bare soil are automatically removed from the scene
by the AVI.

It is also reasonable to assume that information based on the
AVI change-detection system can be derived from full-frame data.
This is cost effective because the new interactive computers
that use array or parallel pipe systems can calculate linear
equations exceedingly fast. It is estimated that the PDP 11-70
with a parallel pipe system can calculate the AVI on full-frame
data in less than 6 minutes. This makes feasible a total
inventory of a state or country to check and/or replace sample
segment aggregations.

This study has identified a procedure that could be tested in an
operational system. This would provide additional information on
the AVI usage, its relative strengths and weaknesses, and its
cost effectiveness. Additional studies should be conducted using
the AVI for stress measurement, soil moisture identification,
direct yield or yield modification through stress factors, and
bare soil for early-season estimates.


CONCLUSION 

Based on the limited number of samples used in this study, the
AVI change-detection system appears to be a promising procedure
for crop identification. It was found effective in
identification of crops where the crop was the only growing
vegetation. It was found effective in identification of native
vegetation in the spring wheat regions of the U.S.S.R. and
Canada. It was found effective over areas where timely
acquisitions allowed for the development of native vegetation
masks which were subtracted from later AVI results to provide a
good crop estimate. It was found effective when the various
crops were at a growing stage that could be separated by density
levels of the histogram.


The AVI was not effective when a green-phase acquisition was not
received. It was marginally effective when crops in the scene
could not be separated by green-number histograms. However,
there was some advantage in knowing where the crops were so that
other classification methods could be used. In these instances,
P-1 was used and found to yield accurate results. Timeliness and
accuracy are key factors in the selection of methods for data
analysis. The AVI and P-1 were found to be equally accurate in
this study. However, the time differential between crop
identification on the I-100 Hybrid System and the AVI for
equally accurate results highly favored the AVI. This suggests
the consideration of AVI in a large test to determine its
suitability for inclusion in an operational system. Such a
combination of procedures could enhance the timeliness and cost
effectiveness of analysis with no sacrifice of accuracy.


Manual Landsat Data Analysis for Crop Type Identification

C.  M.  Hay* 


INTRODUCTION 

An important component of the measurement procedures in LACIE
has been the manual identification of crop type by human
analysts. This paper will briefly describe the process of manual
analysis for crop identification, the problems encountered in
LACIE that were associated with the manual crop identification
measurement procedures, and the research undertaken in
cooperation with LACIE operations by the supporting research
community to effect solutions to or greater understanding of the
manual analysis problems.


HISTORY  OF  MANUAL  INTERPRETATION 
IN  LACIE 

LACIE Phases I and II

Throughout LACIE Phases I and II (1975 and 1976), the analyst
performed two main tasks. The first task was to outline
representative areas (fields) for all spectral classes in a
segment on the basis of their appearance on the Landsat image
product. The spectral statistics generated from these areas were
used as training for maximum likelihood classification. The
second task was to label the crop type (wheat/nonwheat) in the
selected training areas. This process of first selecting
representative training areas and then labeling the crop type in
the areas comprised what is called the "Fields Procedure." An
analyst took approximately 12 hours to process a segment by the
Fields Procedure and to evaluate and possibly rework the
results. Half of this time was spent selecting and recording
training areas; only one-eighth of the time was spent actually
labeling the areas as to crop type.


LACIE Phase III

By contrast with the procedure of Phases I and II, a procedure
was developed and implemented in LACIE Phase III (1977) which
incorporated clustering for spectral class definition and
training statistics generation. This procedure is called
Procedure 1. The analyst was freed from the time-consuming task
of spectral class definition and could now concentrate solely on
crop type labeling. A new within-segment sampling strategy
involving randomly selected dots (pixels) was another innovation
of Procedure 1. The analyst had only to label sample dots as to
crop type, thus reducing his segment processing time to
approximately 3 or 4 hours. In Phase III, therefore, the analyst
had only one main analysis task: crop type identification.


CROP  TYPE  IDENTIFICATION— 

THE  ANALYSIS  PROCESS 

In  simple  terms,  the  interpretation  process  (also 
called  labeling)  consists  of  two  main  components: 
(1)  feature  detection  and  physical  characteristics 
determination,  and  (2)  feature  evaluation,  including 
identification  and  condition  assessment.  While  these 
processes  may  occur  simultaneously  and  iteratively, 
they  can  be  treated  separately  to  facilitate  under- 
standing. Feature  detection  is  the  action  of  dis- 
criminating a unique  landscape  feature  (a  field  in  the 
LACIE  case)  on  the  basis  of  spectral,  spatial,  and 
temporal  characteristics  observable  in  Landsat 
multitemporal-spectral  data.  Feature  evaluation  is  the 
process  of  assessing  available  data  by  analytical 
means  and  then  synthesizing  the  pertinent  data  to 
conclude  the  feature's  identity  and  condition. 
Feature  identification  is  the  action  of  assigning  a 
name  (e.g.,  wheat,  nonwheat)  to  the  detected  feature. 
Correct  feature  identification  cannot  properly  pro- 
ceed unless  feature  detection  has  first  occurred. 
Feature detection, however, does not ensure feature
identification. 


Feature Detection and Characteristics
Determination 

In  agricultural  environments,  the  features  that  an 
analyst  wishes  to  detect  are  cropped  fields.  The 
feature  characteristics  that  an  analyst  must  determine 
are  (1)  the  size  and  shape,  the  type  of  boundary  ele- 
ments, and  the  spatial  relationships  of  similarly  and 
dissimilarly  appearing  features  (spatial  charac- 
teristics); (2)  the  development  patterns  throughout 
the growing season for the field (temporal characteristics);
and (3) the spectral response in specific
time  periods  corresponding  to  given  crop  type 
biostages  (spectral  characteristics).  Of  these  three 
characteristics,  the  second  is  the  most  significant  to 
the  analyst  for  the  detection  and  identification  of  any 
crop  type.  The  other  two  characteristics  are  neces- 
sary when  significant  temporal  overlap  exists  be- 
tween one  crop  type  and  some  confusion  crops, 
when  key  acquisitions  are  missing  or  of  poor  quality, 
or  when  ambiguity  exists  in  the  data.  Obviously,  the 
probability  of  correctly  identifying  a crop  in  a given 
field  will  be  low  if  a spectral  response  indicating 
vegetation  canopy  is  never  detected  during  the  grow- 
ing season  or  is  not  detected  at  particularly  signifi- 
cant vegetation  biophases  specific  to  given  crop 
types. 


Feature  Characteristics  Evaluation 
for  Crop  Identification 

While  site-specific  Landsat  data  allow  an  analyst 
to  detect  a feature  and  determine  its  temporal, 
spectral,  and  spatial  characteristics,  ancillary  data 
and  a priori  knowledge  from  outside  the  Landsat 
data  are  necessary  for  the  analyst  to  identify  and 
label  a detected  feature;  that  is,  nowhere  will  one  find 
the words “wheat field” written across a field as observed
on Landsat data. A priori knowledge and ancillary
data supply information about what crops are
grown  in  a region,  the  rate  and  timing  of  crop- 
specific  canopy  development,  cropping  and  cultiva- 
tion practices  employed  in  a region  or  specific  to  a 
given  crop  type,  the  characteristic  appearances  of 
given  features  on  Landsat  data,  etc. 

A priori  knowledge  is  gained  from  training  and 
past  experience.  Ancillary  data  consist  of  data  that 


are  additional  to  the  site-  and  date-specific  Landsat 
spectral  data.  Ancillary  data  necessary  for  crop  type 
identification  are  (1)  crop  calendar  information,  in- 
cluding average-normal  and  year-specific  data;  (2) 
historical  crop  proportions  for  several  recent  years; 
(3)  regional  cropping  practice  information,  such  as 
crop  rotation  sequence,  cultivation  practices,  and  ir- 
rigation practices;  and  (4)  occurrence  of  meteorologi- 
cal events  affecting  crop  development  and  crop 
spectral  response.  For  a more  complete  description 
of  the  manual  analysis  process,  see  this  author's  sym- 
posium paper  entitled  “Manual  Interpretation  of 
Landsat Data.”


PROBLEMS  ENCOUNTERED  IN  LACIE  WITH 
MANUAL  CROP  IDENTIFICATION 

In  Phases  I and  II  of  LACIE,  it  was  found  that,  in 
some  regions,  the  analysts'  interpretation  error  was 
beyond  the  tolerance  limits.  Several  problem  areas 
associated  with  each  of  the  two  main  interpretation 
components— feature  detection  and  feature  evalua- 
tion—were  identified.  Solutions  to  these  problems 
were  addressed  in  LACIE  through  cooperation  be- 
tween the  research  community  and  LACIE  opera- 
tions personnel.  The  problems  identified  as  opera- 
tive in  LACIE  can  be  grouped  into  three  main  areas 
related  to  the  manual  analysis  process  and  measure- 
ment procedures.  These  are  (1)  problems  associated 
with  feature  detection  and  characteristics  determina- 
tion (the  first  component  of  the  manual  analysis 
process),  (2)  problems  associated  with  feature 
evaluation  (the  second  component  of  the  manual 
analysis  process),  and  (3)  problems  associated  with 
labeling  procedures  (measurement  mechanics). 


Problems Associated With Feature Detection
and  Characteristics  Determination 

Image product deficiencies. — As stated earlier,
Landsat  data  are  used  for  feature  detection  and 
physical  characteristics  determination.  Thus,  the 
ability  of  Landsat  data  products  to  represent  spatial 
and  spectral  data  clearly  and  accurately  to  the  analyst 
is  of  great  concern.  During  LACIE  Phases  I and  II, 
the  primary  Landsat  data  product  available  to 
analysts was the color-infrared (CIR) image Product
1. This image product is a color composite of the data
from  three  of  the  Landsat  spectral  bands.  The  three 
spectral bands selected to produce this color composite
are the green band (0.5 to 0.6 micrometer,
multispectral scanner (MSS) band 4) assigned to a
blue color gun, the red band (0.6 to 0.7 micrometer,
MSS band 5) assigned to a green color gun, and an infrared
band (0.8 to 1.1 micrometers, MSS band 7)
assigned  to  a red  color  gun.  The  resultant  color- 
composite  image  was  designed  to  simulate  the  type 
of  image  secured  from  conventional  color-infrared 
photographic  imagery  because  analysts  were  most 
familiar  with  that  type  of  image  product.  The  re- 
maining Landsat  infrared  band  (0.7  to  0.8 
micrometer,  MSS  band  6)  is  normally  excluded  from 
the  composite  because  of  the  three-color  limitation 
of  light  additive  systems. 
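
The band-to-gun assignment described above can be sketched in a few lines. This is only an illustration: the function name cir_pixel, the common maximum count of 127, and the sample pixel values are assumptions introduced here, and the sketch does not reproduce the film-converter processing actually used for Product 1.

```python
def cir_pixel(band4, band5, band7, max_count=127):
    """Return (red, green, blue) gun intensities in 0..1 for one pixel.

    Band 7 (infrared) drives the red gun, band 5 (red) the green gun,
    and band 4 (green) the blue gun. Band 6 is left out, as in the text.
    The single common max_count is an assumed, simplified stretch.
    """
    def scale(count):
        return max(0.0, min(1.0, count / max_count))

    return (scale(band7), scale(band5), scale(band4))


# Hypothetical vegetated pixel: infrared well above red and green,
# so the red gun dominates and the field displays in a red tone.
vegetated = cir_pixel(band4=14, band5=12, band7=50)
```

Because the infrared band feeds the red gun, a dense canopy appears red or pink, which is the appearance analysts were trained to expect from color-infrared photography.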

The  Product  1 image  is  an  effective  format  for  the 
extraction  of  spatial  information,  such  as  feature 
size,  shape,  relationship  to  neighboring  features,  and 
distribution  throughout  the  area.  However,  Product  1 
can  provide  only  gross,  relative  spectral  information 
about  a feature.  While  gross,  relative  spectral  infor- 
mation is  often  sufficient  for  crop  type  identification 
where  multitemporal  analysis  procedures  are  used, 
numerous  situations  were  encountered  in  LACIE 
Phases  I and  II  where  Product  1 did  not  sufficiently 
represent  the  Landsat  spectral  data  to  allow  correct 
crop  type  labeling. 

Frequently,  in  situations  where  there  was  a sparse 
canopy,  as  early  in  the  growing  season,  or  where 
there  was  abundant  vegetative  cover  in  general,  as  in 
humid  regions,  or  where  close  confusion  crops  were 
present with the crop of interest, Product 1 either did
not  represent  the  vegetated  field  in  the  manner  nor- 
mally expected  (some  shade  of  red  or  pink)  or  did 
not  show  subtle  spectral  differences  between 
features  that  actually  were  present.  These  problems 
caused  the  analyst  to  (1)  “misinterpret”  vegetated 
fields  as  nonvegetated  fields  on  the  early-season  ac- 
quisitions and  (2)  fail  to  detect  spectral  characteristic 
differences  between  close  confusion  crops. 

Temporal  sampling  rate  deficiencies. — Another 
crop  identification  problem  related  to  feature  charac- 
teristics determination  is  insufficient  temporal  sam- 
pling. As  was  stated  above,  the  temporal-spectral  pat- 
tern throughout  the  growing  season  is  the  most  sig- 
nificant feature  characteristic  for  crop  identification. 
If  this  pattern  is  not  adequately  determined,  there  is 
a greater  probability  of  confusion  among  crop  types. 
Two  causes  of  insufficient  temporal  sampling  which 
lead  to  inadequate  temporal-spectral  pattern  deter- 
mination are  (1)  missing  Landsat  acquisitions  due  to 
cloud  cover  or  other  reasons  and  (2)  periodicity  of 
Landsat  overpasses.  Temporal-spectral  pattern 


changes that occur over periods shorter than 18
days are unlikely to be consistently observed since
Landsat passes over a particular site every 18 days. The
problem created by this periodicity is confounded
even more significantly when acquisitions are lost
because of cloud cover or other cases of nonresponse.
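
The interaction between the 18-day repeat cycle and lost acquisitions can be illustrated with a short sketch. The function name, the day-of-year values, and the biostage windows below are hypothetical; only the 18-day cycle comes from the text.

```python
def usable_acquisitions(first_pass_day, window_start, window_end,
                        lost_days=(), repeat_cycle=18):
    """List overpass days inside a biostage window that were not lost.

    Days are day-of-year integers. first_pass_day is assumed to fall
    on or before window_start.
    """
    lost = set(lost_days)
    overpasses = range(first_pass_day, window_end + 1, repeat_cycle)
    return [day for day in overpasses
            if day >= window_start and day not in lost]


# A 10-day biostage window can fall entirely between two overpasses.
missed = usable_acquisitions(100, 195, 205)

# A 20-day window catches one overpass, which a single cloudy day removes.
caught = usable_acquisitions(100, 180, 200)
clouded = usable_acquisitions(100, 180, 200, lost_days=[190])
```

Under these assumed dates, a biostage shorter than the repeat cycle may never be observed, and one lost acquisition is enough to leave a longer window empty as well.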

Spatial  resolution  deficiencies. — Features  below  the 
resolution  limit  of  the  Landsat  sensors  (approx- 
imately 1 acre)  cannot  be  detected.  Thus,  correct 
crop  identification  with  Landsat-1  and  Landsat-2  for 
fields  smaller  than  1 acre  is  impossible  and  for  fields 
of  up  to  approximately  10  acres  is  improbable.  The 
improbability  of  correctly  identifying  5-  to  10-acre 
fields  is  a function  of  misregistration  between  ac- 
quisitions and  boundary  (mixed)  pixel  problems.  It 
is  necessary  to  determine  fairly  accurately  the 
spectral changes of a field over time. If data points
representing  a given  ground  location  cannot  be  over- 
laid from  one  acquisition  to  another  with  a fair 
degree  of  precision,  an  accurate  temporal-spectral 
pattern,  and  thus  crop  type,  cannot  be  determined. 
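
A rough calculation suggests why fields of up to about 10 acres are so difficult. The sketch below assumes square pixels of 1 acre (following the approximate resolution quoted above), a square field aligned with the pixel grid, and a one-pixel-wide ring of boundary (mixed) pixels around the field edge. All three are simplifying assumptions made for illustration, and misregistration between acquisitions would reduce the usable interior further.

```python
import math


def interior_pixels(field_acres, pixel_acres=1.0):
    """Estimate the number of pure (non-boundary) pixels in a square field.

    A one-pixel-wide ring around the field edge is treated as mixed.
    """
    side = math.sqrt(field_acres / pixel_acres)  # field side, in pixels
    inner = max(0.0, side - 2.0)                 # strip one pixel per edge
    return inner * inner


small = interior_pixels(10)    # between one and two pure pixels
large = interior_pixels(160)   # more than a hundred pure pixels
```

Under these assumptions a 10-acre field retains only about one pure pixel, so almost every observation of it is a boundary pixel.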

Of  the  feature  detection  and  characteristics  deter- 
mination problems  just  discussed,  the  most  signifi- 
cant problem  was  deficient  Landsat  data  products.  A 
further  discussion  of  the  factors  involved  and  some 
solutions  implemented  in  Phase  III  relative  to  the 
deficient  data  products  will  be  presented  later  in  this 
paper. 


Problems  Associated  With 
Feature  Evaluation 

Most  of  the  remaining  sources  of  error  associated 
with  manual  crop  type  labeling  in  LACIE  were  a 
function  of  insufficient  a priori  knowledge  or  ancil- 
lary data  or  of  nonoptimum  labeling  procedures. 

Insufficient  a priori  knowledge  and  ancillary  data. — 
One  deficiency  in  a priori  knowledge  which  had  an 
effect  on  analysts’  labeling  accuracy,  particularly  in 
the  early  phases  of  LACIE,  was  the  lack  of  adequate 
information  concerning  the  variability  in  the 
temporal-spectral  patterns  of  wheat,  small  grains, 
and  other  crop  types.  A related  deficiency  was  the 
lack  of  adequate  crop  type  temporal-spectral  sepa- 
rability information.  No  specific  information  about 
the  temporal-spectral  patterns  of  crop  types  other 
than  wheat  was  available  to  the  analysts.  These  defi- 
ciencies resulted  in  omission  errors  for  wheat  and 
small  grains.  Incorrectly,  analysts  assumed  less 
variability  in  wheat  temporal-spectral  patterns  than 
was actually present. Thus, analysts' labels were conservative
with respect to wheat. As the analysts' experience
in LACIE increased, they gained a better
appreciation for the true wheat temporal-spectral
pattern  variability.  However,  additional  variability 
information  was  definitely  needed  in  abnormal  situa- 
tions, such  as  the  occurrence  of  drought  or  winterkill 
or other episodal events. Similarly, without specific
information  about  the  temporal-spectral  patterns  of 
crops  other  than  wheat,  analysts  could  not 
“doublecheck” their identifications by working the
problem in reverse. That is, in addition to considering
the question, “Is this pixel wheat?” the analyst
could have asked, (1) “What crop type is this pixel?”
and (2) “What crop types are definitely not represented
by this pixel?” Elimination of candidate crop
types  from  consideration  in  the  analysis  often  forces 
the analyst to go back, reconsider his initial analysis,
and  change  his  initial  answer.  However,  since  the 
analyst  did  not  have  the  necessary  data  and  tem- 
poral-spectral variability  information  for  crop  types 
other  than  wheat,  he  could  not  doublecheck  his  ini- 
tial answer.  The  result  was  that  some  wheat  was 
mislabeled  or  omitted. 

Another  problem  that  resulted  in  inconsistent 
labels  among  different  analysts  (analyst  variability) 
was  related  to  the  differing  amounts  and  types  of  a 
priori  knowledge  that  individual  analysts  possessed. 
Each  analyst  had  his  own  unique  background  and  set 
of  interpretation  experiences  upon  which  to  draw. 
This  meant  that  the  quality  and  quantity  of  a priori 
knowledge  was  highly  variable  among  the  analysts. 
Before  the  start  of  operational  interpretation  in 
LACIE,  the  analysts  had  all  undergone  an  extensive 
2-week  training  course  which  was  intended  to  help 
standardize  their  background  and  experience.  After 
the  start  of  LACIE  operational  interpretation, 
however,  it  was  found  that  the  variability  among 
analysts  was  still  greater  than  desired.  Steps  taken  to 
help  remedy  this  situation  are  addressed  later  in  this 
paper. 

Nonoptimum  labeling  procedure.— A large  number 
of  labeling  errors  traced  to  the  analyst  consisted  of 
labels  affixed  to  misregistered  and  boundary  (mixed) 
pixels.  Misregistered  pixels  are  those  that  jump  back 
and  forth  between  one  field  and  an  adjacent  one  on 
successive  acquisitions.  Boundary  pixels  are  mix- 
tures of  the  signatures  from  more  than  one  field.  In 
LACIE,  the  analyst  had  to  affix  a definite  crop  type 
label  (wheat  or  nonwheat)  to  a pixel,  including  the 
boundary  and  misregistered  pixels.  To  do  this,  he 
specified  a reference  acquisition  on  which  he  labeled 


the  pixel.  He  “guaranteed”  the  pixel  label  for  that 
reference acquisition only and not for any other acquisitions.
This led to analyst-credited “mislabels”
when  the  pixel  label  was  not  appropriate  for  the  ma- 
jority of  the  segment’s  machine-processed  acquisi- 
tions that  were  subsequently  checked  in  accuracy 
assessment. 

This  problem  was  not  significantly  addressed 
while  the  LACIE  experiment  was  being  run. 
However,  current  opinion  is  that  the  problem  can 
probably  be  lessened  or  completely  eliminated  by 
screening  misregistered  and  boundary  pixels  and 
then  labeling  them  in  a different  manner,  since  the 
analyst  has  no  problem  recognizing  and  describing 
these  pixels  as  misregistered  or  boundary  pixels.  The 
problem  is  due  to  the  lack  in  the  current  procedure  of 
an  adequate  labeling  option  to  affix  to  these  pixels. 


MAJOR  RESEARCH  EFFORTS  TO 
IMPROVE  ANALYST  LABELS 

Landsat  Image  Products 
Improved  and  Expanded 

One  of  the  deficiencies  of  Product  1 was  due  to  the 
mapping  function  used  to  transform  the  Landsat 
digital  data  to  image  format.  Each  Landsat  spectral 
band  was  scaled  and  biased  separately  to  enhance 
overall  image  contrast.  This  was  desirable  for  op- 
timum extraction  of  the  spatial  information. 
However,  this  data  mapping  procedure  altered  the 
relationships  between  spectral  bands  such  that  the 
spectral  information  was  very  definitely  distorted. 
Thus,  fields  with  sparse  vegetative  canopy  often 
failed  to  display  the  expected  “red”  tones. 
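
The distortion introduced by stretching each band separately can be shown with a small numeric sketch. The counts, scales, and biases below are hypothetical values chosen only to make the effect visible; they are not the Product 1 mapping parameters.

```python
def stretch(count, scale, bias):
    """Linear mapping of a digital count to a display value."""
    return scale * count + bias


# Hypothetical sparse-canopy pixel: infrared only slightly above red.
red_count, ir_count = 30.0, 36.0

# Hypothetical contrast stretches, chosen independently for each band.
red_display = stretch(red_count, scale=3.0, bias=-20.0)
ir_display = stretch(ir_count, scale=1.5, bias=10.0)

ratio_before = ir_count / red_count      # infrared exceeds red in the data
ratio_after = ir_display / red_display   # infrared falls below red on film
```

With these assumed values the infrared response exceeds the red response in the digital data but not in the displayed image, so the pixel loses the red tone that would signal vegetation.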

An  auxiliary  image  product,  called  Product  3 or 
the  Kraus  Product,  was  developed  to  restore  proper 
spectral  band  proportions.  On  Product  3 (fig.  1), 
sparse  canopy  is  represented  in  pale  or  dull  red  col- 
ors. This  is  more  in  line  with  analyst  expectations  of 
characteristic  CIR  image  appearance  for  this  vegeta- 
tive condition.  Product  3,  however,  exhibits  a loss  of 
image  contrast  and  brightness,  which  causes  analyst 
fatigue  when  interpreted  for  sustained  periods.  Thus, 
Product  3 was  used  only  as  an  auxiliary  to  Product  1. 
A description  of  the  manner  in  which  Product  1 and 
Product  3 were  produced  is  contained  in  the  paper  by 
Juday  entitled  “Colorimetric  Consideration  of 
Transparencies  for  a Typical  LACIE  Scene.” 

Another factor contributing to Product 1 limitations
was that equal differences in digital spectral
responses were not seen as equally contrasting color
differences. Thus, the analyst could not see some of
the significant spectral differences between features
on Product 1. Figure 2 clearly demonstrates this
problem. In the quantization of Landsat data by the
production film converter (PFC), 16 Landsat values
are grouped as 1 PFC value. Landsat MSS band 7 (infrared)
data are plotted against MSS band 5 (red)
data, and the data points in the resultant scatter plot
have been assigned the color that those pixels would
possess in the normal Product 1. As can be seen in
figure 2, many pixels spanning significant regions in
the Landsat spectral data space are assigned the exact
same image color, and thus spectral differences are
not detected. A more thorough discussion of this
very interesting image display problem is found in
the paper by Juday.
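
The loss of spectral detail through quantization can be sketched as follows. The 16-to-1 grouping is taken from the text; implementing it as integer division with bins aligned at multiples of 16, and applying the same grouping to both bands, are assumptions made here for illustration.

```python
def pfc_level(landsat_count, group_size=16):
    """Map a Landsat digital count to a film-converter level by grouping."""
    return landsat_count // group_size


def same_display_color(pixel_a, pixel_b):
    """True if two (band 5, band 7) pixels share every quantized level."""
    return all(pfc_level(a) == pfc_level(b)
               for a, b in zip(pixel_a, pixel_b))


# Two hypothetical pixels that differ by 15 counts in band 5 and
# 14 counts in band 7 still receive the same display color.
merged = same_display_color((32, 33), (47, 47))

# A one-count difference across a bin edge changes the color.
split = same_display_color((31, 33), (32, 33))
```

The second case shows the other side of the problem: equal differences in the data do not produce equal differences on the image.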


orthogonal view of the “hyperplane” in which Landsat
spectral data fall than is possible within Landsat
coordinate space. The Tasseled Cap Transformation
defines a new set of coordinate axes referred to
as TCH-1 (brightness), TCH-2 (greenness), TCH-3
(yellow stuff), and TCH-4 (non-such). Results of
work at the Environmental Research Institute of
Michigan (ERIM) indicate that TCH-1 corresponds
to soil and scene brightness, TCH-2 corresponds to
green stuff such as green vegetation within a scene,
and TCH-3 appears to correspond to “yellow or dry
vegetation” within a scene. TCH-4, which contains
little data and has no specific correlation to ground
conditions, is called “non-such” and is not presently
used.
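
A minimal sketch of the transformation as a linear projection is given below. The coefficients are those commonly cited from Kauth and Thomas (1976) for MSS bands 4, 5, 6, and 7; they are assumed here and are not taken from this paper, and any offsets or band rescaling used operationally in LACIE are omitted. The sample pixel values are hypothetical.

```python
# Coefficients for MSS bands 4, 5, 6, 7 as commonly cited from
# Kauth and Thomas (1976); an assumption, not quoted from this paper.
TASSELED_CAP = {
    "brightness": (0.433, 0.632, 0.586, 0.264),
    "greenness": (-0.290, -0.562, 0.600, 0.491),
    "yellow stuff": (-0.829, 0.522, -0.039, 0.194),
    "non-such": (0.223, 0.012, -0.543, 0.810),
}


def tasseled_cap(bands):
    """Project one pixel's (band4, band5, band6, band7) counts onto the axes."""
    return {name: sum(c * b for c, b in zip(coeffs, bands))
            for name, coeffs in TASSELED_CAP.items()}


# Hypothetical pixels: a green canopy and a bare soil.
canopy = tasseled_cap((20, 15, 60, 50))
soil = tasseled_cap((30, 40, 45, 20))
```

For these sample values the canopy pixel scores far higher on greenness than the soil pixel while the two have similar brightness, which is the separation the scatter plots described below rely on.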


Numeric  and  Graphic  Data  Products 

To  offset  the  limitations  of  image  products  in  ade- 
quately portraying  the  Landsat  spectral  data,  several 
numeric  and  graphic  data  products  were  developed 
for LACIE analysts. These products were made
available to analysts after Procedure 1 implementation
in Phase III. The first step in constructing any of
these products was to apply the Kauth-Thomas
Tasseled Cap Transformation to the Landsat spectral
data (see the paper by Kauth et al. entitled “Feature
Extraction Applied to Agricultural Crops as Seen by


FIGURE 2.— Spectral scatter plot
The transformed Landsat data were presented to
the  analyst  in  several  different  numeric  and  graphic 
formats.  One  such  product  was  a scatter  plot  that 
allowed  the  analyst  to  compare  the  greenness 
(TCH-2  vertical  axis)  of  a pixel  in  relation  to  its 
brightness  (TCH-1  horizontal  axis).  Figure  3 is  a 
scatter  plot  of  TCH-2  versus  TCH-1  for  an  acquisi- 
tion of  a segment  in  North  Dakota  where  barley  is 
turning  and  wheat  is  still  green.  Scatter  plots  were 
produced  for  each  acquisition  processed  by  the  auto- 
matic classifier,  and  the  data  were  sampled  by  means 
of  a 10-point  by  10-line  grid  placed  over  the 
registered  Landsat  data.  This  produced  a sample  of 
209  pixels  from  the  scene.  From  this  set  of  209  pixels 
were  drawn  the  analyst  starting  and  labeling  (type  1) 
dots  and  the  bias  correction  (type  2)  dots.  The  subset 
of  labeled  sample  pixels  was  displayed  with  the 
corresponding  analyst  labels.  This  allowed  the 
analyst  to  check  quickly  the  consistency  of  his  dot 
labels (see the paper by Heydorn et al. entitled
“Classification  and  Mensuration  of  LACIE  Seg- 
ments” for  a discussion  of  within-segment  sampling 
procedures). 
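
The count of 209 sample pixels can be reproduced under one plausible reading of the grid. The sketch assumes a segment of 117 lines by 196 points and a grid whose first node lies at line 10, point 10; neither the segment dimensions nor the grid origin is stated in this section, so both are assumptions.

```python
def grid_sample(n_lines=117, n_points=196, step=10):
    """Return 1-based (line, point) positions on a regular sampling grid."""
    return [(line, point)
            for line in range(step, n_lines + 1, step)
            for point in range(step, n_points + 1, step)]


dots = grid_sample()
```

Under these assumptions the grid has 11 lines and 19 points per line, which gives the 209 dots from which the type 1 and type 2 dots were drawn.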

Another  “spectral  aid”  developed  for  Phase  III 
LACIE  was  the  trajectory  plot  (fig.  4).  Again,  TCH-2 
(greenness) versus TCH-1 (brightness) plots were
used.  However,  each  trajectory  plot  was  the 
multitemporal  history  of  just  1 of  the  209  dots;  that 
is, the TCH-2 versus TCH-1 values for the given pixel
for all multitemporal acquisitions (currently
limited  to  four)  were  presented  in  one  trajectory  plot. 
The  points  on  the  plot  were  labeled  in  proper  tem- 
poral sequence,  and  the  analyst  evaluated  the 
dynamic  change  in  spectral  response  through  time 
for  the  pixel.  The  temporal  change  in  spectral 
response  of  a crop  type  is  a very  significant  identify- 
ing characteristic.  The  analyst  was  able  to  compare 
FIGURE 3.— Example of labeled spectral aid scatter plot.


FIGURE 4.— Example of trajectory plots. (a) Spring wheat.
(b) Nonwheat.


the  trajectory  plots  of  sample  pixels  to  the  trajectory 
plots  of  various  crop  types  contained  in  keys. 
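
In LACIE this comparison was made visually by the analyst. Purely as an illustration of the idea, the sketch below compares a sample trajectory with key trajectories numerically. The distance measure, the function names, and all trajectory values are hypothetical, and the acquisitions are assumed to fall on matching dates.

```python
import math


def trajectory_distance(sample, key):
    """Sum of point-to-point distances between two trajectories.

    Each trajectory is a list of (brightness, greenness) pairs in
    temporal order, one pair per acquisition.
    """
    return sum(math.dist(p, q) for p, q in zip(sample, key))


def closest_key(sample, keys):
    """Return the crop name whose key trajectory lies closest to the sample."""
    return min(keys, key=lambda name: trajectory_distance(sample, keys[name]))


# Hypothetical four-acquisition key trajectories.
keys = {
    "spring wheat": [(60, 5), (55, 25), (50, 40), (65, 15)],
    "nonwheat": [(60, 5), (62, 8), (58, 30), (52, 45)],
}
sample = [(61, 6), (54, 24), (51, 38), (66, 14)]
best = closest_key(sample, keys)
```

The hypothetical nonwheat key greens up later than the wheat key, so a sample that peaks on the third acquisition and declines on the fourth lies much closer to the wheat trajectory.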

The third and final numeric product (fig. 5) made
available  to  the  analysts  in  Phase  III  was  a simple 
listing  of  the  values  of  green  number  and  brightness 
for  each  of  the  209  sample  pixels  for  each  acquisition 
processed  by  the  classifier. 

The  numeric  and  graphic  data  products  described 
were  available  only  after  the  segment  had  been 
machine  processed.  The  analyst,  therefore,  used 
these  products  to  check  the  consistency  of  his  dot 
labels  after  machine  processing  of  the  segment.  For 
example,  the  analyst  could  change  bias  correction  dot 
labels  (type  2 dots)  after  processing;  such  a change 
could  have  a beneficial  effect  on  the  segment  wheat 
estimate.  Thus,  the  spectral  aids  affected  the  quality 
and  consistency  of  bias  correction  dot  labels  more 
directly  than  the  starting  and  cluster  labeling  dots 
(type  1 dots).  A fuller  discussion  of  these  products  is 
presented in the paper by Abotteen entitled “Image
and  Numerical  Display  Aids  for  Manual  Interpreta- 
tion.” 

In LACIE Phase III, NASA and Lockheed Electronics
Company did a study in which the spectral
aids,  particularly  the  scatter  plots,  were  used  to  sepa- 
rate spring  wheat  from  the  other  spring  small  grains 
in  North  Dakota,  most  of  which  were  barley.  As  has 
been  established  in  crop  separability  analysis,  barley 
matures  and  turns  golden  sooner  than  spring  wheat. 
On  optimally  timed  acquisitions  around  the  wheat 
soft-dough  and  barley  turning  biostage,  barley  will 
appear less “green” (lower TCH-2 green number)
and more bright (higher TCH-1 brightness value)
than  spring  wheat.  In  a scatter  plot  for  this  optimally 
timed  acquisition,  barley  will  fall  below  and  to  the 
right  of  spring  wheat  (fig.  3).  Thus,  the  spectral  aids 
can  greatly  enhance  analysis  of  subtle  spectral 
differences.  Note,  however,  the  emphasis  placed  on 
optimally  timed  acquisitions. 


A Priori  Knowledge  Expanded 
and  Standardized 

Interpretation keys. — After the start of LACIE
operations,  it  was  apparent  that  some  additional 
analyst  training  and  standardization  was  still  needed. 
In  response  to  this  need,  selective  interpretation  keys 
were  compiled  and  made  available  to  the  analyst  (see 
the paper by Baron et al. entitled “Analyst Interpretation
Keys”). The keys were intended to help
incorporate  the  analysts'  experiences  and  standard- 
ize a priori  knowledge  concerning  wheat  and  Landsat 
data  analysis.  It  was  hoped  that  the  keys  would  help 
minimize  variability  in  crop  identification  (labeling). 
Furthermore,  the  keys  aided  and  hastened  the  train- 
ing of  new  analysts  who  joined  the  project  after  the 
initial  pool  of  analysts  was  selected  and  trained. 

The  interpretation  keys  were  compiled  in  two 
volumes.  Volume  I contained  introductory  material 
with  information  concerning  the  general  analysis  of 
Landsat imagery and ancillary data for the identification
of wheat. Volume II consisted of Landsat imagery
showing examples of wheat development within
specific geographic regions. In this way, regional eccentricities
or problems that affected the wheat temporal-spectral
response pattern could be efficiently
presented to the analyst.

Crop  separability  studies.— When  analysts  began 
processing segments from spring wheat areas, it was
quickly  determined  that  the  labeling  of  spring  wheat 
versus  other  spring  small  grains  was  very  unreliable. 
The  analysts  had  no  guidelines  to  help  them  separate 
the  spring  wheat  from  all  other  spring  small  grains. 
This  problem  of  small-grains  separability  was  most 
significant in the spring wheat segments because significant
proportions of other spring small grains
were grown along with the wheat. This was not the
usual case in the winter wheat situation. Since
LACIE was originally intended to be a wheat inventory
system as opposed to a small-grains inventory
system,  research  was  undertaken  to  determine 
whether  there  were  consistent  temporal  or  spectral 
characteristics  that  could  be  identified  and  measured 


from  Landsat  to  enable  the  analyst  to  make  the 
needed  distinctions. 

This work was carried out primarily at ERIM, the
Laboratory for Applications of Remote Sensing
(LARS), and the NASA Johnson Space Center
(JSC). The results of this research were not available
until  near  the  end  of  the  LACIE  experiment  and 
thus  did  not  affect  LACIE  operational  procedures. 
However,  JSC  in-house  tests  were  conducted  into 
procedural  analysis  changes  that  required  the  analyst 
to distinguish spring wheat from all other small
grains  present.  The  direct  wheat  evaluations  were  en- 
couraging and  discouraging  at  the  same  time.  It  was 
found  that  spring  wheat  and  spring  barley  did  differ 
from  each  other  in  their  temporal  characteristics. 
Barley  fairly  consistently  matured  faster  and  sooner 
than  wheat.  Thus,  barley  would  start  to  turn  before 
wheat  did.  The  discouraging  aspect,  however,  was 
that  a Landsat  acquisition  was  needed  at  this  critical 
barley-turning/wheat  soft-dough  stage.  It  was  found 
that  acquisitions  at  this  critical  time  were  often  miss- 
ing or  improperly  timed.  So,  while  spring  wheat  and 
barley  could  theoretically  be  separated  from  each 
other,  they  could  not  be  separated  consistently  from 
one  segment  to  the  next  because  of  the  need  for  the 
often-missing  critically  timed  barley-turning  acquisi- 
tion. 


Procedural  Modifications 

The most significant procedural modification to
be developed in LACIE was Procedure 1, which
relieved  the  analyst  of  the  responsibility  for  spectral 
class  definition.  This  was  accomplished  by  the  use  of 
clustering.  Also  in  Procedure  1,  the  analyst  labeled 
individual,  randomly  selected  pixels  from  a 
systematic  grid  instead  of  labeling  fields  that  he  had 
previously delineated. Procedure 1 reduced analyst-segment
interaction time by 65 percent, thus increasing
throughput and decreasing turnaround time. A
more  complete  description  and  discussion  of  Pro- 
cedure 1 can  be  found  in  the  paper  by  Heydorn  et  al. 

One  innovation  incorporated  in  Procedure  1 was  a 
change  in  the  use  of  the  classifier  output.  In  the  old 
Fields  Procedure,  the  wheat  acreage  from  the 
classifier  was  treated  in  the  traditional  remote-sens- 
ing manner  as  the  final  estimate  for  the  segment. 
However, Procedure 1 recognized that there was bias
in  the  classification  and  therefore  used  the  classifier 
output  as  the  stratification  to  be  used  in  a stratified 
sampling  scheme.  Analyst  labels  for  bias  correction 


dots  (type  2 dots)  were  now  used  in  connection  with 
the stratification produced by the classifier to produce
the final estimate for the segment.
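
The use of the classifier output as a stratification can be illustrated with the textbook stratified estimator: each stratum's share of the segment is weighted by the fraction of its type 2 dots that the analyst labeled wheat. This form is an assumption made for illustration; the estimator actually used in Procedure 1 is described in the paper by Heydorn et al. The function name and the sample numbers are hypothetical.

```python
def stratified_wheat_proportion(stratum_pixel_counts, dot_labels):
    """Estimate the segment wheat proportion from strata and type 2 dots.

    stratum_pixel_counts: {stratum: pixels the classifier assigned to it}
    dot_labels: {stratum: list of analyst labels, True meaning wheat}
    """
    total = sum(stratum_pixel_counts.values())
    estimate = 0.0
    for stratum, count in stratum_pixel_counts.items():
        labels = dot_labels[stratum]
        wheat_fraction = sum(labels) / len(labels)
        estimate += (count / total) * wheat_fraction
    return estimate


# Hypothetical segment: the classifier calls 25 percent of pixels wheat.
counts = {"wheat": 25, "nonwheat": 75}
labels = {
    "wheat": [True] * 8 + [False] * 2,      # 80 percent confirmed
    "nonwheat": [True] * 1 + [False] * 9,   # 10 percent missed wheat
}
corrected = stratified_wheat_proportion(counts, labels)
```

In this example the classifier alone would report 25 percent wheat, while the bias-corrected estimate is 27.5 percent because the dots reveal both commission and omission errors.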

Currently in Procedure 1, all pixels are clustered and
assigned labels according to a nearest neighbor rule on the
basis of a limited number of analyst labels (type 1 dots).
All pixels are then processed through a maximum likelihood
classifier for assignment to a wheat or nonwheat stratum.
One supporting research study has been undertaken to
evaluate alternative methods of producing the stratification
used in Procedure 1. An alternative procedure developed by
the University of California at Berkeley (UCB) for producing
the crop type stratification is called the Delta Function
Stratification Procedure.1 This procedure utilizes an
indicator of the temporal pattern of the vegetation; this
indicator is produced by ratioing Landsat MSS band 7
(infrared) with Landsat MSS band 5 (red) to assign clusters
to a crop type stratum. The strata produced from this
procedure (usually five or six) are then “bias corrected”
according to standard Procedure 1 methods. Advantages of the
alternative stratification procedure are (1) a potential
decrease in analyst segment-handling time due to a decrease
in the number of pixels requiring labels, (2) more accurate
labeling of clusters, (3) a decrease in the amount of
computer processing time required per segment achieved by
eliminating the maximum likelihood processing step, and (4)
the capability to extend the procedure to crops other than
wheat. Tests of this alternative stratification procedure
indicate that it produces results that are comparable to and
not statistically significantly different from current
Procedure 1 results. The alternative stratification
procedure is undergoing further evaluation and has not yet
been evaluated in a test on the scale of LACIE operations.
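
The band-ratio indicator mentioned above can be sketched as follows. Only the ratio of MSS band 7 to MSS band 5 comes from the text; the rule that maps the temporal pattern to a stratum is not described here, so the sketch stops at the ratio trajectory and its date-to-date change. The function names and the cluster means are hypothetical.

```python
def band_ratio_trajectory(cluster_means):
    """Return the band 7 / band 5 ratio for each acquisition date.

    cluster_means: list of (band5_red, band7_infrared) cluster mean values,
    one pair per acquisition, in date order.
    """
    ratios = []
    for red, infrared in cluster_means:
        if red <= 0:
            raise ValueError("band 5 mean must be positive to form a ratio")
        ratios.append(infrared / red)
    return ratios


def ratio_changes(ratios):
    """Date-to-date change in the ratio (an assumed summary of the temporal pattern)."""
    return [later - earlier for earlier, later in zip(ratios, ratios[1:])]


# Hypothetical cluster: bare soil, then a green canopy, then a ripened crop.
means = [(30.0, 30.0), (20.0, 50.0), (40.0, 40.0)]
ratios = band_ratio_trajectory(means)
changes = ratio_changes(ratios)
print(ratios, changes)
```

A rise followed by a fall in the ratio is the kind of temporal pattern a green-up and senescence cycle would produce; how many strata to form from such patterns is left open here.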


Automatic Crop Labeling: The Future

The foregoing discussion has addressed the manual labeling
of crop types from Landsat data. The potential of automating
these crop-labeling procedures will now be briefly
addressed. There is a study in progress within the
supporting research community which has as its objective the
development of an automated or computer-aided labeling
procedure. The motivation for such a study is to further
decrease variability in dot labels due to differences
between individual analysts. In addition, it is desirable to
have an estimate of the reliability of each dot label; i.e.,
the probability that a given label is correct.


1C. M. Hay et al., “Development of Techniques for Producing
Static Strata Maps and Development of Photointerpretation
Methods Based on Multitemporal Landsat Data.” Annual Report,
NASA Contract NAS9-14565 (R. N. Colwell, Principal
Investigator), Space Sciences Laboratory, Series 19, Issue 1,
University of California at Berkeley, Dec. 1977.

The  automatic  labeling  procedure  being  developed 
and  tested  is  called  LIST  (Label  Identification  from 
Statistical  Tabulation)  (see  the  paper  by  Pore  and 
Abotteen  entitled  “A  Programed  Labeling  Approach 
to Image Interpretation”). Three types of information are
input to the procedure: (1) spatial information provided by
manual analysis of Landsat data;
(2)  Landsat  spectral  information  which  is  automat- 
ically sampled;  and  (3)  ancillary  information  com- 
piled from  meteorological  data  and  other  ancillary 
data  types  described  earlier.  The  questions  an  analyst 
must  answer  for  input  to  LIST  and  the  automated 
questions  are  presented  in  table  I.  Presently,  the  pro- 
cedure is  “trained"  on  an  area  for  which  ground  data 
are  available.  Relative  weights  for  the  input  variables 
(answers  to  input  questions)  are  determined  by 
statistical analysis of these ground-observed areas.
The “trained” LIST procedure is then applied to
areas  without  benefit  of  ground  data.  Initial  test 
results  (table  II)  are  comparable  with  results  from 
analyst-labeled  dots.  Boundary  and  misregistered 
pixels  were  screened  from  the  test  so  the  test  results 
are  for  “pure"  pixels  only.  As  the  decision  logic  for 
specific  crop  identification  becomes  better  defined,  it 
too  may  be  automated.  Automated  or  partially  auto- 
mated crop-labeling  procedures  can  enhance  opera- 
tional crop  inventory  systems  in  that  manual 
analysis  inputs  can  be  minimized.  The  analyst  can  be 
freed  from  repetitive  analysis  tasks,  and  procedures 
can  be  more  nearly  standardized  to  reduce  measure- 
ment procedure  variability. 
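
The screening role of the first four analyst-interpreter questions in table I can be sketched as a short function. Each argument is True when the analyst entered a (1) response, which table I marks “Stop,” and False when the entry was left blank. The function name and the returned reason strings are hypothetical; the statistical weighting that LIST applies to the remaining answers is not reproduced here.

```python
from typing import Optional


def screen_pixel(q1_nonagricultural: bool,
                 q2_not_registered: bool,
                 q3_mixed: bool,
                 q4_anomalous: bool) -> Optional[str]:
    """Return why a pixel is dropped, or None if it goes on to labeling.

    A True argument corresponds to a (1) "Stop" response in table I;
    a blank (indeterminate) response is passed as False.
    """
    if q1_nonagricultural:
        return "nonagricultural"
    if q2_not_registered:
        return "misregistered"
    if q3_mixed:
        return "mixed"
    if q4_anomalous:
        return "anomalous"
    return None  # pixel continues to question 5 and the automated questions


print(screen_pixel(False, False, True, False))
```

Screening of this kind is consistent with the statement that boundary and misregistered pixels were removed so that the test results apply to “pure” pixels only.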


SUMMARY 

In the simplest terms, manual identification of crop type
consists of two components. The first is feature detection
and physical characteristics determination. A feature of
interest in LACIE is an agricultural field. The second
component is feature identification or labeling. The data
utilized for feature detection are Landsat data. The data
necessary for feature identification are a priori knowledge
and ancillary data. That is, only feature existence and
information about the physical characteristics of the
feature can be extracted from Landsat. The information that
allows the correlation of feature characteristics to a
specific type of feature (e.g., a wheat field) comes from a
priori knowledge and ancillary data. Landsat data contain
significant information which, in conjunction with ancillary
data, can allow quite sophisticated analyses to be
performed. The format in which Landsat data are presented,
however, does significantly affect the usefulness of the
information. Two types of information are extracted from the
Landsat data: spatial information and temporal-spectral
information. Originally, the sole format for Landsat data in
LACIE was an image called Product 1. While this format was
optimal for the extraction of spatial information, it was
not optimal for the extraction of precise spectral
information. Indeed, Product 1 distorted the spectral data
and led to labeling problems for analysts. In answer to the
Landsat spectral data format problem, the research
community, working closely with LACIE operations personnel,
developed numeric and graphic formats for analyst “spectral
aids” which were better suited to the extraction of spectral
information from Landsat data.

Crop type identification is possible because of relatively
unique temporal-spectral patterns associated with timing
differences of certain phenological growth stages of
different crop types. Adequate sampling through time of the
spectral response of given fields is necessary to identify
crop types reliably. Thus, missing Landsat acquisitions
because of cloud cover have frequently had a significant
impact on crop-labeling accuracies in LACIE. Also, optimal
timing of the acquisitions is critical to the separation of
closely related crops such as wheat and barley. Temporal
differences between closely related crops are subtle and are
observable only within limited time periods. Since only one
Landsat was used, temporal sampling was limited to 18-day
intervals; this periodicity did not allow for consistent,
reliable separations between closely related wheat and
barley.

Automated and partially automated crop-labeling procedures
were developed, and initial testing demonstrates a
significant potential for such procedures. Automated
procedures offer decreased variability in crop type labels,
reduced manual analysis requirements in operational crop
inventories, and increased measurement reliability
information about crop type labels.


Table I. LIST Questions

Analyst-interpreter questions (question, followed by the
possible responses)

1. Is pixel clearly in nonagricultural area?
      (1) Yes. Stop.
      (Blank) Agricultural area or indeterminate

2. Is pixel registered with regard to other dates (i.e., in
   the same category on all four dates)?
      (1) No. Stop.
      (Blank) Yes or indeterminate

3. Is pixel a mixed pixel (part of more than one field or
   boundary)?
      (1) Yes. Stop.
      (Blank) No or indeterminate

4. Is this an anomalous pixel (not representative of most of
   the other pixels within the field)?
      (1) Yes. Stop.
      (Blank) No or indeterminate

5. PFC vegetation canopy indication is ____ (Use all
   available imagery film types.)
      (0) No vegetation canopy
      (1) Low-density green vegetation canopy
      (2) Medium-density green vegetation canopy
      (3) High-density green vegetation canopy
      (4) Senescing (turning) vegetation canopy
      (5) Harvested canopy (stubble)

Automated questions

1. Robertson biostages for winter and spring wheat,
   respectively
2. Green number of pixel (corrected to 60° incidence)
3. Is green number in the small-grains range?
4. Brightness number of pixel
5. Winter and spring principal component greenness (PCG)
   statistics, respectively

Automated analyst-interpreter keys

1. Is the vegetation indication of the pixel (using all
   available product types) valid for the Robertson biostage
   of wheat for the acquisition?
2. Does the pixel follow a small-grains vegetation canopy
   development pattern?


Table II. LIST Test Results(a)

Labeling        Omission error,    Commission error,
procedure       percent            percent

Winter small-grains sites
Analyst         18                 13
LIST            17                 15

Spring small-grains sites
Analyst         50                 29
LIST            53                 39

(a) Four training and four test segments for each site.
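
The error measures reported in table II can be illustrated with a short sketch. The text does not define the denominators, so this assumes the common convention: omission error is the percentage of ground-truth small-grains dots that were not labeled small grains, and commission error is the percentage of dots labeled small grains that are something else on the ground. The function name and the sample data are hypothetical.

```python
def omission_commission(truth, labels):
    """Return (omission error, commission error) in percent.

    truth:  list of booleans, True where the ground data say small grains.
    labels: list of booleans, True where the procedure labeled small grains.
    """
    if len(truth) != len(labels):
        raise ValueError("truth and labels must have the same length")
    true_total = sum(truth)
    labeled_total = sum(labels)
    if true_total == 0 or labeled_total == 0:
        raise ValueError("need at least one true and one labeled small-grains dot")

    missed = sum(1 for t, l in zip(truth, labels) if t and not l)
    false_alarms = sum(1 for t, l in zip(truth, labels) if l and not t)

    omission = 100.0 * missed / true_total            # assumed denominator: true small grains
    commission = 100.0 * false_alarms / labeled_total  # assumed denominator: labeled small grains
    return omission, commission


# Hypothetical set of 20 dots, 10 of which are small grains on the ground.
truth = [True] * 10 + [False] * 10
labels = [True] * 8 + [False] * 2 + [True] * 2 + [False] * 8
errors = omission_commission(truth, labels)
print(errors)
```

Under a different convention (for example, commission relative to all non-small-grains dots), the same labels would give a different commission figure, so the convention should be confirmed before comparing against table II.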
LACIE Analyst Interpretation Keys


J. G. Baron,a R. W. Payne,a and W. F. Palmera


INTRODUCTION 

The  Analyst  Interpretation  (AI)  Keys  were  pre- 
pared within  the  Large  Area  Crop  Inventory  Experi- 
ment (LACIE)  for  incorporation  into  the  Classifica- 
tion and  Mensuration  Subsystem  (CAMS)  Detailed 
Analysis  Procedures.  They  were  developed  and 
tested  during  Phase  II  of  LACIE  (1976)  and  imple- 
mented during  Phase  III  (1977). 

The  Classification  and  Mensuration  Subsystem  of 
LACIE  was  responsible  for  using  Landsat  data  to 
determine  the  proportion  of  wheat  in  each  sample 
segment  required  by  the  Crop  Assessment  Sub- 
system (CAS).  Analysts  within  CAMS  used  Landsat 
color-composite  images,  crop  calendars,  ancillary 
data  such  as  historical  statistics,  and  computer-gener- 
ated spectral  plots  and  cluster  maps  to  identify  a 
subset  of  the  picture  elements  (pixels).  This  iden- 
tification formed  the  basis  for  a machine  classifica- 
tion of  the  entire  segment  and  calculation  of  the  pro- 
portion of  wheat.  Identification  of  the  pixels  was  ac- 
complished by  comparing  their  characteristics  with 
known signatures (physical/cultural features or patterns of
features which allow a particular crop type to be
recognized on imagery). These signatures of
wheat  and  other  crops  are  described  and  documented 
in  the  Analyst  Interpretation  Keys. 


Objectives  of  the  Keys 

The major objectives of the AI Keys were to improve accuracy
and efficiency by minimizing variance in crop
identification, to disseminate analyst experience gained in
LACIE, to accelerate the training of new analysts or those
new to specific geographic areas, to maintain an
interpretation information base, and to provide a
documentation format that would enable easy updating.
Another important objective was to provide the analyst with
a better understanding of the expected ranges in color
variation of signatures for individual biostages and of
temporal sequences of Landsat signatures. Since signatures
within these images are affected by image processing and
environmental conditions as well as changing crop
conditions, absolute color matching is not usually a
reliable means of identification. However, relative
similarities or differences of signatures and temporal
sequences are useful, and this usefulness is being increased
by such new technology as haze correction.


aLockheed Electronics Company, Houston, Texas.

Since  crop  discrimination  in  LACIE  is  sometimes 
dependent  on  the  somewhat  subjective  interpreta- 
tion of  data  ancillary  to  actual  multidate  satellite  im- 
agery, the  construction  of  detailed  decision  logic  has 
been  an  elusive  but  important  objective.  General  in- 
terpretation decision  logic  (fig.  1)  without  docu- 
mented signature  variability  and  temporal  signature 
sequences  for  small  grains  has  been  used  and  im- 
proved throughout  LACIE.  In  this  context,  the  AI 
Keys  initially  expanded  and  illustrated  the  logic  re- 
lated to  the  use  of  LACIE  Product  1 (the  primary  im- 
agery product— color-composite  imagery  of  bands  1, 
2,  and  4)  and  treated  ancillary  data  (other  than  crop 
calendars)  as  supplementary  information  for  deci- 
sions. 


Background 

The  first  key  used  in  LACIE  was  the  “Wheat 
Identification  Aid  for  Image  Interpreters"  (ref.  1).  It 
was  developed  in  June  1974  using  the  limited 
amount  of  data  (Landsat  and  ground  observations) 
available  at  that  time.  These  data  were  from  two  in- 
tensive test  sites  (ITS's):  Hill  County,  Montana,  and 
Swift  Current,  Saskatchewan.  An  appendix  contain- 
ing five  additional  sites  was  added  in  1975.  This 
document  proved  to  be  very  beneficial  in  the  training 
of  analysts  new  to  the  CAMS  environment. 
FIGURE 1. Decision logic for small-grains identification.


In August 1975, the Acreage Estimation Technical Review
Team1 suggested that additional keys be developed to improve
the accuracy and consistency of interpretation. This
recommendation was subsequently endorsed by LACIE project
management, and the LACIE Keys Design and Planning Working
Group2 was organized in January 1976.

It was apparent to the LACIE Keys Design and Planning
Working Group that the interpretation methodology that had
evolved within CAMS and the combination of imagery,
ancillary data, and ground observations already found
beneficial in LACIE would be the basis of a timely, specific
key. A review of interpretation keys in general use had
provided little insight into a design that would be
applicable in LACIE.


1The Review Item Disposition (RID) was initiated for the
team by William Anderson of the Earth Resources Observation
Systems (EROS) Data Center, Sioux Falls, South Dakota.

2The LACIE Keys Design and Planning Working Group consisted
of personnel from Lockheed Electronics Company (LEC) and
Lockheed Missiles and Space Company (LMSC) and two
independent consultants, Robin Welch (Texas A & M
University) and Joseph Clifton (U.S. Department of
Agriculture, retired).

The  first  step  the  LACIE  Keys  Design  and  Plan- 
ning Working  Group  took  was  to  examine  factors 
that  would  affect  the  identification  of  design  require- 
ments. These  factors  and  their  implications  for  the 
design  were  stated  as  follows. 


1. Experimental factor: The Data Acquisition System does not
   produce unambiguous signatures.
   Design implication: Training analysts on nominal
   signatures alone will not assure accuracy or consistency.

2. Experimental factor: Correct identifications have
   required ancillary data for evaluation of spatial and
   temporal signature variability.
   Design implication: Correlation of temporal image clues,
   crop calendars, and other data is required for acceptable
   accuracy and consistency.


The  design  requirements  resulting  from  the 
analysis  were  then  identified  as  the  following. 

1.  Both  nominal  signatures  and  signature 
variability  must  be  recognized  and  understood. 
2. Explicit logic is needed for best use of crop
calendars  to  identify  nominal  signatures. 

3.  Ancillary  data  should  be  designed  to  cue 
analysts  to  sources  of  possible  signature  variability. 

4.  Reference  imagery  should  be  provided  for  seg- 
ments and  sites  that  are  most  representative  of 
specific  crops  and  that  show  relatively  homogeneous 
signatures. 

5. Reference imagery on all biostages, indexed to
crop calendar, should be provided for reference sites.

6.  Ground  truth,  where  available,  should  be  used 
for  verification  of  the  keys. 

7.  The  selective  key  approach  should  be  used. 

With respect to item 7, interpretation keys can be
categorized into two basic types: elimination keys and
selective keys. An example of the elimination key is the
dichotomous key in which, at each step in the analysis
process, the various categories of objects are divided into
two groups on the basis of some characteristic that should
be visible on the imagery. Each succeeding step subdivides
the remaining group until the object of interest is
correctly identified (ref. 2). This type of key usually
works best when the interpretation task is straightforward.
Because of the broad range of variability in wheat
signatures and the sometimes confusing nature of the
low-resolution Landsat imagery, this type of key was not
practical for LACIE.

A selective  key  usually  consists  of  illustrations 
and  descriptions  of  the  objects  of  interest  and  is 
designed  for  comparative  analysis.  The  analyst 
selects  from  the  key  the  example  that  most  clearly 
represents  the  ground  cover  or  the  object  to  be  iden- 
tified. This  type  of  key  is  well  suited  to  crop  iden- 
tification, and  it  was  determined  to  be  better  for  the 
difficult  interpretation  task  in  LACIE. 
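
The difference between the two key types can be made concrete with a small sketch. The dichotomous key is written as a chain of yes/no splits, and the selective key as a comparison of an observation against reference examples, choosing the example that matches on the most attributes. The attribute names, reference entries, and matching rule are all hypothetical illustrations and are not taken from the AI Keys.

```python
def dichotomous_key(obs):
    """Elimination key: each step splits the remaining categories in two."""
    if not obs["vegetated"]:
        return "bare soil or fallow"
    if obs["green_in_early_spring"]:
        return "winter small grains"
    return "other vegetation"


def selective_key(obs, references):
    """Selective key: return the label of the best-matching reference example."""
    def score(ref):
        # Count the attributes on which the observation agrees with the example.
        return sum(1 for name, value in ref["signature"].items()
                   if obs.get(name) == value)
    return max(references, key=score)["label"]


# Hypothetical reference examples, as a regional key might illustrate them.
references = [
    {"label": "winter wheat",
     "signature": {"color": "pink", "stage": "booting", "pattern": "strip"}},
    {"label": "barley",
     "signature": {"color": "light pink", "stage": "tillering", "pattern": "strip"}},
    {"label": "fallow",
     "signature": {"color": "dark green", "stage": "none", "pattern": "strip"}},
]

field = {"color": "pink", "stage": "booting", "pattern": "block"}
print(selective_key(field, references))
print(dichotomous_key({"vegetated": True, "green_in_early_spring": True}))
```

The selective form tolerates a mismatch on one attribute (here, field pattern), whereas a single wrong answer in the dichotomous chain sends the analyst down the wrong branch. That tolerance is one way to read the preference for selective keys when signatures are variable.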

It  was  planned  to  develop  (1)  a basic  or  introduc- 
tory volume  (Analyst  Interpretation  Keys,  Volume 
I — Image  Analysis  Guide  for  Wheat/Small  Grains 
Inventories)  to  provide  nominal  wheat  signatures  oc- 
curring in  Landsat  imagery  and  the  signature 
variability  caused  by  environmental  influences  and 
regional  agricultural  practices  and  (2)  supplementary 
volumes  containing  regional  selective  keys  for  the 
United  States  and  Canada,  the  U.S.S.R.,  and  other 
countries  to  present  annotated  Landsat  imagery  on 
all  biophases  for  selected  reference  sites,  descriptions 
of  wheat  cropping  areas,  and  structured  interpreta- 
tion logic  based  on  crop  calendars.  Two  important 
factors  relating  to  the  design  of  the  keys  were  project 
decisions (1) to engage in a large ground-truth collection
program throughout the U.S. Great Plains (blind sites) and
(2) to develop methods to “partition” this
same  region  into  homogeneous  areas  where  certain 
factors  relating  to  wheat  acreage  and  yield  would  be 
grouped. 

Ground observations were being routinely collected at 29
ITS’s (United States and Canada), and these data plus the
1976 blind-site ground observations would be available to
verify Landsat signatures
in  the  initial  version  of  the  AI  Keys.  Along  with 
these  ground  data,  2 years  of  Landsat  imagery  would 
be  “in  house"  for  construction  of  the  document. 
Adequate  ground  truth  and  a sequence  of  good  Land- 
sat acquisitions  over  each  test  site  to  be  considered  as 
a “reference”  site  were  of  highest  importance. 

The  initial  partitioning  of  U.S.  and  Canadian 
wheat-growing  regions  by  project  personnel  was 
scheduled  for  completion  in  1976,  and  since  this 
coincided  with  the  keys  development  schedule,  this 
version  was  incorporated  into  the  keys  design.  (Later 
studies  have  resulted  in  slightly  different  delinea- 
tions of  homogeneous  U.S.  Great  Plains  regions.) 

Additionally,  the  design  of  the  keys  was  in- 
fluenced by  the  following  considerations. 

1.  The  distribution  of  keys  within  a country 
should  be  adequate  to  document  the  major 
geographic  differences  in  crop  signatures. 

2.  The  number  of  keys  within  a region  would 
result  from  balancing  the  objectives  of  documenting 
smaller  variations  in  signatures  and  retaining  a con- 
venient size  for  the  AI  Keys. 

3.  Fields  used  to  illustrate  the  signatures  of  crops 
should  be  dated  in  terms  of  crop  development  in- 
stead of  calendar  date  to  minimize  reflectance 
differences  due  to  the  intrascene  range  of  planting 
dates  and  the  geographic  differences  in  average 
planting  date. 

4.  Imagery  used  in  the  AI  Keys  must  closely  ap- 
proximate operational  imagery  in  resolution,  color 
balance,  and  scale. 

For  the  best  use  of  project  resources,  the  following 
items  were  considered  to  be  desirable. 

1.  The  existing  data  collection  and  processing 
systems  would  be  used,  if  possible,  to  ensure  exten- 
sive and  continuous  data  flow  for  development  and 
updating  purposes. 

2.  The  keys  development  would  be  closely  tied  to 
CAMS  operations  so  that  technically  accepted  pro- 
ducts and  procedures  could  be  incorporated,  testing 
could  be  conducted  with  analysts  in  a realistic  opera- 
tional environment,  and  retraining  of  analysts  during 
the  implementation  phase  would  be  minimized. 

The design from this working group was presented to and
accepted by the Image Analysis Keys Team3 at the March/April
1976 Project Review. Suggestions made by this team at this
time and at subsequent reviews were incorporated into the
development of the keys. Volume I and Volume II (Analyst
Interpretation Keys, United States and Canadian Great Plains
Regional Keys) were developed, tested, and used for the
first time during LACIE Phase III (1977-78 crop year).


ANALYST INTERPRETATION KEYS
DESCRIPTION

Detailed descriptions of the “Image Analysis Guide for
Wheat/Small Grains Inventories,” Volume I of the AI Keys,
and Volume II, “United States and Canadian Great Plains
Regional Keys,” are presented in the following two sections.


Volume I, “Image Analysis Guide for
Wheat/Small Grains Inventories”

Volume  I is  a synoptical  key  and  basic  informa- 
tion source  for  agricultural  interpretation  with  an 
emphasis  on  small-grains  identification.  This  volume 
is  designed  to  help  the  analyst  to  recognize  not  only 
the  normal  spectral  signatures  of  wheat  cultivation 
and  phenology  but  also  the  range  of  variations  from 
the  normal  that  occur  and  to  understand  the  reasons 
for  their  occurrence.4  Volume  I is  divided  into  five 
major  sections;  the  content  of  each  is  discussed 
herein. 

Section  1,  entitled  “Introduction,”  gives  the 
organization  and  use  of  the  keys,  the  system  objec- 
tives of  LACIE,  and  the  image  interpretation  objec- 
tives of  LACIE.  The  principal  wheat-growing  regions 
of  the  world  are  described  and  the  necessity  of  a 
global  crop  inventory  using  remote  sensing  is  dis- 
cussed. 


3The Image Analysis Keys Team was composed of the following:
J. G. Baron, R. W. Payne, and W. F. Palmer (LEC); W. E.
Hensley and L. C. Wade (NASA Johnson Space Center);
W. Draeger and W. Anderson (EROS Data Center); J. Lunch
(Central Intelligence Agency); R. I. Welch (Texas A & M
University); and W. Williamson (LMSC).

4This volume includes or is based on information compiled or
developed by the Earth Satellite Corporation under a
previous contract. This information includes a substantial
portion of the textual material in Section 2 of Volume I.


“The Landsat Data-Acquisition System and MSS
Image  Products,”  Section  2,  presents  a detailed  dis- 
cussion of  color,  color-infrared  (CIR),  and  black- 
and-white  photography.  Particular  emphasis  is 
placed  on  the  image  characteristics  of  CIR  film  and 
its  application  to  vegetation  analysis.  The  techniques 
used  to  produce  simulated  CIR  imagery  using  the 
data  acquired  by  the  Landsat  multispectral  scanner 
(MSS)  are  discussed.  An  example  of  spectral  reflec- 
tance data  for  winter  wheat  gathered  by  the  LACIE 
Field  Measurements  Project  is  depicted.  The  spectral 
bandwidths  imaged  by  the  MSS  are  annotated  on  the 
winter  wheat  spectral  curves  for  comparative  pur- 
poses (fig.  2).  In  addition,  aerial  photographic  views 
of  several  small-grain  fields,  both  in  color  and  CIR, 
are  included  to  illustrate  the  signature  responses  to 
be  expected  from  the  two  types  of  imagery.  A com- 
prehensive overview  of  Landsat  operations  and  of 
the  MSS  and  its  output  products  is  presented. 
Schematics  of  the  overall  Landsat  system,  the  MSS 
system,  and  the  Landsat  groundtracks  for  a typical 
day  are  included  to  provide  background  data  for  the 
reader.  Additional  diagrams  in  this  section  illustrate 
seasonal  changes  in  solar  elevation  and  the  relation- 
ship of  the  U.S.  winter  wheat  belt  to  the  various  solar 
elevations.  The  processing  sources  of  image  tonal 
variations  in  both  aerial  photography  and  Landsat 
imagery  are  also  discussed  in  this  section. 

Section  3,  “Identification  of  Wheat/Small  Grains 
Cultivation  on  Landsat  Imagery,”  addresses  the 
problems  of  detection,  recognition,  and  identifica- 
tion of  crop  cultivation  on  the  imagery  produced 
from  the  Landsat  MSS  data.  This  is  accomplished  by 
a comprehensive description of the photophenology of small
grains and by illustrating the application of multitemporal
analysis to the detection and identification of the various
biostages of wheat and other small grains.

FIGURE 2. Typical reflectance spectra for dryland Eagle
variety winter wheat in various stages (Garden City, Kansas;
1975-76 growing year).

FIGURE 3. Sample segment images of the Toole County,
Montana, intensive test site on six successive dates.
Annotated fields: 117, winter wheat; 121, barley; 186,
spring wheat.
(a) June 10, 1975. Winter wheat, light pink, tillering,
    20 cm (8 in.) high; barley, light green, emergence, 5 cm
    (2 in.) high; spring wheat, medium green, emergence,
    5 cm (2 in.) high.
(b) June 28, 1975. Winter wheat, pink, booting, 41 cm
    (16 in.) high; barley, light pink, tillering, 20 cm
    (8 in.) high; spring wheat, light pink/green, tillering,
    10 cm (4 in.) high.
(c) July 16, 1975. Winter wheat, fully headed, 76 cm
    (30 in.) high; barley, orange, beginning to head, 46 cm
    (18 in.) high; spring wheat, orange, booting, 25 cm
    (10 in.) high.
(d) August 3, 1975. Winter wheat, light green, ripening,
    81 cm (32 in.) high; barley, orange, fully headed, 71 cm
    (28 in.) high; spring wheat, orange/olive, fully headed,
    81 cm (32 in.) high.
(e) August 21, 1975. Winter wheat, white, windrowed; barley,
    pink, ripening; spring wheat, beginning to ripen.
(f) October 14, 1975. Winter wheat, dark green, fallow;
    barley, medium green, fallow; spring wheat, cloud
    covered.

FIGURE 4. Multistage wheat photophenology that relates
image-acquisition date to the corresponding mean
wheat-growth stage and the resultant photographic and image
appearance. In this illustration, a late-June
image-acquisition date has been extrapolated through the
degree-day growth curve into the photophenological example
and then into the corresponding aerial and spatial
signatures (Hand County, South Dakota). Growth-stage labels
legible in the figure: 3, Emergence; 7, Windrowed (harvest);
Preemergence (spring wheat).
(a) Wheat phenology. (The numbers along the degree-day
    growth curve correspond to the growth stages shown in
    figure 4(b).)
(b) Wheat photophenology at ground level.
(c) Color-infrared vertical photograph, August 11, 1976.
(d) Sample segment image, August 23, 1976.
(e) Color-infrared vertical photograph, June 20, 1976.
(f) Sample segment image, June 30, 1976.
(g) Color-infrared vertical photograph, May 5, 1976.
(h) Sample segment image, May 7, 1976.

The phenological development of spring and winter wheat is
illustrated by a color schematic which depicts the color,
height, and general appearance of the wheat plant during the
different biostages. Another figure in Section 3 illustrates
the concept of multitemporal acquisitions during the
various biostages. The concept of multitemporal
analysis  is  further  demonstrated  by  the  presentation 
of  six  LACIE  sample  segment  images  acquired  at 
different  times  during  the  year  over  Toole  County, 
Montana  (fig.  3).  Spring  wheat,  winter  wheat,  and 
barley  fields  are  annotated  on  each  of  the  six  Landsat 
CIR  images.  These  data  are  correlated  with  ground- 
level  photographs  of  the  three  crop  types.  The 
ground  photography  was  collected  during  the  corre- 
sponding Landsat  overpasses. 

Section  3 also  contains  a detailed  description  of 
the  crop  calendar  and  its  role  in  the  identification  of 
small  grains  and  other  crops.  A LACIE  crop  calendar 
for  Finney  County,  Kansas,  is  shown  to  illustrate  the 
use  of  the  calendar. 

In Section 3, the general techniques for identifying
small-grains  cultivation  on  Landsat  imagery  are  de- 
scribed step  by  step  with  imagery  examples.  Figure  4 
is  a typical  example  of  the  imagery.  In  addition  to  the 
annotated  Landsat  imagery,  aerial  and  ground  photo- 
graphs of  different  agricultural  crops  are  included  to 
illustrate  the  varying  levels  of  information  contained 
in  each  type  of  imagery.  Sequential  imagery  collected 
over  the  LACIE  intensive  test  sites  in  Williams 
County  (North  Dakota),  Divide  County  (North 
Dakota),  Hand  County  (South  Dakota),  and  Toole 
County  (Montana)  is  used  to  illustrate  the  ap- 
pearance and  signatures  of  wheat/small  grains  and 
other  types  of  ground  cover  encountered  in  the 
Northern  Great  Plains.  Ground-truth  maps  for  the 
ITS's  are  included  in  the  sequence.  General  observa- 
tions on  some  similarities  and  dissimilarities  in  sig- 
natures are  made  to  familiarize  the  analyst  with  the 
different  types  of  patterns  and  signatures  he  may  en- 
counter. These  are  also  illustrated  with  imagery 
examples. 

The  selective  key  approach  for  Landsat 
agricultural  analysis  is  demonstrated  in  Section  3, 
and the use of the LACIE regional keys contained in
Volume  II,  Parts  I and  II,  is  described  in  detail.  A 
regional  key  from  Volume  II  is  included  later  in  this 
paper  for  illustration  purposes.  The  decision  logic 


diagram  shown  in  figure  1 is  included  to  show  the 
decision  path  followed  by  the  analyst  identifying 
wheat/small  grains. 

Section  4,  “Environmental  Effects  on 
Wheat/Small  Grains  Signatures,"  provides  imagery 
examples and explanations of the environmental fac-
tors that may alter the appearance of small-grains sig-
natures. Specific factors discussed are soil moisture
variation,  planting  date  variation,  physiographic 
variations,  and  the  effects  of  drought,  flooding,  at- 
mosphere, snow,  and  wind.  A multistage  view  of  the 
various  environmental  effects  is  provided  by  ground- 
level  photography  and  aircraft  CIR  photography, 
which are correlated to the Landsat imagery exam-
ples. Figures 5 to 7 are typical of the data included in
this  section. 

To  facilitate  the  use  of  the  LACIE  Keys,  photo- 
graphic examples  of  common  agricultural  operations 
associated  with  the  development  and  harvest  of 
wheat  and  other  small  grains  are  presented  in  Section 
5,  “Common  Agricultural  Practices  for  Wheat  and 
Associated  Crops."  The  description  and  examples  in 
this  section  deal  with  the  preparation  of  agricultural 
land,  planting  and  harvesting  operations,  and  associ- 
ated cultivation  practices.  The  illustrations  and 
definitions  are  drawn  largely  from  North  America, 
but  they  are  not  necessarily  specific  to  the  United 
States and Canada. Other examples are included to
make  this  section  applicable  to  the  various  wheat- 
growing areas  of  the  world. 

Cropping  and  associated  practices  illustrated  in 
Section 5 include planting operations, crop rotations,
fallowing,  minimum  tillage,  irrigation  systems, 
windrowing,  and  harvest  practices  (one-stage  and 
two-stage  harvesting).  Imagery  examples  of  the 
U.S.S.R.  (fig.  8),  the  People's  Republic  of  China 
(PRC),  India  (fig.  9),  Australia,  Brazil,  and  Argen- 
tina are  also  provided  to  illustrate  field  sizes  and 
shapes  typical  of  these  countries.  Ground  photo- 
graphs of  different  types  of  equipment  used  in  the 
operations  described  previously  are  included.  Aerial 
photographic  examples  are  also  provided  to  illustrate 
cropping  details  which,  in  some  instances,  are  not 
discernible  in  the  Landsat  imagery  (fig.  10). 


Volume II, “United States and Canadian Great
Plains Regional Keys”

Volume II is an operational Analyst Interpretation Key for use in
identifying small-grains fields. Pixels within these fields are used
as inputs to train a pattern-recognition algorithm so that all small
grains within a 5- by 6-nautical-mile sample segment can be
classified.

FIGURE 7.—Sample segment image containing the area shown in
figure 6.

FIGURE 5.—Landsat full-frame image, showing surface-darkening
effect from a light rainfall of brief duration (with overlay of total
precipitation for March 19-21, 1973).

FIGURE 8.—Sample segment image of Khar'kov Oblast in the Ukraine
region, U.S.S.R. (November 6, 1975).

Early in LACIE Phase I (1975), the loss of data because of cloud
cover and acquisition problems was recognized as the most crucial
factor affecting the analyst's ability to label small grains
correctly. An analyst might process a segment with a good
acquisition history, where the temporal development of small grains
was readily apparent, and then, several days later, process a nearby
segment with only one or two acquisitions and a potentially
confusing situation. Some of the more experienced analysts soon
began keeping a few "key" segments on hand for use in extending
signatures to nearby areas. This practice proved to be very
effective for their own reference as well as for training some of
the less experienced analysts. Volume II is an attempt to formalize
this approach by the systematic selection of appropriate reference
segments and the documentation of the procedure for their use.

FIGURE 6.—Color-infrared high-altitude photograph showing ripe
wheat (W), harvested wheat (W/H), windrowed wheat (WW), and fallow
(F) fields on glaciated land with large kettleholes (k).

FIGURE 9.—Sample segment image of area near Meerut, Uttar Pradesh,
in northern India (March 1, 1975).

Regional partitions.—The North American Great Plains was divided
into 41 regional partitions based on soils, climate, land use,
topography, cropping practices, and the homogeneity of small-grains
signatures. (See references 3 to 5 for source materials used to
compile the partitions.¹) Thirty-seven of the regions had adequate
data to enable the selection of a reference segment for inclusion in
the AI Keys. The sample segments selected as reference segments are
intended to be representative of the remaining segments within the
partitions and, therefore, provide a key for identifying small
grains throughout the region.

FIGURE 10.—Color-infrared medium-altitude photograph of Richland
County, Montana, showing fallow and cropland strips oriented
perpendicular to the prevailing wind.

The regional partitions along with reference segments are listed
in table I.

A reduced example of the material available for each partition is
shown in figure 11. The actual keys are 11 by 17 inches and are
bound in looseleaf binders. Volume II is in two separate binders for
ease of handling. Partitions 1 through 19 are contained in Part I of
Volume II and Partitions 20 through 39 are in Part II. Both print
and transparency copies are available for use by the analysts.

Use of Volume II.—The complete procedure used by analysts to
estimate the proportion of small grains in LACIE sample segments can
be found in the CAMS Detailed Analysis Procedures (ref. 6). One of
the tasks performed by the analyst is the labeling of individual
pixels as either small grains or non-small-grains. The application
of Volume II to the performance of this task is described herein.

The specific steps in using the key are illustrated in the
following example. An analyst is assigned segment 1854, which has an
acquisition (fig. 12) collected 75/168 (168th day of 1975). After
preparing and mounting the imagery according to standard procedures,
the analyst should proceed as follows.

1. Establish the principal biostage from the appropriate crop
calendar.—The analyst should consult the crop calendar adjustment
that is closest to the date of acquisition (75/168) and read the
biostage for the location of segment 1854. He should adjust the crop
calendar in accordance with instructions in the CAMS Detailed
Analysis Procedures. In this case, the biostage is 5.3. To apply the
key, this acquisition should be defined as biostage 5 and the value
5.3 should be used in step 2 to determine other possible biostages
for wheat.

2. Determine other possible biostages from the average crop
calendar.—Using the average crop calendar from his packet, the
analyst should plot the value 5.3 on the 50-percent horizontal line.
He should construct a vertical line through this point as shown in
the following diagram.

'l  I)  I’cjiI  'Al  ke>»  fjrliuonm*  Procedure  presentation 
to  VSSA  management.  Mat 
Because the vertical line in this case crosses biostages 5 and 6 and
because the accepted tolerance is 10 days, the analyst should
conclude that some wheat may be in biostage 4 (heading), some in
biostage 6 (ripe), and some in biostage 7 (harvested). Note that
steps 1 and 2 have also been accomplished for the first acquisition
(75/115—fig. 13) and the results are posted beneath the imagery.


PARTITION 10 - SOUTHERN CENTRAL HIGH TABLELAND
Segment 1025
Greeley, Kansas


LOCATION

• This partition is in southwestern Kansas and a small part of
southeastern Colorado.

LAND USE

• About three-fifths of this partition is cropland. Most of the
remaining area, consisting of hilly and steep slopes bordering
drainage ways, is in native grasses and shrubs.

• This partition is a major dryfarming region.

• Winter wheat is the main cash crop, but other small grains, grain
sorghum, some corn, alfalfa, and other hay crops occupy large
acreages.

• Many kinds of crops are grown in the narrow band of irrigated land
along the Arkansas River.


ELEVATION, TOPOGRAPHY, AND SOILS

• Elevations range from 900 to 1200 meters (3000 to 4000 feet).

• This partition consists of smooth loess-mantled tableland slopes
that are level (0 to 16 percent slope). Steep slopes border the
Arkansas River Valley.

• Most soils in this partition are brown to nearly black, fine-silty
and clayey in texture.

CLIMATE

• The average annual precipitation is 38 to 51 centimeters (15 to 20
inches), fluctuating widely from year to year.

• The average annual temperature is 283 to 287 K (50° to 57° F).

• The average freeze-free period is 170 to 188 days.

• Drought, wind, and low temperatures are major hazards.

CROPPING PRACTICES

• In all the Kansas counties, a crop rotation of summer
fallow/wheat/sorghum is common. Because of water requirements, wheat
does not often directly follow sorghum.

• In Baca County, Colorado, the possible cropping systems are
sorghum/fallow/wheat; wheat/fallow; sorghum/fallow; and continuous
sorghum.

• Alternate strips of crop and fallow across the direction of
prevailing winds are used to help control erosion. The width of the
strips varies with erodibility of the soil and size of machines used
in the farming operation. Sandy soils require narrower strips than
heavier soils.

SAMPLE  SEGMENTS 

a Partition  10  includes  sample  segments  1036. 1033, 1036. 
1041,  I860, 1883. 1884, 1867, 1888, 1881, 1883, 1884. 
1888. 1881.  and  1988. 


(a) 

FIGURE 11.—Example partition. (a) Map and general information. (b) Landsat full frame and nominal crop calendar. (c) October 18,
1974, to June 19, 1975. (d) November 10, 1975, to April 20, 1976. (e) June 13 to September 29, 1976.


3. Locate the appropriate reference segment in the key.—Using
table 1 in Volume II (a portion of which is reprinted herein as
table II), the analyst should find segment 1854 in the left-hand
column and read the corresponding partition (10) in the right-hand
column.

4. Locate fields to be identified.—Suppose the analyst must
identify dot number 57 (the upper left pixel of the intersection of
grid lines 190 and 30) and dot number 69 (the upper left pixel of
the intersection of grid lines 120 and 40). He must first determine
that dot number 69 is a part of field A and dot number 57 is a part
of field B.

5. Compare these fields to signatures annotated on the reference
segment.—Since the acquisition has been identified as biostage 5,
fields A and B should be compared to fields annotated 5 on the
reference segment. If the signatures are not similar, the analyst
should compare the fields with other possible biostages (4, 6, and
7). In this example, fields A and B are similar to fields labeled 5
on the reference segment.

6. Follow the logic diagram.—Follow the logic diagram (fig. 1) to
determine whether these fields are small-grains or non-small-grains
fields.

7. Repeat steps 4 through 6 for all available acquisitions and for
each field to be identified.—In figure 12, acquisition 75/168, both
fields should be regarded as similar to fields annotated as biostage
5 on the reference segment. In figure 13, acquisition 75/115, field
A appears similar to fields annotated 2 on the reference segment,
whereas field B does not appear to be similar to fields labeled 2 or
3, which are the only possible biostages. Following the decision
logic diagram, field A should be identified as a small-grains field
and field B should be identified as a non-small-grains field.
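The bookkeeping behind steps 1 and 3 can be sketched in a few lines of Python. This is only an illustration and is not part of the LACIE procedures: the function names `acquisition_date` and `partition_for_segment` are hypothetical, and the lookup dictionary holds just a handful of rows copied from table II.

```python
from datetime import date, timedelta

# Illustrative subset of table II (segment -> partition); not the full table.
SEGMENT_PARTITION = {1850: 10, 1851: 13, 1852: 10, 1854: 10, 1865: 9, 1889: 8}


def acquisition_date(code: str) -> date:
    """Convert a 'YY/DDD' acquisition code such as '75/168' to a calendar date.

    Assumes the two-digit year belongs to the 1900s, as all LACIE
    acquisitions do.
    """
    year_part, day_part = code.split("/")
    year = 1900 + int(year_part)
    day_of_year = int(day_part)
    # Day 1 is January 1, so add (day_of_year - 1) days to that date.
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


def partition_for_segment(segment: int) -> int:
    """Look up the regional partition that contains a sample segment."""
    return SEGMENT_PARTITION[segment]


if __name__ == "__main__":
    print(acquisition_date("75/168"))    # acquisition used in the example
    print(partition_for_segment(1854))   # partition whose key applies
```

For the example in the text, `acquisition_date("75/168")` gives June 17, 1975, and segment 1854 maps to partition 10.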


AI KEYS TEST AND EVALUATION

The  introduction  of  new  or  modified  procedures 
into  the  LACIE  environment  is  generally  preceded 
by  testing  in  a quasi-operational  mode  using  a cross 
section  of  LACIE  analysts.  Although  the  keys  con- 
cept evolved  from  the  interpretation  methods  used 
in  LACIE,  there  are  procedural  elements  different 
enough  to  warrant  thorough  testing  and  evaluation 
before implementation into ongoing LACIE
analyses. 

The test was designed and subsequent ground-
truth comparison evaluation conducted by the
Research,  Test,  and  Evaluation  Branch  (ref.  7).  The 
objective  of  the  test  was  to  determine  the  type  and 
pattern  of  influence  on  wheat/small-grains  iden- 
tification accuracy  resulting  from  the  introduction  of 
the  interpretation  keys  and  associated  decision  logic 
into  operational  use  in  LACIE. 
Three segments were chosen randomly from each of the 4 U.S.
spring/mixed wheat partitions, providing a total of 12 segments for
testing. Twenty to thirty fields were selected in each segment so
that the full range of wheat and nonwheat signatures was
represented. Each segment had from two to five acquisitions. Each
segment was interpreted eight times with the keys by a group of
analysts and eight times without the keys by a different set of
analysts.

The  test  approach  was  to  use  16  analysts  grouped 
according  to  4 levels  of  LACIE  experience.  The  four 
teams  consisted  of  the  following. 

1.  Analysts  with  little  LACIE  experience  who 
had  no  familiarity  with  the  U.S.  spring  wheat  regions 


2.  Analysts  with  little  LACIE  experience  who 
had  some  familiarity  with  U.S.  spring  wheat  regions 

3.  Analysts  with  LACIE  experience  in  areas 
other than the U.S. spring wheat regions (e.g.,
U.S.S.R. or PRC)

4.  Analysts  with  LACIE  experience  in  the  U.S. 
spring  wheat  regions 

Total errors in field labeling (small-grains/non-
small-grains) were tested using analysis of variance
(ANOVA)  methods  (ref.  8).  The  findings  and  con- 
clusions from  the  test  are  summarized  as  follows. 

1. The interpretation accuracy for all four groups
of analysts improved significantly with the use of the
AI Keys. The total error was reduced in each group,
1976 CROP YEAR


January 21, 1976


76/111 April 20, 1976

Note: Drought resulted in thin, spotty fields. Note difference in
1975 and 1976 crop years; some winter kill may be evident in
upper portion of the image.


76/092 April 1, 1976

Note: A cold front caused freezing in northwestern Kansas.
White signatures are probably frost; no snow was reported this
date.
TABLE I.—Regional Partitions and Reference Segments

Partition number | Partition description | Reference segment number, Phase II | Reference segment location
1 | Northern Texas Blackland Prairie | 1274 | Fannin Co., Tex.
2 | Central Rolling Red Prairie | 1241 | Woods Co., Okla.
3 | Central Rolling Red Plains (East) | 1259 | Baylor Co., Tex.
4 | Central Rolling Red Plains (West) | 1230 | Greer Co., Okla.
5 | Southern High Plains | 1084 | Swisher Co., Tex.
6 | Central Rolling Red Plains (North) | 1232 | Kiowa Co., Okla.
7 | Cherokee Plains | 1178 | Bourbon Co., Kans.
8 | Great Bend Sand Plains | 1889 | Edwards Co., Kans.
9 | Northern third of the Southern High Plains | 1865 | Stevens Co., Kans.
10 | Southern Central High Tableland | 1025 | Greeley Co., Kans.
11 | Upper Arkansas River Valley Rolling Plains and South Central High Plains | 1005 | Cheyenne Co., Colo.
12 | North Central High Tableland | 1093 | Yuma Co., Colo.
13 | Central High Tableland | 1851 | Graham Co., Kans.
14 | Rolling plains and breaks | 1875 | Osborne Co., Kans.
15 | Central Loess Plains, Bluestem Hills, and Central Kansas Sandstone Hills | 1181 | Cowley Co., Kans.
16 | Nebraska and Kansas Loess Drift Hills | 1574 | Colfax Co., Nebr.
17 | Central Nebraska Loess Hills | 1588 | Adams Co., Nebr.
18 | Mixed sandy and silty tableland/Middle Central High Plains | 1562 | Cheyenne Co., Nebr.
19 | Wyoming-South Dakota-Upper Platte River Valley | 1682 | Haakon Co., S. Dak.
20 | Rolling Pierre Shale Plains/South Dakota-Nebraska Eroded Tableland | 1694 | Lyman Co., S. Dak.
21 | Eastern Black Glaciated Plains | 1674 | Faulk Co., S. Dak.
22 | Loess, till, and sandy prairies | — | —
23 | Western Minnesota forest-prairie transition | 1521 | Grant Co., Minn.
24 | Red River Valley of the North | 1681 | Roberts Co., S. Dak.
25 | Central Black Glaciated Plains | 1622 | Ramsey Co., N. Dak.
26 | Rolling Soft-Shale Plains and Southern Dark-Brown Glaciated Plains | 1629 | McLean Co., N. Dak.
27 | Northern Rolling High Plains and Rolling Soft-Shale Plains | 1555 | Fallon Co., Mont.
28 | Northern Rolling High Plains | 1556 | Powder River Co., Mont.
29 | Northwestern Black Glaciated Plains | 1606 | Ward Co., N. Dak.
30 | Dark-Brown Glaciated Plains | 1538 | McCone Co., Mont.
31 | Northern Rolling High Plains and Northern Smooth High Plains | — | —
32 | Northern Rocky Mountain Foothills | 1732 | Glacier Co., Mont.
33 | Brown Glaciated Plain | 1739 | Teton Co., Mont.
34 | Southwestern Saskatchewan and northern Montana | 3081 | Saskatchewan
35 | Southeastern Saskatchewan and southwestern Manitoba | 3129 | Saskatchewan
36 | Eastern Saskatchewan and western Manitoba | 3158 | Saskatchewan
37 | South-central Saskatchewan | 3122 | Saskatchewan
38 | Southwestern Saskatchewan | 3133 | Saskatchewan
39 | Alberta | 3256 | Alberta
40 | Central Alberta and central Saskatchewan | — | —
41 | Northwestern Alberta | — | —
TABLE II.—Segment/Partition Cross Reference

Segment | Partition | Segment | Partition | Segment | Partition
1745 | 11 | 1856 | 11 | 1879 | 15
1746 | 28 | 1857 | 10 | 1880 | 14
1747 | 12 | 1858 | 13 | 1881 | 14
1748 | 12 | 1859 | 10 | 1882 | 14
1749 | 12 | 1860 | 11 | 1883 | 15
1750 | 32 | 1861 | 10 | 1884 | 15
1751 | 12 | 1862 | 9 | 1885 | 15
1752 | 32 | 1863 | 10 | 1886 | 14
1753 | 28 | 1864 | 10 | 1887 | 14
1850 | 10 | 1865 | 9 | 1888 | 15
1851 | 13 | 1866 | 10 | 1889 | 8
1852 | 10 | 1875 | 14 | 1890 | 8
1853 | 14 | 1876 | 15 | 1891 | 8
1854 | 10 | 1877 | 14 | 1892 | 15
1855 | 14 | 1878 | 14 | 1893 | 8

75/168 June 17, 1975


FIGURE 12.—Segment 1854 (Scott, Kansas) in biostage 5; other
possible biostages are 4, 6, and 7.


as is shown in figure 14(a). Group 4 had the most im-
provement (8 percent), and groups 2 and 3 showed
the least improvement.

2.  The  omission  and  commission  error  rates  were 
reduced by using the AI Keys. The greater reduction,
approximately  8 percent,  was  in  the  omission  error 
rate.  The  commission  error  rate  improved  by  approx- 
imately 2 percent  (fig.  14(b)). 


75/115 April 25, 1975

FIGURE 13.—Segment 1854 in biostage 2; the only other possible
biostage is 3.


3.  The  AI  Keys  improved  accuracy  in  each  of  the 
four  agrophysical  partitions,  as  is  shown  in  figure 
14(c).  The  general  locations  of  the  partitions  are 
given  in  table  I. 

4. Use of the AI Keys improved labeling accuracy
during  all  four  biowindows.  The  greatest  increase  in 
accuracy  occurred  in  biowindow  1 (8  percent),  with 
the  least  improvement  in  biowindow  2 (2  percent). 
Figure  14(d)  shows  the  accuracy  improvement  for 
each  biowindow. 
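The error measures quoted in these findings can be illustrated with a short tally. The sketch below is a minimal example under stated assumptions, not the evaluation method of reference 7: it takes an omission error to be a small-grains field labeled non-small-grains, a commission error to be a non-small-grains field labeled small grains, and it expresses every rate as a fraction of all fields examined. The function name `labeling_errors` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def labeling_errors(pairs: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Tally labeling errors from (ground_truth, analyst_label) pairs.

    In each pair, True means small grains and False means non-small-grains.
    Rates are fractions of all fields examined (an assumed normalization).
    """
    total = omission = commission = 0
    for truth, label in pairs:
        total += 1
        if truth and not label:
            omission += 1       # small-grains field that was missed
        elif label and not truth:
            commission += 1     # other cover wrongly called small grains
    if total == 0:
        raise ValueError("no fields supplied")
    return {
        "omission_rate": omission / total,
        "commission_rate": commission / total,
        "total_error_rate": (omission + commission) / total,
    }


if __name__ == "__main__":
    # Made-up labels for ten fields, for illustration only.
    fields = ([(True, True)] * 5 + [(True, False)] * 2
              + [(False, True)] * 1 + [(False, False)] * 2)
    print(labeling_errors(fields))
```

Running the same tally on interpretations made with and without the keys, and differencing the rates, gives improvement figures of the kind summarized in figure 14.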


CONCLUSION 

The LACIE Analyst Interpretation Keys Volume I, “Image Analysis
Guide for Wheat/Small Grains Inventories,” and Volume II, “United
States and Canadian Great Plains Regional Keys,” were developed
during Phase II of LACIE (1976) and implemented during Phase III
(1977). The AI Keys were tested using operational LACIE data, and
the results demonstrate that use of the AI Keys provides improved
labeling accuracy in all analyst experience groupings, in all
geographic areas within the U.S. Great Plains, and during all
periods of crop development (biowindows).

To document the complete range of signature variability and
temporal sequences, several additional years of data may be
necessary. Volume II currently contains the 2 years of segment
imagery which were available during the development of the keys in
1976. As improved analysis techniques, imagery products, and
spectral aids are developed and tested, these should be incorporated
into the AI Keys volumes to reflect current technology in use for
satellite agricultural surveys.

FIGURE 14.—Accuracy improvement using the AI Keys. (a) Accuracy
improvement by analyst experience group. (b) Accuracy improvement
for omission and commission error rate. (c) Accuracy improvement for
each agrophysical partition. (d) Accuracy improvement by biowindow.
Colorimetric Consideration of Transparencies for a
Typical LACIE Scene

R. D. Judayᵃ


INTRODUCTION 

In the LACIE operations, analyst interpretation of transparencies
made from Landsat data was an important factor. The Earth
Observations Division of the NASA Johnson Space Center (JSC)
embarked on schemes designed to provide the analyst with optimal
film products. Two such products, designated Product 1 and Product 3
(the latter also being known as the Kraus product), were the ones
principally used. Certain aspects of those products are reviewed
here from the standpoint of color theory. The examination of the two
products has led to a clarification of some principles of color
image generation for remote sensing and has given a quantitative
basis for the development of a new class of imagery based on those
principles, reported elsewhere in the LACIE symposium proceedings.

Details  of  the  mathematical  calculations  are  given 
in  the  appendix;  an  attempt  is  made  in  the  text  to 
give  short,  simple  explanations  of  some  perhaps  un- 
familiar colorimetric  terminology.  The  reference  list 
is  a guide  to  a more  thorough  education. 

A complete colorimetric evaluation includes at
least two subtasks: consideration of the color presen-
tation of information within a single transparency
and  consideration  of  the  image-to-image  stability  of 
the  presentation.  In  this  report,  the  former  is  quan- 
titatively treated  and  the  latter  is  qualitatively 
discussed. 


COLORIMETRY 

The machine used to produce the LACIE imagery is the FR-80
manufactured by Information International, Incorporated, and
installed at the Johnson Space Center. It is locally known as the
production film converter (PFC). It operates by imaging a
black-and-white cathode-ray-tube display sequentially through
colored filters onto color reversal film, with the result being
thought of as writing red, blue, and green images through
independent channels. Each of the channels is configured so that
color densitometry on the developed film shows a linear relationship
between transmission density (being the logarithm of the
transmission, measured at a wavelength near the spectral peak for
the channel under consideration) and the input for that channel
(being a number of digital counts in the range 0 to 255).

ᵃNASA Johnson Space Center, Houston, Texas.
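The linear density-versus-counts behavior of a single PFC channel can be sketched as follows. This is a minimal illustration and not the calibrated PFC model from the appendix: the two endpoint densities are hypothetical placeholder values, and the sketch uses the usual densitometric convention that density is the negative base-10 logarithm of transmission.

```python
def channel_density(count: int,
                    density_at_0: float = 2.6,
                    density_at_255: float = 0.2) -> float:
    """Transmission density of one channel as a linear function of counts.

    The endpoint densities are hypothetical; a real channel would be
    calibrated by densitometry on developed film.
    """
    if not 0 <= count <= 255:
        raise ValueError("count must lie in the range 0 to 255")
    slope = (density_at_255 - density_at_0) / 255.0
    return density_at_0 + slope * count


def transmission(density: float) -> float:
    """Fraction of light transmitted, using D = -log10(T)."""
    return 10.0 ** (-density)


if __name__ == "__main__":
    for count in (0, 128, 255):
        d = channel_density(count)
        print(count, round(d, 3), round(transmission(d), 5))
```

Because density is linear in counts, equal count steps change the transmission by equal ratios rather than equal amounts, which is one reason the count-to-color relationship is worth modeling explicitly.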

Normal  color  vision  is  three  dimensional  (ref.  1). 
That  is,  three  properly  chosen  primary  lights  are 
sufficient  to  match  any  other  colored  light  by  an  ad- 
ditive process  in  which  the  primaries  are  added  in 
varying  proportion  (including  negative  contribu- 
tions, in  which  the  negative  contribution  is  attained 
by  adding  that  primary  to  the  light  to  be  matched). 
Many  coordinate  systems,  all  three  dimensional,  are 
used to describe color. Examples include the CIE¹
system  (luminosity  and  two  chromatic  coordinates) 
and  the  Munsell  system  (hue,  chroma,  and  value) 
(ref.  2).  There  are  mathematical  relationships  allow- 
ing passage  between  the  systems.  The  three-dimen- 
sional system  having  the  counts  in  the  three  chan- 
nels of  the  PFC  as  basis  vectors  can  be  related  to  the 
standard  color  systems  by  appropriate  measure- 
ments; this  has  been  done  for  the  PFC,  and  the 
details  of  the  mathematical  model  are  given  in  the 
appendix.  The  considerable  body  of  colorimetric 
theory  is  then  accessible  for  a discussion  of  the  PFC 
and  the  LACIE  film  products. 

¹The Commission Internationale de l'Éclairage, or the International
Commission of Illumination, which sets the international

One color system that is particularly apropos to this discussion is
a uniform chromaticity scale (UCS) space. The characteristic of a
UCS space is that Euclidean distance in that color space is directly
proportional to the perceptibility of the difference between
neighboring colors. Theory shows that even though color vision is
three dimensional, the distortion of the usual color spaces required
to match the stated quality of the UCS space cannot be displayed in
a Euclidean space of three dimensions because of negative Gaussian
curvature. However, various approximations to a true UCS can be
displayed in a three-dimensional Euclidean space, and the (L*a*b*)
approximation, being evaluated by the CIE (ref. 3), will be used.
The parameter L* is associated with lightness; a*, with a balance
between the complementary colors green and magenta; and b*, with the
complementary colors blue and yellow. The unit of distance is the
just noticeable difference (JND) discussed in reference 4. The
proportionality between Euclidean distance and perceptibility of
color difference holds only locally (i.e., for small color
differences such as those between various shades and lightnesses of
green) rather than globally (as for large color differences, such as
those between bright pink and navy blue). But under the rationale
that large color differences are made up of small ones, the
proportionality will be applied to large color differences where
required. The (L*a*b*) system is attractive for its analytic
invertibility, a property that will be seen to be highly
advantageous.
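For readers who want to experiment with the (L*a*b*) space, the sketch below computes L*, a*, and b* from CIE XYZ tristimulus values and measures color difference as Euclidean distance. It is a hedged illustration only: it uses the form later standardized as CIE 1976 L*a*b*, which may differ in detail from the variant in the appendix, and its default white point is the D65 illuminant, chosen as a placeholder rather than the cool white fluorescent light table assumed in this paper. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def _f(t: float) -> float:
    """Cube-root compression with the linear segment used near black."""
    delta = 6.0 / 29.0
    if t > delta ** 3:
        return t ** (1.0 / 3.0)
    return t / (3.0 * delta ** 2) + 4.0 / 29.0


def xyz_to_lab(x: float, y: float, z: float,
               white: Sequence[float] = (95.047, 100.0, 108.883)
               ) -> Tuple[float, float, float]:
    """Convert XYZ tristimulus values to (L*, a*, b*).

    `white` is the reference white (Xn, Yn, Zn); the D65 default is a
    placeholder assumption.
    """
    xn, yn, zn = white
    fx, fy, fz = _f(x / xn), _f(y / yn), _f(z / zn)
    lightness = 116.0 * fy - 16.0   # L*
    a_star = 500.0 * (fx - fy)      # a* axis, from X and Y
    b_star = 200.0 * (fy - fz)      # b* axis, from Y and Z
    return (lightness, a_star, b_star)


def color_difference(lab1: Sequence[float], lab2: Sequence[float]) -> float:
    """Euclidean distance between two (L*, a*, b*) points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(lab1, lab2)))


if __name__ == "__main__":
    print(xyz_to_lab(95.047, 100.0, 108.883))        # the reference white
    print(color_difference((50, 0, 0), (50, 3, 4)))  # a small color step
```

The closed-form expressions also make the inverse (from L*a*b* back to XYZ) straightforward to write down, which is the analytic invertibility the text refers to.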

One  last  approximation  should  be  noted:  the 
transformation between counts and colors is treated
as  continuous,  even  though  the  transformation  exists 
only  for  integer  count  values. 

A colorimetric model of the PFC has been generated, as outlined in
the appendix.² The model describes the relationship between PFC
input count vectors and the (L*a*b*) space, with the transparency
being viewed on a cool white fluorescent light table (i.e., the
viewing illumination forms part of the colorimetric description).
Specifying the transformation between PFC input counts and feature
space then allows the passage between data space and the UCS space.
Some of the results of examining the feature-space/color-space
relationship are described in a later section, but some of the
concepts that will be used are introduced here.
The linear sensitivity of a color product is the number of
perceptible color steps (which include brightness differences) moved
through for a unit motion in the data space. The dimensions of
linear sensitivity are JND's per count. For Products 1 and 3, linear
sensitivity is a function of both location in the data space and the
direction of motion.

As mentioned previously, the transformation between data space and
color space is regarded as continuous, despite the 16-level
truncation and the underlying digital nature of the transformation.
Under this approximation, which will yield general truths, a planar
distribution of points in data space occupies a two-dimensional
surface (generally curved) in color space. A differential area in
data space dA_d maps into a differential area in color space dA_c.
Respective units are square counts and square colors. This step
leads to the chromatic expansion ratio (dA_c)/(dA_d), with units of
square colors per square count. (There is similarly a chromatic
expansion ratio, cubic colors per cubic count, for the full
three-dimensional color transformation. For now, however, only
planar distributions in data space are considered.) The chromatic
expansion ratio is one measure of how the available area (or volume)
in color space is budgeted to the corresponding portions of data
space.
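Both quantities can be estimated numerically for any smooth count-to-color transformation. The sketch below is an illustration under stated assumptions, not the appendix model: `toy_transform` is a hypothetical linear map from two data channels into a three-dimensional color space, linear sensitivity is taken as the length of the directional derivative, and the chromatic expansion ratio is taken as the area scale factor, the square root of the Gram determinant of the two partial derivatives.

```python
import math
from typing import Callable, Sequence, Tuple

Transform = Callable[[Sequence[float]], Tuple[float, ...]]


def toy_transform(counts: Sequence[float]) -> Tuple[float, float, float]:
    """Hypothetical linear map from two data channels to a color space."""
    u, v = counts
    return (0.3 * u, 0.4 * u, 0.2 * v)


def _directional_derivative(f: Transform, point: Sequence[float],
                            direction: Sequence[float],
                            h: float = 1e-3) -> Tuple[float, ...]:
    """Central-difference derivative of f along a unit-normalized direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    unit = [d / norm for d in direction]
    ahead = f([p + h * d for p, d in zip(point, unit)])
    behind = f([p - h * d for p, d in zip(point, unit)])
    return tuple((a - b) / (2.0 * h) for a, b in zip(ahead, behind))


def linear_sensitivity(f: Transform, point: Sequence[float],
                       direction: Sequence[float]) -> float:
    """Color-space distance moved per unit step in data space."""
    step = _directional_derivative(f, point, direction)
    return math.sqrt(sum(c * c for c in step))


def chromatic_expansion_ratio(f: Transform, point: Sequence[float]) -> float:
    """Area scale factor dA_c/dA_d for a two-channel (planar) distribution."""
    e1 = _directional_derivative(f, point, (1.0, 0.0))
    e2 = _directional_derivative(f, point, (0.0, 1.0))
    g11 = sum(a * a for a in e1)
    g22 = sum(b * b for b in e2)
    g12 = sum(a * b for a, b in zip(e1, e2))
    # Gram determinant; clamp at zero to absorb rounding error.
    return math.sqrt(max(g11 * g22 - g12 * g12, 0.0))


if __name__ == "__main__":
    here = (100.0, 50.0)
    print(linear_sensitivity(toy_transform, here, (1.0, 0.0)))
    print(linear_sensitivity(toy_transform, here, (0.0, 1.0)))
    print(chromatic_expansion_ratio(toy_transform, here))
```

For this linear toy map the sensitivity depends on direction but not on location. For a nonlinear transformation such as the one described for Products 1 and 3, the same functions would return values that change with the point as well.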

A finite set of discriminable colors is produced by the PFC. In analogy to the statisticians' term "probability mass," the discriminability mass is defined as the volume of the UCS space accessible to the PFC. On a two-dimensional surface in the UCS space, a smaller number of discriminable colors exists; for that surface, the discriminability mass is approximately one color times the area of the surface. An optimum transformation of data into color space will have a surface area of largest extent, subject to other constraints such as continuity, monotonicity, smoothness in linear sensitivity, and orthogonality (angle-preserving nature) in transforming data from data space into color space.

Visual orthogonality is a concept deserving a little more explanation. It is tied in turn to geodesic paths in color spaces. Imagine all possible curves connecting two color experiences in a particular color space. Now imagine that you are observing a color patch the color of which is changing in the manner described by motion along one of those paths. Keep track of the number of times that the color of the observed patch just barely noticeably changes in moving along the path, and assign to the path the number of changes. The geodesic path between the colors is the path that has the minimum number of changes. The perceptibility of color difference is proportional to the JND's counted along the geodesic path connecting the colors. Around any color, there is, in general, an ellipsoid the points of which are just barely noticeably different from the center. The geodesic path between colors passes through the smallest number of the JND ellipsoids. For small regions about any color center, the color space can be warped so that the ellipsoids in the region become spheres (this is the local definition of a UCS space). In that local region, with the ellipsoids distorted into spheres, the Euclidean metric ($s^2 = \Delta x^2 + \Delta y^2 + \Delta z^2$, with $s$ being the distance between points) is proportional to the color difference. In a UCS space, marks made on a color path at JND intervals will be equally spaced along the path. Motions occurring in perpendicular directions in the UCS space will have the greatest difference in the kinds of change in color. In psychophysical terms, examples are brightness change compared to hue change or hue change compared to saturation change. Brighter and dimmer are in opposite directions certainly but are the same kind of change. The author proposes that a desirable feature of a data-to-color transformation is for different kinds of change in data space to correspond to different kinds of color change; i.e., the transformation from data space to color space should be orthogonal, or angle preserving. Operations can be performed in data space before the transformation, of course; e.g., to emphasize numerical features of particular interest.
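
The path-counting idea and the notion of different "kinds" of change can be sketched in a few lines, assuming an ideal UCS space in which one JND corresponds to a fixed Euclidean step. The function names and the `jnd_size` parameter are illustrative, not part of the original work.

```python
import math

def jnd_count(path, jnd_size=1.0):
    """Approximate number of just-noticeable steps along a polyline in UCS space."""
    length = sum(math.dist(p, q) for p, q in zip(path, path[1:]))
    return length / jnd_size

def cos_angle(u, v):
    """Cosine of the angle between two color-space displacements."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

if __name__ == "__main__":
    start, end = (0.0, 0.0, 0.0), (3.0, 4.0, 0.0)
    straight = [start, end]
    detour = [start, (3.0, 0.0, 0.0), end]
    # The straight line is the geodesic in an ideal UCS space.
    print(jnd_count(straight), jnd_count(detour))
    # Perpendicular motions are different kinds of change (cosine 0);
    # brighter versus dimmer is the same kind, reversed (cosine -1).
    print(cos_angle((1.0, 0.0, 0.0), (0.0, 2.0, 0.0)))
    print(cos_angle((1.0, 0.0, 0.0), (-3.0, 0.0, 0.0)))
```

In this idealized setting, the straight path always gives the smallest count, so the color difference reduces to a Euclidean distance divided by the JND size.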


HISTORY OF PRODUCTS 1 AND 3
ALGORITHM DEVELOPMENT

Product 1 predates Product 3. It independently scales three bands of Landsat data (multispectral scanner (MSS) channels 1, 2, and 4) into the three inputs of the PFC so that the six standard deviations centered on the mean of an MSS channel occupy in linear fashion the full range of the PFC's assigned input channel. The goal in its formulation was maximal chromatic expansion with only an acceptable minority of points being saturated (falling beyond the 0- to 255-count input range of the PFC). It is readily apparent that the chromatic representation of a point in feature space will vary with the statistics of the scene; as the means and standard deviations of the data move in feature space, the relationship between a given feature space vector and input counts for the PFC will be altered. Analyst experiences led to a desire for a "true color" product, in which a given feature space vector would have a more highly consistent appearance than as put onto film under Product 1. In Product 1, each channel has its own gain and bias associated with the means and standard deviations in each MSS channel. For Product 3, the MSS statistics are used to govern essentially only one parameter, an overall gain. There is zero bias, and the gain (change in PFC counts per change in MSS counts) has a constant relationship among the three PFC channels. In Product 3, a feature space vector transformed into the PFC gun cube would be changed in magnitude, but not in direction, by varying scene statistics. Both direction and magnitude could vary under Product 1. The equations for the Product 1 and 3 transformations from feature space to PFC input space are given in the appendix.
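
The contrast between the two scalings can be sketched as follows. This is an illustration under stated assumptions, not the production algorithm: the Product 1 branch maps mean minus and plus three standard deviations onto 0 and 255 counts as described above; the Product 3 branch uses a single `gain` with fixed per-channel ratios whose values (`relative_gain`) are invented here purely for illustration. The gun-to-channel assignment (MSS 4 to red, 2 to green, 1 to blue) is the one given later in this paper.

```python
GUN_TO_MSS = {"R": 4, "G": 2, "B": 1}  # channel assignment stated in the text

def clip(count):
    """Saturate at the limits of the 0- to 255-count PFC input range."""
    return min(255.0, max(0.0, count))

def product1_counts(x, mean, std):
    """Per-channel gain and bias: mean +/- 3 sigma spans the full input range."""
    counts = {}
    for gun, ch in GUN_TO_MSS.items():
        lower = mean[ch] - 3.0 * std[ch]
        upper = mean[ch] + 3.0 * std[ch]
        counts[gun] = clip(255.0 * (x[ch] - lower) / (upper - lower))
    return counts

def product3_counts(x, gain, relative_gain=None):
    """Zero bias and one overall gain; channel ratios are fixed (hypothetical values)."""
    if relative_gain is None:
        relative_gain = {1: 1.0, 2: 1.0, 4: 2.0}  # assumption, for illustration only
    return {gun: clip(gain * relative_gain[ch] * x[ch])
            for gun, ch in GUN_TO_MSS.items()}

if __name__ == "__main__":
    x = {1: 30.0, 2: 25.0, 4: 20.0}
    print(product1_counts(x, mean=x, std={1: 5.0, 2: 5.0, 4: 5.0}))
    print(product3_counts(x, gain=1.0), product3_counts(x, gain=2.0))
```

With the Product 3 form, changing `gain` rescales all three gun counts together, so the ratios among them (the direction in the gun cube) stay fixed until a channel saturates.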


COLORIMETRIC ASSUMPTIONS FOR
PRODUCT 1 AND 3 ALGORITHMS

In scaling data linearly into the PFC, one assumes either that the PFC is a UCS machine or that the manufacturer of the machine "knew" somehow the manner in which it was to be used and therefore built into it a feature causing it to concentrate chromatic discriminability into the desired parts of the input cube. Neither is quite the case, actually; the manufacturer (together with the film supplier and the film developer) supplies a machine that has so large a gamut of expressible colors that it is a little difficult to go wrong. That fact and the quality control exercised over the fine tuning of the machine and film processing give a stability of color representation sufficient for much of the analysts' success.

The uniform chromaticity assumption for the PFC arises naturally. The Weber-Fechner law states that to be just noticeably brighter, one patch of light must exceed in luminance that of another patch by a constant fraction. The linearization of the PFC in transmission density (ref. 5) for each input channel gives precisely that result for single activation of each channel. In other words, the PFC is a UCS machine when only one of the channels at a time is activated. Because activating two channels at a time gives a result that looks different from either of the channels that were added together, it is easy to assume that the PFC generalizes from uniform scale along the axes to uniform scale anywhere in the input cube. The author believes that in the development of Products 1 and 3, without its ever having been explicitly stated, the PFC was implicitly regarded as having UCS properties over all its input range. The failure in chromatic consistency in the appearance of a feature type under Product 1, rather than a lack of uniform discriminability, led to the effort of defining Product 3.

The  actual  nonuniform  discriminability  of 
features  separated  by  a constant  Euclidean  distance 
in  feature  space  was  masked  by  a truncation  of  the 
PFC's  input  range  to  16  levels  in  each  channel  for 
both  products.  The  truncation  was  done  for  visually 
esthetic  reasons— many  things  were  tried,  and  a few 
were  chosen.  Field  boundaries  are  reported  to  be 
more  sharply  distinct  with  the  inputs  limited  to  16 
levels  per  channel.  The  net  result  is  that  a coarser 
grid  of  color  points  is  occupied  by  data,  with  inter- 
point  discriminability  “crowbarred”  to  be  larger. 
This  is  a poor  man's  approach  to  increased  visual 
separability. 
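
The 16-level truncation can be written as integer arithmetic. This is a minimal sketch, assuming each level is represented by the lower edge of its 16-count bin (the form given in the appendix); the function name is illustrative.

```python
def truncate_to_16_levels(count):
    """Map a 0- to 255-count input onto one of 16 levels (multiples of 16)."""
    clipped = min(255, max(0, int(count)))
    return (clipped // 16) * 16

if __name__ == "__main__":
    levels = sorted({truncate_to_16_levels(c) for c in range(256)})
    print(len(levels), levels[0], levels[-1])
```

Neighboring levels are 16 counts apart, so two pixels that land in different bins differ by at least one full level in that channel, while pixels within a bin become identical.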

The goal of Product 3 was color fidelity; the underlying assumption is that maintaining relative proportions among the three inputs to the PFC will maintain the chromaticity (hue and saturation) of a picture element (pixel), allowing only lightness to vary. If the PFC inputs were in proportion to the transmission rather than to the logarithm of the transmission, that goal would have been realized. With the existing linearization in density, there is not the chromatic constancy expected. Product 3, however, comes much closer to that concept of chromatic fidelity than does Product 1.
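
The distinction between proportionality in transmission and proportionality in density can be checked numerically. The sketch below assumes a hypothetical counts-to-density law (density falling linearly from `d_max` at 0 counts to zero at 255 counts; `d_max = 3.0` is an invented value, not a PFC calibration) and compares it with a device linearized in transmission.

```python
def transmission_from_density_linear(count, d_max=3.0):
    """Transmission when film density is linear in input counts (hypothetical law)."""
    density = d_max * (1.0 - count / 255.0)
    return 10.0 ** (-density)

def transmission_linear(count):
    """Transmission when the device is linearized in transmission instead."""
    return count / 255.0

def red_green_ratio(transmission, red_count, green_count):
    """Ratio of red to green transmission for a pair of input counts."""
    return transmission(red_count) / transmission(green_count)

if __name__ == "__main__":
    # Same 2:1 count ratio at two overall levels.
    for law in (transmission_linear, transmission_from_density_linear):
        print(law.__name__,
              red_green_ratio(law, 100, 50),
              red_green_ratio(law, 200, 100))
```

Under the transmission-linear law the red-to-green transmission ratio is the same at both levels, so chromaticity would be preserved; under the density-linear law the ratio changes with overall level, which is the loss of chromatic constancy described above.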

Color vision is three dimensional; that is why there are no more than three inputs to the PFC (or any other color-generating process, though the four-color printing process uses black ink to darken a color). Thus, a dimensionality-reduction scheme is needed in making imagery from Landsat MSS data because the four channels in the MSS are spectrally independent. Of several schemes (such as calculating principal components for four-channel distribution or taking a standard rotation of the four-dimensioned data and displaying the first three in either case), the simplest scheme, dropping one channel, was chosen. Because of a high degree of correlation between channels 3 and 4 of MSS data, channel 3 was eliminated. The channel assignments between MSS channels and PFC channels were made to achieve a high degree of similarity between the appearance of the final product and the color-infrared aerial photographs with which the analysts were already very familiar. Thus, MSS channel 4 was assigned to red, channel 2 to green, and channel 1 to blue.


QUANTITATIVE  RESULTS 

A typical LACIE scene was subjected to colorimetric analysis; the details are given in the appendix, and this section contains only the quantitative results.

Figures 1, 3, 5, 7, and 9 are, in order, the linear sensitivity for separations in the tasseled cap's brightness direction, the linear sensitivity for separations in the greenness direction, the cosine of the angle between the transformed brightness and greenness directions (which are at right angles in data space), the chromatic expansion ratio, and the JND ellipses for Product 1. Figures 2, 4, 6, 8, and 10 are similar quantities for Product 3. The axes are the tasseled cap's brightness and greenness. The curvilinear outline is the envelope of the scatter plot of the data; the full scatter of the data is fairly well confined close to the plane of the figure and within the outline. The rhombohedral outline shows the limits reachable by the PFC without saturation.

Figures 1 to 4 show that Product 1 has a noticeably higher linear sensitivity than does Product 3. (Contour lines are marked with the values of the linear sensitivity ratio.) Significant portions of the data fall outside the saturation limits for Product 1; Product 3 does not saturate high-brightness/low-greenness pixels, but the upper end of the green arm nonetheless saturates. A common feature of figures 1 to 4 is the presence of a marked variation in the linear sensitivity across the scatter plot, with higher values of sensitivity toward larger values of the tasseled cap's brightness. Linear sensitivity, however, does not tell the whole story, as begins to become apparent in the next two figures.

Figures 5 and 6 give the cosine of the angle between transformed unit vectors that in data space parallel the tasseled cap's brightness and greenness directions (and hence are perpendicular in data space). For Product 1, there is a strong antialinement between the transformed unit vectors. As a result, some directions of separation in data space are nearly indistinguishable, whereas data space separations at right angles to the nearly indistinguishable direction are highly visible. The linear sensitivity is thus a function of the direction of separation in data space, and determination of the sensitivity in only two directions does not describe the full situation. (It takes three parameters to describe an ellipse: eccentricity, rotation, and major axis, for example.) Incidentally, an orthogonal transformation would have not only $\cos\theta = 0$ but also equal linear sensitivity in all directions.

FIGURE 1.—Product 1 brightness sensitivity. Contour values are expressed in colors per count.

FIGURE 2.—Product 3 brightness sensitivity. Contour values are expressed in colors per count.

Figures 7 and 8 give an idea of the budgeting of the available discriminability mass in the data space. The fact that the saturation boundaries more closely conform to the data distribution for Product 1 immediately indicates that less of the discriminability mass is allocated to portions of data space actually unoccupied by pixels; that is confirmed by these two figures. Both products concentrate the discriminability mass toward higher tasseled cap brightness rather than toward regions rationally chosen as being relevant to an image analysis problem (such as catching the motion of a pixel off the soil line as it begins to green up).

FIGURE 3.—Product 1 greenness sensitivity. Contour values are expressed in colors per count.

FIGURE 4.—Product 3 greenness sensitivity. Contour values are expressed in colors per count.

FIGURE 5.—Cosine of the angle between the transformed brightness and greenness directions for Product 1.

FIGURE 6.—Cosine of the angle between the transformed brightness and greenness directions for Product 3.


FIGURE 7.—Product 1 discriminability mass distribution. Contour values are expressed in square colors per square count.

FIGURE 8.—Product 3 discriminability mass distribution. Contour values are expressed in square colors per square count.

The discriminability mass distribution does not tell the whole story, either. The discriminability is not a scalar; a given amount of discriminability mass can be squashed out flat, with the result that a large amount of variation in the directional characteristics of linear sensitivity is introduced without alteration of local values of square colors per square count. The JND ellipses, however, do tell the whole story.

Figures 9 and 10 summarize all the information in the preceding eight figures. The ellipses plotted in these two figures are the intersections of the JND ellipsoids with the brightness/greenness data plane (at the average value of the third and fourth tasseled cap parameters, yellowness and nonesuch). The distance from the center of an ellipse to the curve in a particular direction is how far one moves a pixel in data space before barely being able to perceive the color change (again, under the approximation that there is a continuous transformation between data space and color space). The chromatic expansion ratio is the inverse of the area of an ellipse. The cosine of the angle described earlier is deducible, but the ellipses give all the information contained in $\cos\theta$, and more. The linear sensitivity is the inverse of the distance from center to curve; therefore, all parameters are present.
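
How three sensitivities determine an ellipse, the cosine, and the expansion ratio can be sketched as below. The sketch follows the relations in the appendix as they can be read from the scan, and it assumes that `ls1` and `ls2` are the color displacements for unit steps along brightness and greenness and that `ls3` is the displacement for the combined step (the sum of the two unit steps). That reading of `ls3`, and all function names, are assumptions made for illustration.

```python
import math

def ellipse_coefficients(ls1, ls2, ls3):
    """Coefficients (A, B, C) of the centered conic A x^2 + B x y + C y^2 = 1."""
    return ls1 ** 2, ls3 ** 2 - ls1 ** 2 - ls2 ** 2, ls2 ** 2

def cos_theta(ls1, ls2, ls3):
    """Cosine of the angle between the two transformed axis directions."""
    return (ls3 ** 2 - ls1 ** 2 - ls2 ** 2) / (2.0 * ls1 * ls2)

def expansion_ratio(ls1, ls2, ls3):
    """Parallelogram area (square colors per square count) by Heron's formula."""
    g = 0.5 * (ls1 + ls2 + ls3)
    return 2.0 * math.sqrt(max(0.0, g * (g - ls1) * (g - ls2) * (g - ls3)))

def ellipse_area(a, b, c):
    """Area enclosed by A x^2 + B x y + C y^2 = 1 (requires 4AC > B^2)."""
    return 2.0 * math.pi / math.sqrt(4.0 * a * c - b * b)

if __name__ == "__main__":
    ls1, ls2 = 2.0, 1.0
    ls3 = math.sqrt(ls1 ** 2 + ls2 ** 2 + 2.0 * ls1 * ls2 * 0.5)  # 60 degree case
    a, b, c = ellipse_coefficients(ls1, ls2, ls3)
    print(cos_theta(ls1, ls2, ls3))
    print(expansion_ratio(ls1, ls2, ls3), ellipse_area(a, b, c))
```

Under these assumptions the ellipse area equals pi divided by the expansion ratio, so the two are inversely proportional, and the distance to the ellipse along each axis is the reciprocal of the corresponding sensitivity.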

From previous discussion, desirable features of a counts-to-color transformation would be that the ellipses are as small as possible and that their eccentricity and orientation accommodate realistic discrimination problems. The portion of UCS space accessible to any color-generating machine is limited, and the problem of making optimal color imagery is one of budgeting in the allocation of the color space to the data space. For example, if the task is crop identification, one does not wish to spend color volume discriminating between clear and turbid water. Until another criterion is put onto quantitative basis, the author proposes that small values of eccentricity (near-circular) are desirable because analysts are accustomed to looking at scatter plots in unscaled coordinates; having equal-sized circles for the JND ellipses gives analysts the same visual perspective on the data. In an example of a rationale for other than small, low-eccentricity ellipses, for communication between an analyst and a computer doing classification on the basis of Mahalanobis distance (ref. 6), the JND ellipses giving the computer and the analyst the same perspective would be alined with the Mahalanobis ellipses and given the same eccentricity.

The major differences in the JND ellipse behavior of Products 1 and 3 are that the ellipses for Product 3 are typically larger in area and more nearly circular; i.e., there is a trade-off between chromatic expansion ratio and orthogonality in the two products. The JND ellipses for Product 1 are very elongated in parts of data space, with the major axis extending in the direction of very poor linear sensitivity mentioned earlier. The motion of a pixel off the soil line is in a direction to be highly visible, being in a direction of increasing greenness and decreasing brightness. The ellipses are so alined that the moved pixel cannot be distinguished from a pixel of lower initial brightness that is still lying on the soil line. That comment is true of both products, but more strongly so of Product 1. Both products also show a variation in size and orientation of the JND ellipses over the scatter plot. This variation is a consequence of the non-UCS behavior of the PFC.

FIGURE 10.—Product 3 JND ellipses (1.5 x).

The preceding discussion is in terms of a continuous relationship between Landsat MSS count space and the three-dimensional UCS space. In specializing to the many-to-one count vector-to-color relationship introduced by the truncation to 16 levels per channel, the ellipses will be replaced by rectilinear cells. Their general alinement and other features will follow the trends shown by the ellipses.


IMAGE-TO-IMAGE COLOR STABILITY

Both Product 1 and Product 3 perform manipulations on data vectors before sending them off to the PFC for conversion to color. The attempt is to standardize between images against data variations from Sun angle etc. Product 1 subtracts a bias vector and scales each component independently; Product 3 subtracts no bias and applies to the channels scale factors that derive from a single parameter. Thus, from image to image, Product 1 modifies a data vector in both direction and magnitude, whereas Product 3 essentially modifies only its magnitude. (Note that by subtracting a bias vector parallel to the data vector, Product 3 could have had a bias included that would have left the direction of the data vector unchanged. The result would have been increased chromatic expansion.) Product 3's stated goal was chromatic fidelity; by a standardizing transformation that left the data direction unchanged, it is implicit that the PFC was regarded to have some sort of constant chromatic behavior, as an input vector is being changed only in magnitude. Product 3's data standardization comes much closer to chromatic consistency than does Product 1's because of the unchanged direction of the modified vector.

Along a radial path from the origin in the PFC input cube, a constant ratio of counts exists among the activated channels. If the PFC were linearized in transmission rather than in transmission density, the radial paths would have constant hue and saturation. Because of the logarithmic relationship between input counts and transmission, however, paths of constant hue and saturation are curved. Perhaps the analyst tends to compensate for the curve, but this has not been investigated.

Because Product 3 is governed by a single parameter, one expects that there will be color consistency for a given data vector in images of scenes having similar statistics. Indeed, this trend has been reported for scenes having similar tasseled cap brightness (R. Cicone, private communication).

Appendix

Equations for Mapping From MSS Data Space to Color Space

The mapping from Landsat data to PFC input is as follows. The subscript R refers to the red PFC channel; the subscript G, to green; and the subscript B, to blue. Numerical subscripts refer to MSS channels 1 to 4. Means are indicated by $\mu$; standard deviations, by $\sigma$.

Product 1

$$u_i = \mu_i + 3\sigma_i, \quad i = 1, 2, 4$$

$$l_i = \mu_i - 3\sigma_i, \quad i = 1, 2, 4$$

Product 3

$$u_1 = \tfrac{2}{3}\,(1.1\mu_1 + \mu_2 + 2\mu_4)$$

$$u_4 = u_1/2$$

$$l_i = 0, \quad i = 1, 2, 4$$

Scales and biases:

$$s_i = \frac{255}{u_i - l_i}, \qquad b_i = -s_i\, l_i$$

PFC channel counts:

$$C_R = s_4 x_4 + b_4$$

$$C_G = s_2 x_2 + b_2$$

$$C_B = s_1 x_1 + b_1$$

where $\vec{x}$ is the MSS count vector. Truncation to 16 levels follows:

$$C_i' = \left\lfloor \frac{C_i}{16} \right\rfloor \times 16, \quad i = R, G, B$$

with $\lfloor y \rfloor$ being the largest integer $\le y$.

The colorimetric model² relates the CIE tristimulus values to the PFC input count vector.

Film transmission density:

$$\vec{D} = \begin{pmatrix} 7.133 & 0 & 0 \\ 0 & 7.357 & 0 \\ 0 & 0 & 6.929 \end{pmatrix} (\,\ldots\,)$$

Film transmission:

$$\tau_i = 10^{-D_i}, \quad i = R, G, B$$

PFC channel activations:

$$a_i = \frac{\tau_i}{\tau_{i,\max}}$$

where $\tau_{i,\max}$ is obtained from $C_i = 255$.

Tristimulus values:

$$\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \begin{pmatrix} 13.53 & 35.90 & 39.74 \\ 10.86 & 47.09 & 32.04 \\ 46.13 & 6.646 & .09343 \end{pmatrix} \begin{pmatrix} a_B \\ a_G \\ a_R \end{pmatrix}$$

The illumination source was found to be

$$\begin{pmatrix} X_0 \\ Y_0 \\ Z_0 \end{pmatrix} = \begin{pmatrix} 89.18 \\ 100 \\ 52.89 \end{pmatrix}$$

Reference 3 gives the UCS approximation:

$$L^* = 25\left(100\,\frac{Y}{Y_0}\right)^{1/3} - 16$$

$$a^* = 500\left[\left(\frac{X}{X_0}\right)^{1/3} - \left(\frac{Y}{Y_0}\right)^{1/3}\right]$$

$$b^* = 200\left[\left(\frac{Y}{Y_0}\right)^{1/3} - \left(\frac{Z}{Z_0}\right)^{1/3}\right]$$

The Landsat count vector $\vec{x}$ is obtained from the tasseled cap values by (ref. 7):

$$\vec{x} = \begin{pmatrix} 0.33231 & -0.28317 & -0.89952 & -0.01594 \\ .60316 & -.66006 & .42830 & .13068 \\ \cdots & \cdots & \cdots & \cdots \\ .26278 & .38833 & -.04080 & .88232 \end{pmatrix} \begin{pmatrix} KB \\ KG \\ KY \\ KN \end{pmatrix}$$

The particular scene analyzed had values

$$\langle KY \rangle = -11.6$$

$$\langle KN \rangle = -0.38$$

which suffice to relate values of $(KB, KG)$ to values of $(L^*a^*b^*)$:

$$\vec{c} = f(\vec{x})$$

with $f(\cdot)$ being invertible.

The linear sensitivity LS in the direction $\vec{\delta}$ (in the tasseled cap plane) is the limit as $|\vec{\delta}|$ becomes small of

$$LS = \frac{|\Delta\vec{c}|}{|\vec{\delta}|}$$

where

$$\Delta\vec{c} = f(\vec{x} + \vec{\delta}) - f(\vec{x})$$

Note that LS is generally dependent on both $\vec{x}$ and $\vec{\delta}$. LS is plotted in figures 1 to 4.

Obtaining LS for $\vec{\delta}$ paralleling KB (giving $\Delta\vec{c}_1$), paralleling KG (giving $\Delta\vec{c}_2$), and the intermediate direction (giving $\Delta\vec{c}_3$) yields the cosine of the angle between $\Delta\vec{c}_1$ and $\Delta\vec{c}_2$.

$$\Delta\vec{c}_3 = \Delta\vec{c}_1 + \Delta\vec{c}_2$$

$$|\Delta\vec{c}_3|^2 = |\Delta\vec{c}_1|^2 + |\Delta\vec{c}_2|^2 + 2\,|\Delta\vec{c}_1|\,|\Delta\vec{c}_2| \cos\theta$$

The values of $\cos\theta$ are shown in figures 5 and 6.

The area of the parallelogram having sides $\Delta\vec{c}_1$ and $\Delta\vec{c}_2$ (and diagonal $\Delta\vec{c}_3$) is

$$A_c = 2\sqrt{g\,(g - LS_1)(g - LS_2)(g - LS_3)}$$

where

$$g = \tfrac{1}{2}\,(LS_1 + LS_2 + LS_3)$$

The discriminability mass distribution is then

$$\frac{A_c}{|\vec{\delta}_1|\,|\vec{\delta}_2|} \quad \left[\frac{\text{square colors}}{\text{square count}}\right]$$

which is plotted in figures 7 and 8.

The JND ellipses are plotted in the tasseled cap plane by noting that for the ellipse centered at $\vec{x}$, the distances to the ellipse in three properly chosen directions define the ellipse. Along the KB axis, the distance is $r_1 = 1/LS_1$; along KG, $r_2 = 1/LS_2$; and in the intermediate direction, $r_3 = 1/LS_3$. Solving for the coefficients of the general centered conic equation

$$Ax^2 + Bxy + Cy^2 + F = 0$$

by means of the determinant equation

$$\begin{vmatrix} X^2 & XY & Y^2 & 1 \\ x_1^2 & x_1 y_1 & y_1^2 & 1 \\ x_2^2 & x_2 y_2 & y_2^2 & 1 \\ x_3^2 & x_3 y_3 & y_3^2 & 1 \end{vmatrix} = 0$$

results in

$$X^2\,(LS_1^2) + XY\,(LS_3^2 - LS_1^2 - LS_2^2) + Y^2\,(LS_2^2) - 1 = 0$$

These are the ellipses plotted in figures 9 and 10.

²Richard D. Juday, "Colorimetric Principles As Applied to Multichannel Imagery," Master's thesis, to be published.

Generation of Uniform Chromaticity Scale Imagery
From Landsat Data


R. D. Juday,^a F. Johnson,^a R. A. Abotteen,^b and M. D. Pore^b


INTRODUCTION 

This paper documents a method used for generating uniform chromaticity scale (UCS) imagery from Landsat data. A previous study (ref. 1) was made to map multichannel Landsat data into color space using a maximal chromatic expansion method. This study extends the work of reference 1 to the use of a UCS in the form of color film products. (Familiarity of the reader with standard colorimetric nomenclature is assumed; references 2 and 3 are recommended for the novice.)

The  motivation  behind  generating  UCS  imagery 
from  Landsat  data  is  uniform,  controlled  percep- 
tibility of  color  difference  caused  by  differences  in 
Landsat  data  vectors.  One  can  move  a certain  dis- 
tance around  any  color  center  before  noticing  the 
difference  between  the  color  center  and  the  trans- 
lated point.  The  distance  one  can  move  depends  on 
the  direction  of  the  motion  and  on  the  location  of  the 
color  center  itself.  The  generally  ellipsoidal  surface 
surrounding  the  color  center  is  one  step  in  percep- 
tibility from  the  color  center.  In  the  nonlinear 
transformation  to  a UCS  space,  the  ellipsoids  become 
spheres  with  the  same  radius  at  all  color  centers.  In 
this  circumstance,  the  color  difference  is  simply  the 
length  of  the  straight  line  connecting  the  points  for 
which  color  difference  is  desired. 

An orthogonal mapping from three-dimensional feature space into color space will have the property that visual discriminability of picture elements (pixels) in the resulting imagery will be in direct proportion to the Euclidean distance between the pixels in the feature space. Various methods of dimensionality reduction (i.e., color space is three-dimensional, whereas multispectral and possibly multitemporal scanner data have many dimensions) and alinement of feature space to color space are possible.

One  of  these  transformations  was  used  to  generate 
the  UCS  imagery  from  the  Landsat  data  reported 
herein.  A description  of  this  transformation  is  given 
in  the  following  section.  Formulation  of  the 
algorithm  used  for  generating  UCS  imagery  is  pre- 
sented in  the  third  section.  Conclusions  and  recom- 
mendations of  this  study  can  be  found  in  the  final 
section. 


UCS  TRANSFORMATION  USED 
IN  THIS  STUDY 

The  (L*a*b*)  UCS  transformation  (ref.  4)  was 
used  in  this  report.  In  a UCS  space,  a Euclidean 
metric  is  proportional  (approximately)  to  percep- 
tibility of  color  difference.  (See  reference  2 for  a 
description  of  some  color-difference  formulas  and  a 
more  thorough  exposition  of  the  subject.)  The 
(L*a*b*)  transformation  was  adopted  by  the  Com- 
mission Internationale  de  l’Eclairage  (CIE)  (the  In- 
ternational Illumination  Committee)  for  evaluation; 
a final  selection  of  a universal  UCS  approximation 
has  not  been  made. 
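
A minimal sketch of the (L*a*b*) computation and the Euclidean color difference it supports is given here for orientation. The L* expression is the one used in this paper; the a* and b* expressions are the standard CIE 1976 forms, supplied as an assumption. The function names are illustrative, and the white point in the usage lines is the light-table value quoted in the appendix of the preceding paper, used only as an example.

```python
import math

def xyz_to_lab(x, y, z, white):
    """Convert CIE tristimulus values to (L*, a*, b*) for a given white point."""
    x0, y0, z0 = white
    fx = (x / x0) ** (1.0 / 3.0)
    fy = (y / y0) ** (1.0 / 3.0)
    fz = (z / z0) ** (1.0 / 3.0)
    lightness = 25.0 * (100.0 * y / y0) ** (1.0 / 3.0) - 16.0
    a_star = 500.0 * (fx - fy)   # green/magenta balance
    b_star = 200.0 * (fy - fz)   # yellow/blue balance
    return lightness, a_star, b_star

def color_difference(lab1, lab2):
    """Euclidean distance in (L*a*b*), approximately proportional to perceptibility."""
    return math.dist(lab1, lab2)

if __name__ == "__main__":
    white = (89.18, 100.0, 52.89)
    lab_white = xyz_to_lab(*white, white=white)
    lab_gray = xyz_to_lab(44.59, 50.0, 26.445, white=white)
    print(lab_white, lab_gray, color_difference(lab_white, lab_gray))
```

A stimulus with the same chromaticity as the white point has a* and b* equal to zero, so the gray sample above differs from white only in L*.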

Let X, Y, and Z be the CIE tristimulus values; the UCS space is generated from the tristimulus values as follows.

$$L^* = 25\left(100\,\frac{Y}{Y_0}\right)^{1/3} - 16; \quad 1 \le Y \le 100 \qquad (1)$$

$$a^* = 500\left[\left(\frac{X}{X_0}\right)^{1/3} - \left(\frac{Y}{Y_0}\right)^{1/3}\right] \qquad (2)$$

$$b^* = 200\left[\left(\frac{Y}{Y_0}\right)^{1/3} - \left(\frac{Z}{Z_0}\right)^{1/3}\right] \qquad (3)$$

where $X_0$, $Y_0$, and $Z_0$ define the color of the nominally white object-color stimulus; i.e., the illumination spectrum, taken here as that of the light table illuminating the transparencies. The L* coordinate may be thought of as the lightness or brightness of a color, a* as the green/magenta balance, and b* as the yellow/blue balance.

Slices of the (L*a*b*) space (at constant values of a*) that are accessible to the color "guns" of the production film converter (PFC) were generated. Plots of some of those slices are shown in figure 1 for a* values of -10, -5, 0, 5, and 10. In these plots, an edge marked B = 0 indicates that the blue gun count of the PFC is 0 at that edge. An edge marked R = 1 indicates that the red gun count of the PFC is 255 at that edge. A point in the enclosed area of any of the plots in figure 1 corresponds to red (R), blue (B), and green (G) PFC gun counts between 0 and 255. A PFC product of the (L*a*b*) space, sliced at a* = 10, is shown in figure 2. Formulas used for calculating gun counts in figure 2 can be found in the following section.

Other UCS transformations exist in the literature (refs. 5 and 6). These transformations might be considered in later research. The (L*a*b*) space was chosen over those discussed in references 5 and 6 for the following reasons.

1. The (L*a*b*) space has been adopted by the CIE for evaluation as an approximation to a true UCS space for the object-color solid.

2. Inversion from the (L*a*b*) UCS approximation into the CIE (X, Y, Z) system is mathematically tractable. According to D. L. MacAdam (private communication), the (L, j, g) system (ref. 5) is philosophically preferable to the (L*a*b*) system for this application. However, there are some considerations, such as the incorporation of Semmelroth's crispening factor (ref. 7), that create difficulties in using the (L, j, g) system.


FORMULATION OF THE UCS ALGORITHM

In generating UCS imagery, the light-table and film parameters are considered. The transformation from PFC counts to transmission values (wavelength dependent) is regarded as stable. The primary colors considered in this study are red (R), green (G), and blue (B). This section is divided into three parts. The first part deals with calculating the primary tristimulus values for light table and film. The second part deals with fitting a prototype Landsat data structure into the (L*a*b*) UCS space. The third part presents the algorithm that generates color gun counts from (L*a*b*) values for PFC product along with four UCS images of a LACIE segment.


Colorimetric  Description  of  the 
Transparency  Generation  Process 

The  PFC  images  a cathode-ray  tube  (CRT) 
through  color  filtration  onto  color  reversal  film, 
which  is  then  developed.  The  CRT  display  is  con- 
trolled by  numerical  input;  for  each  color,  the  film 
density  is  very  nearly  linear  with  respect  to  input 
counts.  The  counts-to-density  relationship  is 
carefully  maintained  in  exposing  and  developing  the 
film,  and  the  relationship  is  herein  regarded  as  stable. 

An approximation is made that the manner of
addition of sequential images can be expressed in the
form

$$X = \sum_{i} X_i \tag{4}$$

where X is the first tristimulus value and i is an index
for the red, green, and blue PFC primaries. The same
approximation is made for Y and Z, assuming that
the film system produces the same final result as
would be obtained by separation images projectively
added. The approach avoids certain practical
problems such as interimage effects and reciprocity
failure (ref. 8). The complete colorimetric model of
the PFC is given in an unpublished thesis.1 For the
purposes of the present paper, assume that a known
one-to-one relationship exists between the (L*a*b*)
space and the set of three input commands for the
PFC.

1 R. D. Juday, "Colorimetric Principles as Applied to


Fitting Landsat Data Space
Into the UCS Space

Two multitemporal LACIE segments were used
to generate scatter plots. The Landsat data in each
unitemporal segment were rotated using the tasseled
cap transformation (ref. 9) with a bias. The
brightness, greenness, and yellowness biases were taken to
be 0, 30, and 53 counts, respectively. The scatter plots
of Landsat data in the ($K_b$, $K_g$) space for segments
1618 and 1645, respectively, are shown in figures 3
and 4. Figures 5 and 6 are scatter plots of the data in
the ($K_g$, $K_y$) space. The scatter plots in figures 3 and 4
were used to produce an overall Landsat
data space scatter envelope. This data space scatter
envelope contains the raw $K_b$ and $K_g$ counts; i.e., the
bias was removed and was fitted into the (L*b*) UCS
space as shown in figure 7. The UCS space, sliced at
a* = 10 (fig. 2), was used because it had the best fit
to the envelope.

According to color theory, there is no preferential
orientation between data space and the UCS space.
For subjective reasons, however, the brightness of
the data space was aligned with L*, the lightness
direction. Points on the ground that would appear lighter
in conventional aerial photographs will tend to
appear lighter here also. The portion of color space
accessible to the PFC has a greater extent in (L*b*)
than in (L*a*), and the first two tasseled cap
components have the greatest variance. Thus, the first two
components were laid into (L*b*).

The component $K_y$ is taken parallel to a* to
complete the orthogonal relationship between the UCS
space and the first three tasseled cap components.
The two points $p_1$ and $p_2$ shown in figure 7 were used
to calculate the transformation from Landsat
($K_b$, $K_g$, $K_y$) data space into (L*a*b*):

$$u = Fe + \delta \tag{5}$$

where u = a column vector of L*, b*, and a*

e' = [$K_g$, $K_b$, $K_y$], and e' is the transpose of e

and $\delta$ is an offset vector.

biological stages of small grains. Figures 3 and 4, with
figure 7, provide an interpretive key for the images in
figure 8. Small-grains fields in biostage 1 appear to
have a blue color (low greenness value $K_{g_1}$), whereas
they change from light yellow orange ($K_{g_2} > K_{g_1}$) to
a darker yellow orange ($K_{g_3} > K_{g_2}$) in going from
biostage 2 to 3. In biostage 4 ($K_{g_4} < K_{g_3}$), the
small-grains fields appear to have a pinkish color. This is in
agreement with how the data were fit into the UCS
space as shown in figure 7. The pinkish cast shows an
increase in the yellow component; $K_y$ is scaled
toward positive a*, which is in the general direction
of the PFC red gun.


CONCLUSIONS  AND  RECOMMENDATIONS 

An algorithm for generating uniform chromaticity
scale imagery from Landsat data has been presented.
A computer program was written to implement the
algorithm, and UCS film products were generated.
The colors in the film and their temporal change are
consistent with those expected for the particular
scaling of Kauth components into the (L*a*b*) color
space. The UCS film product has not been subjected
to the practical test of competing with previous
transformations. In that competition (to be done
outside the purview of this report), the philosophically
satisfying notion of transforming Landsat data so
that a one-count difference is equally perceptible at
all locations in data space will be tested.

The authors recommend that analyst-interpreters
test the UCS imagery using a variety of LACIE
segments. Preliminary examination indicates that the
UCS product offers the following possibilities.

1. A single film product that will supplant two
film products in current use

2. Improved visibility of data differences in
regions in data space that are critical to crop
identification

3. An analytic route to the determination of
data-space transformations that will be optimal for
particular discrimination problems; for example, in
another project, the transformation has been used to
display water bodies in Landsat data, with
encouraging results.

FIGURE 2.- A PFC product of the (L*a*b*) space sliced at a* = 10.


Image and Numerical Display Aids
for Manual Interpretation

R. A. Abotteen^a


SUMMARY 

In  the  Large  Area  Crop  Inventory  Experiment, 
image  and  numerical  display  aids  for  manual  in- 
terpretation were  produced  to  assist  in  selecting 
and/or  identifying  representative  samples  of  sig- 
natures in  a given  Landsat  scene.  Four  methods  for 
producing  numerical  display  aids  were  developed 
and  are  discussed  in  this  paper.  The  four  methods 
employed  are  clustering  techniques,  data  compres- 
sion, phenological  growth  pattern  extraction,  and  ag- 
gregation of  like  spectral  data  on  a two-dimensional 
spectral  plot. 


INTRODUCTION 

Image  interpretation  is  an  important  method  for 
acquiring  training  data  for  classification  of  Landsat 
images  in  the  LACIE  (ref.  1).  Interpreting  a scene 
for  classification  requires  that  training  samples  of  all 
spectral  signatures  in  the  given  scene  be  selected  and 
correctly  labeled.  This  procedure  becomes  especially 
difficult  when  multiple  passes  over  a scene  are  to  be 
interpreted.  The  variation  of  the  spectral  signatures, 
in  a multitemporal  sense,  makes  it  difficult  to  select 
and  identify  all  the  various  signatures  in  a scene.  To 
address  these  problems,  four  types  of  image  and 
numerical  display  aids  were  developed. 

The  first  and  the  second  image  interpretation  aids 
were  obtained  by  applying  nonsupervised  pattern 
recognition  techniques  (clustering)  and  data  com- 
pression. The  clustering  method  identifies  the  in- 
herent classes  in  the  scene.  Color  film  is  generated 
from  the  cluster  image,  with  each  cluster  having  a 
distinct  color;  the  color  corresponds  to  the  value  of 
the cluster mean (ref. 2). In interpreting multipass
data, a principal component (PCOMP)
transformation is applied to the cluster image. By this means,
the multitemporal spectral variation of the scene is
compressed, or summarized, into a three-dimensional
image which can be displayed as a color image.
To track phenological growth patterns of various
crops, the principal component greenness (PCG)
transformation was introduced as a third aid. The
PCG transformation maps each of n multitemporal
acquisitions onto the greenness axis and then
compresses these n greenness channels into three new
channels using a linear combination (i.e., the first
three principal components) of the n green channels.

To  enable  the  analyst  to  view  the  structure  of  the 
Landsat  data  in  spectral  space,  a two-dimensional 
spectral  plot  of  the  data  was  developed  as  the  fourth 
aid.  The  spectral  plot  (ref.  2)  takes  advantage  of  the 
inherent  two-dimensionality  of  Landsat  data  (ref.  3). 
The plots are constructed to assist in relating picture
elements (pixels) in the scene to their locations on
the spectral plot.


CLUSTER  IMAGE 

A cluster image is generated first by clustering the
data  in  the  scene  and  then  by  replacing  each  data 
sample  with  the  mean  of  the  cluster  to  which  it 
belongs.  A simulated  color-infrared  (CIR)  film  of 
the  cluster  image  can  be  generated  by  a production 
film  converter  (PFC).  One  of  the  main  features  of 
the  cluster  image  is  that  two  spectrally  similar 
clusters  are  shown  by  the  film  product  to  have  simi- 
lar colors.  This  feature  could  easily  be  lost  if  an  ar- 
bitrary color  assignment  were  used  to  generate  a film 
product  from  the  one-dimensional  cluster  map,  as 
shown in figure 1(a). A CIR film product of the
cluster image is normally generated using the same
gain and bias as that used to produce the original CIR
image. This technique results in a CIR film product
of the cluster image that resembles the standard CIR
film product, as shown in figures 1(b) and 1(c).

FIGURE 1.- Eight-channel cluster images of segment 1576, Lancaster County, Nebraska. (a) Cluster map: arbitrary color assignment.
(b) CIR cluster image; channels 1, 2, and 4. (c) CIR cluster image; channels 5, 6, and 8. (d) PCOMP cluster image: first three
principal components.

A color key to the clusters is generated by
assigning a square of 100 samples (pixels) to each cluster.
Each pixel in the square is then assigned the value of
the cluster mean it represents. The color keys are
then ordered according to the Kauth greenness
number (ref. 3).

It was discovered from observing cluster images
on CIR film that they can be used as aids in defining
spectral classes and thus in standardizing the image
interpretation procedure. In addition, an increase in
the contrast of adjacent fields is apparent, which
assists in the delineation of training fields.
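
The construction of the cluster image and its color key, as described in this section, can be sketched as follows. This is a minimal illustration that assumes cluster labels have already been produced by some clustering routine; the function names `cluster_image` and `color_key` and the array layout are hypothetical and are not taken from the LACIE software.

```python
import numpy as np


def cluster_image(pixels, labels):
    """Replace each pixel vector with the mean of the cluster it belongs to.

    pixels: (num_pixels, num_channels) array of spectral values.
    labels: (num_pixels,) array of cluster numbers (ASSUMED to be given).
    Returns the cluster image and a dict of cluster means.
    """
    pixels = np.asarray(pixels, dtype=float)
    labels = np.asarray(labels)
    image = np.empty_like(pixels)
    means = {}
    for k in np.unique(labels):
        members = labels == k
        means[int(k)] = pixels[members].mean(axis=0)
        image[members] = means[int(k)]
    return image, means


def color_key(mean, side=10):
    """A side-by-side square (100 samples for side=10) filled with one cluster mean."""
    return np.tile(np.asarray(mean, dtype=float), (side, side, 1))


# Tiny two-channel example with two clusters.
image, means = cluster_image([[10, 20], [12, 22], [50, 60]], [0, 0, 1])
key = color_key(means[0])
```

Because every pixel carries its cluster mean rather than an arbitrary cluster number, feeding this image to the film converter with the original gain and bias keeps spectrally similar clusters in similar colors.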


PCOMP  CLUSTER  IMAGE 

A PCOMP cluster image of a scene is generated by
applying principal component transformation to the
cluster image described in the preceding section.
Figure 1(d) is an example of a PCOMP cluster image.
Ready and Wintz (ref. 4) have shown that the
PCOMP transformation applied to aircraft- and
satellite-gathered multispectral data is very useful for
information extraction, since the first few principal
components contain essentially all the information
present in the original spectral bands. Additional
analyses of PCOMP-transformed Landsat data are
available (ref. 5).

The PCOMP transformation is

$$Y = MX \tag{1}$$

where X = a vector of n spectral intensities
associated with each pixel

M = an n by n unitary matrix derived from
the mixture covariance matrix $\Sigma_X$ of
the spectral bands such that the rows
of M are the normalized eigenvectors
of $\Sigma_X$

Y = a vector of n PCOMP's

The covariance matrix of the PCOMP-transformed
data then becomes

$$\Sigma_Y = M \Sigma_X M^T = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) \tag{2}$$

where $M^T$ is the transpose of M and $\lambda_1$, $\lambda_2$, ..., $\lambda_n$
(the variances of the PCOMP's) are the eigenvalues
of $\Sigma_X$ ordered so that $\lambda_1 > \lambda_2 > \ldots > \lambda_n$.

Since M is a unitary matrix, the PCOMP
transformation preserves the total data variance; i.e.,

$$\sum_{i=1}^{n} \sigma_{x_i}^2 = \sum_{i=1}^{n} \lambda_i \tag{3}$$

where the values for $\sigma_{x_i}^2$ are the variances of the
original spectral bands. Note that in the PCOMP
transformation, most of the data variance is
concentrated in the first few PCOMP's. It was observed
that, for Landsat data in four channels, $\lambda_1 + \lambda_2$
contained approximately 96 percent of the total data
variance. In eight channels of Landsat data,
$\lambda_1 + \lambda_2 + \lambda_3$ contained approximately 91 percent
of the total data variance. For 16-channel Landsat
data, $\lambda_1 + \lambda_2 + \lambda_3$ contained approximately 82
percent of the total data variance.

The interest in the first three PCOMP's of
Landsat data arises from processing color film products. A
PFC generating a color product of Landsat data uses
three channels, with each channel assigned to the
blue, green, or red film converter gun. The first three
PCOMP's are associated with these three channels to
produce the PCOMP cluster image. Since the first
three PCOMP's contain most of the data variance,
the PCOMP cluster image seems to be a good means
of compressing multitemporal data.

Once the PCOMP transformation has been
applied to both the cluster image and the color keys, the
transformed data are rescaled to lie between 0 and
255 to allow storage of the image in a standard
computer format.
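
The PCOMP transformation and the variance-preservation property of this section can be sketched in a few lines. The sketch below is illustrative only: the function names are hypothetical, the sample covariance stands in for the mixture covariance matrix, and the per-channel min-max rescaling to 0 through 255 is an assumption, since the text does not say how the rescaling was done.

```python
import numpy as np


def pcomp(X):
    """Principal component transformation of pixel data.

    X: (num_pixels, n) array of spectral intensities.
    Returns Y (transformed pixels), M (rows are the normalized eigenvectors
    of the covariance matrix), and the eigenvalues in decreasing order.
    """
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False)            # sample covariance (ASSUMED stand-in)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    M = eigvecs[:, order].T                  # rows of M are eigenvectors
    Y = X @ M.T                              # y = M x for every pixel
    return Y, M, eigvals


def rescale_to_counts(Y):
    """Linearly stretch each channel to 0..255 (ASSUMED min-max scheme)."""
    lo, hi = Y.min(axis=0), Y.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return np.round(255.0 * (Y - lo) / span).astype(np.uint8)


# Synthetic four-channel data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
Y, M, lam = pcomp(X)
fraction_first_two = lam[:2].sum() / lam.sum()
guns = rescale_to_counts(Y[:, :3])           # three channels for the film converter
```

The quantity `fraction_first_two` corresponds to the kind of percentage quoted in the text for real Landsat data; the value obtained from the synthetic data here has no physical meaning.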


PRINCIPAL COMPONENT
GREENNESS IMAGE

In developing the PCG image, it is assumed that a
temporal change in the Kauth greenness (ref. 3) of
an agricultural crop is an indicator of a phenological
change in that crop. The PCG transformation is
formulated as follows. Let n be the number of multiple,
registered Landsat passes over a segment. Let Z be
an N-dimensional feature vector drawn from the
segment image, where N = 4n and the segment image
is composed of passes 1, 2, ..., n. The feature
vector Z is mapped onto the Kauth greenness feature
vector G using the following transformation:

$$G = AZ \tag{4}$$

where the nonzero entries of A, $v_1$, $v_2$, $v_3$, and $v_4$,
are entries from the second row of the Kauth
transformation matrix. Note that A = ($a_{ik}$) is an n by N
matrix such that for i = 1

$$a_{1k} = v_k \ \text{when } k = 1, 2, 3, 4; \qquad a_{1k} = 0 \ \text{when } k = 5, 6, \ldots, N \tag{5}$$

For i = 2, 3, ..., n, let j = 4(i - 1); then

$$a_{ik} = 0 \ \text{when } k = 1, 2, \ldots, N \text{ and } k \neq j + m; \qquad a_{i,(j+m)} = v_m; \quad m = 1, 2, 3, 4 \tag{6}$$

For Landsat-1

$$\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{bmatrix} = \begin{bmatrix} -0.290 \\ -0.562 \\ 0.600 \\ 0.491 \end{bmatrix} \tag{7}$$

and for Landsat-2

$$\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{bmatrix} = \begin{bmatrix} -0.2832 \\ -0.6601 \\ 0.5774 \\ 0.3883 \end{bmatrix} \tag{8}$$

The PCG transformation is

$$P = UG \tag{9}$$

where G = a vector of n greenness values
associated with each pixel and corresponding
to the n passes as shown in equation (4)

U = an n by n unitary matrix derived from
the mixture covariance matrix $\Sigma_G$ of
the greenness bands such that the
rows of U are the normalized
eigenvectors of $\Sigma_G$

P = a vector of n PCG bands

The covariance matrix of the PCG-transformed data
then becomes

$$\Sigma_P = U \Sigma_G U^T = \mathrm{diag}(\lambda_{g_1}, \lambda_{g_2}, \ldots, \lambda_{g_n}) \tag{10}$$

where $U^T$ is the transpose of U and $\lambda_{g_1}$, $\lambda_{g_2}$, ..., $\lambda_{g_n}$
(the variances of the PCG bands) are the eigenvalues
of $\Sigma_G$ ordered such that $\lambda_{g_1} > \lambda_{g_2} > \ldots > \lambda_{g_n}$.

The PCG transformation was applied to 10
four-pass small-grains-growing LACIE segments in the
United States. The eigenvalues and eigenvector
components of $\Sigma_G$ for the 10 segments are shown in table
I. Table I also contains the percentage of the data
greenness variability explained by the first two PCG
bands. For most of the segments in table I, note that
$\lambda_{g_1}$ and $\lambda_{g_2}$ together contain more than 85 percent of
the data greenness variability.
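
As a worked check of how the percentage column relates to the eigenvalues, the short sketch below recomputes it for one segment. The four eigenvalues are those read from table I for segment 1506; the function name is hypothetical.

```python
def percent_in_first_two(eigenvalues):
    """Percentage of total variance carried by the two largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return 100.0 * sum(ordered[:2]) / sum(ordered)


# Eigenvalues for segment 1506 as read from table I.
segment_1506 = [159.23, 83.26, 33.89, 5.81]
print(round(percent_in_first_two(segment_1506), 2))
```

The result agrees with the tabulated 85.93 percent for that segment, which supports reading the last column of table I as the ratio of the first two eigenvalues to the sum of all four.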

A detailed graphical analysis was performed on
segment 1988, located in Finney County, Kansas.
The four Landsat passes over the segment were
acquired on February 7, April 18, May 6, and June 12,
1976, which correspond to winter wheat biowindows
1, 2, 2, and 3, respectively. The four greenness bands
were obtained using equations (4) and (8). To
maintain the nonnegativity of the greenness bands, a
four-dimensional bias vector was added to equation (4).
All the components of this bias vector were taken to
be 16 counts. A plot of the eigenvector components
versus greenness bands 1, 2, 3, and 4 is shown in
figure 2. Figure 3 shows the temporal trajectories of
training field greenness means of the segment. It is
noteworthy that the temporal trajectory of the
small-grains field mean in figure 3 resembles the first
eigenvector plot in figure 2, whereas the temporal
trajectory of the non-small-grains field mean
resembles the second eigenvector. This correlation suggests
that, for segment number 1988, the components of
the first eigenvector define a small-grains trajectory
and the components of the second eigenvector
define a non-small-grains trajectory. The third and
fourth eigenvectors correspond to small eigenvalues
and contain little information. Hence, the first and
second PCG bands would appear to separate the
small-grains field means from the non-small-grains
field means. This separation is indeed evident in
figure 4, which shows PCG band 2 versus PCG band
1 of the training field means. The linear decision
boundary drawn in figure 4 is arbitrary. Figure 4 also
shows that PCG band 1 alone is sufficient to separate
small-grains field means from non-small-grains field
means, whereas PCG band 2 alone cannot separate
the small-grains field means from the non-small-grains
field means.

TABLE I.- Eigenvalues and Eigenvectors of Greenness Image Mixture Covariance Matrices for
10 Four-Pass Acquisitions of Small-Grains-Growing LACIE Segments in the United States

Segment  Eigenvalue   First        Second       Third        Fourth       First two
                      eigenvector  eigenvector  eigenvector  eigenvector  eigenvalues,
                                                                          percent of total

1988       170.31      0.063       -0.018       -0.120        0.991        92.87
            22.34       .505        -.814        -.825        -.136
            14.10       .803        -.830         .349         .012
             2.06       .309         .949        -.055        -.009

1046        29.14     -0.010        0.061       -0.307       -0.950        87.08
            23.95       .283         .059        -.911         .295
             5.14       .771         .579         .859        -.055
             2.03      -.570         .811        -.096         .089

1233        85.31      0.145       -0.379       -0.418       -0.813        85.71
            39.28       .713        -.408         .570         .024
            11.85       .562         .032        -.698         .444
             8.93       .394         .83          .116        -.377

1506       159.23      0.048       -0.023       -0.056       -0.997        85.93
            83.26       .965         .054         .255         .031
            33.89       .839         .181        -.952         .060
             5.81      -.096         .982         .160        -.036

1851       115.45      0.071        0.014        0.090        0.993        87.48
            96.86       .793         .210         .561        -.110
            27.03       .330         .808        -.821         .034
             3.36      -.291         .955         .054         .002

1967       300.42      0.012        0.987        0.051       -0.151        88.59
            72.55       .801         .025        -.598         .026
            35.34       .591        -.076         .780        -.189
            12.69       .095         .138         .156         .970

1538       154.40      0.074       -0.006        0.622        0.780        88.62
            39.13       .906        -.407         .030        -.113
            15.31       .375         .698        -.492         .362
             9.53       .182         .589         .609        -.499

1618       240.59      0.019       -0.023       -0.194        0.981        81.47
           100.61       .293        -.847        -.431        -.110
            70.23       .956         .264         .128         .013
             7.38      -.008         .462        -.872        -.162

1655        98.41      0.381        0.087       -0.911        0.135        79.34
            42.02       .877        -.345         .335         .011
            28.77       .244         .864         .239         .370
             7.69       .164         .356        -.033        -.919

1694        78.02      0.325        0.933        0.117        0.096        88.85
            13.26       .799        -.198        -.559        -.105
             7.63       .458        -.226         .796        -.326
             3.83       .216        -.197         .203         .935

For  most  segments  in  table  I,  examining  the  first 
and  second  eigenvector  components  indicates  that 
the  components  of  the  first  eigenvector  define  a 
small-grains  trajectory  and  the  components  of  the 
second  eigenvector  define  a non-small-grains  trajec- 
tory. This  is  a favorable  characteristic  of  the  PCG 
transformation  for  applications  in  multitemporal 
agricultural  Landsat  data  analysis. 

The PCG-transformed data were used to generate
color film products for several segments. Because of
its phenological interpretability, the product
generated from the PCG-transformed Landsat data is
referred to as a temporal greenness interpretational
film (TGIF) product. The color TGIF product
appears to be a very useful way of representing
multitemporal Landsat agricultural data. Also, the
color TGIF product summarizes n color film
products and can be used as an image interpretation aid.
Figure 5 illustrates how four acquisitions are
summarized by the TGIF product. In generating the
TGIF product, PCG bands 1, 2, and 3 are assigned to
the red, green, and blue film converter guns,
respectively.

An experiment was conducted to determine the
amount of information retained in the PCG image: 5
segments were classified 3 times using all 16 original
channels, the 4 greenness bands, and, finally, the
first 2 PCG bands. The classification performance is
specified here in terms of the probability of correct
classification (PCC) and the estimated small-grains
proportion in the segment compared with the
ground-truth (GT) small-grains proportion. The
PCC here is calculated as the average PCC of the
training and test samples, which, according to the
work of Foley (ref. 6), provides quite a good estimate
of the actual error probability even for a small
number of samples. For each segment, 50 training and 60
test samples were used to execute a classification
run. The classification performance for each of the
five LACIE segments is shown in table II. Table III
shows the overall classification performance for the
five segments, where the mean square error (MSE)
in small-grains proportion is computed as follows.

Let p(GT) and $\hat{p}$ be the GT and a
classification-estimated small-grains proportion, respectively; MSE is
then

The overall classification performance using the
first 2 PCG bands is close to the overall classification
performance using all 16 original channels. This
observation shows that the first two PCG bands
contain most of the information necessary to separate
small grains from non-small grains and agrees with
the previously observed correlation in figure 4 for
segment 1988.
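
The mean square error in small-grains proportion mentioned above can be illustrated with a short sketch. The defining equation does not survive in this text, so the sketch assumes the conventional definition: the average, over the segments, of the squared difference between the estimated and ground-truth proportions. The function name and the proportions in the example are hypothetical and are not results from the experiment.

```python
def mean_square_error(ground_truth, estimated):
    """Average squared difference between estimated and GT proportions.

    ASSUMPTION: this is the conventional MSE; the original equation is
    not available in this text.
    """
    if len(ground_truth) != len(estimated) or not ground_truth:
        raise ValueError("need two equal-length, nonempty sequences")
    total = sum((e - g) ** 2 for g, e in zip(ground_truth, estimated))
    return total / len(ground_truth)


# Hypothetical proportions (percent) for five segments, illustration only.
gt = [20.0, 35.0, 12.0, 48.0, 27.0]
est = [22.0, 33.0, 15.0, 44.0, 27.0]
mse = mean_square_error(gt, est)
```

Computing the same quantity for each of the three channel sets (16 channels, 4 greenness bands, 2 PCG bands) would give one summary number per method, which is the kind of comparison table III is described as reporting.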


FIGURE 5.- Sixteen-channel-generated TGIF image of segment
1538, McCone County, Montana. (a) April 25, 1976. (b) June 18,
1976. (c) July 23, 1976. (d) August 11, 1976. (e) TGIF image.

TABLE II.- Classification Performance for Five LACIE Segments
Using Various Methods


SPECTRAL PLOT

A spectral  plot  of  a Landsat  scene  is  a graph  of  one 
channel  of  the  data  versus  another.  The  spectral  plot 
relates  the  image  space  (i.e.,  the  spatial  domain)  of 
the  PFC  product  to  the  spectral  space  of  the 
classifier.  A schematic  diagram  showing  the  relation- 
ship between  image  space  and  spectral  space  is  given 
in  figure  6.  The  spectral  plot  uses  the  inherent  two- 
dimensionality  of  Landsat  data,  where  most  of  the 
spectral  class  separability  exists  (ref.  3).  For  exam- 
ple, overlaying  the  spectral  plot  of  training  field 
means  over  the  spectral  plot  of  the  scene  provides  a 
quick  view  of  missing  signatures. 

The axes used for generating a spectral plot may
be two selected Landsat channels or two linear
combinations of channels; i.e.,

$$W = BX + e \tag{12}$$

where B = a two by n transformation matrix of
rank 2

e = a two by one bias vector

W = a two by one vector of the channels to
be plotted

The transformation matrix B might be formed
from the first two rows of the Kauth transformation,
from the first two rows of M in equation (1), or from
a linear-combination feature-selection algorithm
(ref. 7). If B is obtained from the Kauth
transformation, then the dimension of the data n must be four
since the Kauth transformation is a four by four
matrix.

TABLE III.- Overall Classification Performance for
Five Segments Using Various Methods

A color-coded  spectral  plot  contains  the  locations 
of  the  pixels  and  the  channels  to  be  used  for  coloring 
them  on  the  spectral  plot.  The  location  of  each  pixel 
on  the  spectral  plot  is  computed  using  the  radiance 
values  (or  linear  combinations  of  radiance  values)  of 
the  pixel.  Multiple  pixel  occurrences  at  the  same 
location  on  the  spectral  plot  are  shown  to  be  the  color 
of  the  pixel  corresponding  to  the  first  occurrence.  To 
illustrate what is meant by a color-coded spectral
plot, assume that a given pixel on the Landsat image
has radiance values of 28, 30, and 50 on channels 1, 2,
and 4, respectively, as shown in figure 7. Also, let
channel 4 be plotted against channel 2. The
color-coded spectral plot is created by assigning the values
of 28, 30, and 50, respectively, to the point (30, 50) on
channels 1, 2, and 3 of the spectral image. By
maintaining the same gains and biases, the color-coded
spectral image can be displayed in the same color as
the original Landsat image. Such a spectral plot easily
reveals the spectral distribution associated with
original colors in the Landsat scene.
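
The construction of a color-coded spectral plot can be sketched as follows. The sketch assumes integer counts in the range 0 to 255 and zero-based channel indices (so channels 1, 2, and 4 of the text become indices 0, 1, and 3); the function name and array layout are hypothetical.

```python
import numpy as np


def color_coded_spectral_plot(image, x_channel, y_channel, color_channels, size=256):
    """Build a spectral image indexed as plot[x_value, y_value].

    image: (rows, cols, channels) array of integer counts below `size`.
    Each pixel is placed at the location given by its values in x_channel
    and y_channel and is colored with its values in color_channels. When
    several pixels fall on the same location, the first occurrence wins.
    """
    image = np.asarray(image)
    plot = np.zeros((size, size, len(color_channels)), dtype=image.dtype)
    filled = np.zeros((size, size), dtype=bool)
    for px in image.reshape(-1, image.shape[-1]):
        x, y = int(px[x_channel]), int(px[y_channel])
        if not filled[x, y]:
            plot[x, y] = px[list(color_channels)]
            filled[x, y] = True
    return plot, filled


# The example from the text: counts 28, 30, 50 on channels 1, 2, 4,
# with channel 4 plotted against channel 2. A second pixel at the same
# spectral location shows the first-occurrence rule.
scene = np.array([[[28, 30, 0, 50], [99, 30, 0, 50]]])
plot, filled = color_coded_spectral_plot(scene, x_channel=1, y_channel=3,
                                         color_channels=(0, 1, 3))
```

In this example the point (30, 50) of the spectral image holds the values 28, 30, and 50, as in the text. Passing a cluster image or PCOMP image as the color source instead would give the other coloring options described in this section.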

The color of the pixel on the spectral plot is
optional. It can be colored according to its original
radiance value or the mean of the class, cluster, or field
from which it was extracted. Naturally, the pixel
location on the spectral plot can also be colored using
the PCOMP value for the pixel.

Color-coded  spectral  plots  can  be  used  to  observe 
the  partitioning  of  spectral  space  imposed  by  cluster- 
ing or  by  maximum  likelihood  classification.  It  can 
be  used  also  to  view  the  spectral  locations  of  training 
samples.  A partition  of  the  two-dimensional  spectral 
space  by  the  maximum  likelihood  classification  rule 
is  depicted  on  the  color-coded  spectral  plot  by  color- 
ing the  pixel  on  the  plot  according  to  the  mean  of  the 
subclass  to  which  it  was  assigned.  A change  in  the 
color  or  its  intensity  on  such  a plot  determines  the 
maximum  likelihood  decision  boundary. 

When  multiple,  registered  Landsat  images  are 
available  for  a scene,  the  location  of  the  pixels  to  be 
plotted  can  be  selected  from  two  channels  of  one 
pass,  whereas  another  pass  is  used  for  color  defini- 
tion. Such  a color-coded  spectral  plot  is  especially 
useful  for  the  analysis  of  multitemporal  Landsat 
data.  Through  spatial  correlation  of  this  color-coded 
spectral  plot  and  the  Landsat  image  from  which  the 
plotting  axes  are  selected,  areas  of  temporal  change 
caused  by  factors  such  as  growth,  disease,  severe 
weather  conditions,  and  harvest  can  be  delineated.  A 
typical  application  of  the  color-coded  spectral  plot, 
which  is  currently  being  considered  by  LACIE,  is  to 
use  the  plot  as  an  aid  in  labeling  the  training  samples. 
This is done by providing a spatial correlation
between the spectral plot and the original Landsat
image.


CONCLUSION

To  aid  in  interpreting  Landsat  imagery,  four  color 
image  display  techniques  have  been  developed. 
These  interpretation  aids  are  the  cluster  image,  the 
PCOMP  cluster  image,  the  PCG  image,  and  the 
color-coded  spectral  plot.  From  the  results  of  experi- 
mentation, the  four  display  techniques  have  been 
shown  to  be  useful  in  selecting  and/or  identifying 
training data from Landsat images. The developed
interpretation aids are being considered for
implementation in LACIE.


A Programed Labeling Approach to Image
Interpretation

M. D. Pore^a and R. A. Abotteen^a


INTRODUCTION 

The procedure for analyzing Landsat
multispectral scanner data in LACIE is called Procedure 1
(P-1). One purpose of this analysis of Landsat data is
to estimate the acreage of small grains in a 9- by
11-kilometer rectangular area, referred to as a segment.
For  this  purpose,  the  four-channel  vector  of  spectral 
values  for  each  picture  element  (pixel)  in  the  seg- 
ment is  transformed  by  a production  film  converter 
(PFC)  to  form  a simulated  color-infrared  film  prod- 
uct of  the  segment.  There  are  22  932  pixels  in  a seg- 
ment. 

The  PFC  products  are  generated  for  several  dates 
throughout  the  growing  season  of  small  grains;  from 
these,  up  to  four  are  selected  by  the  analyst-in- 
terpreter (AI)  for  machine  processing.  Every  pixel 
falling  on  a 10-  by  10-pixel  grid  is  referred  to  as  a grid 
pixel  or  grid  dot.  The  first  grid  dot  is  on  row  10,  col- 
umn 10,  and  one  would  assume  that  it  represents  the 
same  "piece  of  real  estate”  on  the  ground  on  all  four 
acquisition  dates.  It  often  happens  that  grid  dots  ap- 
pear to  switch  fields  because  the  registering  of  the 
PFC  products  to  each  other  or,  more  precisely,  to  a 
reference  date  is  not  very  accurate. 
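
The grid-dot layout described above can be sketched as follows. The segment dimensions of 117 rows by 196 columns are an assumption not stated in the text (they are chosen because 117 x 196 = 22 932, the pixel count given for a segment), and the function name is illustrative only.

```python
# Assumed segment dimensions: 117 rows by 196 columns (117 * 196 = 22 932).
ROWS, COLS, STEP = 117, 196, 10

def grid_dots(rows=ROWS, cols=COLS, step=STEP):
    """Return 1-indexed (row, column) pairs for every pixel on the grid."""
    return [(r, c)
            for r in range(step, rows + 1, step)
            for c in range(step, cols + 1, step)]

dots = grid_dots()
print(ROWS * COLS)  # total pixels in the segment
print(len(dots))    # number of grid dots
print(dots[0])      # first grid dot: row 10, column 10
```

Under these assumed dimensions, the grid has 11 rows and 19 columns of dots.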

The  labeling  of  pixels  for  particular  acquisitions  is 
a more  precisely  defined  problem  than  multitem- 
poral labeling  since  the  grid  dots  may  not  fall  on  ex- 
actly the  same  real  estate;  however,  despite  this  lack 
of  perfect  registration,  P-1  requires  the  assignment  of 
multitemporal  labels  to  a subset  of  the  grid  dots.  The 
multitemporal  advantage  of  tracking  crop  growth 
outweighs  the  loss  due  to  imprecise  registration  in 
P-1.  Manual  labeling  techniques  require  the  AI  to  use 
not  only  PFC  products  but  also  agricultural  and 
meteorological  (agromet)  data  and  spectral  aids  in  an 
integrated, judgmental fashion. To control an anticipated
high variance in these techniques, a semiautomated
labeling technology was developed. The product
of this technology, LIST (Label Identification
from Statistical Tabulation), is the subject of this
paper.

The  LIST  operates  on  a discriminant  analysis 
basis  and  thus  has  the  potential  for  several  favorable 
qualities.  Among  these  are  the  ability  to  measure  the 
reliability  of  a label  and  the  ability  to  introduce  an  ar- 
bitrary bias.  The  latter  property  could  be  useful  in 
offsetting  other  biases,  such  as  those  due  to  a failure 
to  consistently  detect  small  grains.  In  summary, 
automation  in  labeling  can  introduce  the  favorable 
aspects  of  using  objective  information  and  integrat- 
ing continuous  variable  information  in  a way  that  an 
AI  cannot. 

This  paper  introduces  the  LIST  properties,  pre- 
sents detail  in  describing  the  LIST  development, 
gives  numerical  results  of  an  application,  and  dis- 
cusses the  evaluation  and  conclusions  about  LIST. 


LIST  DEVELOPMENT 

It has been observed that two AI's can study the
same PFC imagery, agree on the various features
(agromet information), and yet come to different
conclusions about pixel labels. This may be due to
different  weighting  of  the  information  rather  than  to 
incorrect  labeling  or  personal  biases.  However,  this 
labeling  phenomenon  does  create  a high  variance  in 
labeling  that  is  undesirable. 

The LIST asks the AI questions related to simple
properties and numerically quantizes the AI information,
the agromet data, and the spectral information
in variable formats; it then assigns labels through a
statistical discrimination process. This process yields
a consistent (lower variance) procedure whose bias
can be adjusted. The consistency of labeling
is twofold: (1) questions relate to simple properties
to generate consistent responses among AI's, and (2)
a discriminator will give consistent labels from a
given set of information (responses).

A historical note is in order here. Originally, a 50-item
questionnaire was developed by experienced
AI's with the aim of including all possible sources of
information and current interpretation channels (ref. 1).
Because image interpretation via this questionnaire
was an extremely lengthy process, very few
data were processed by means of the questionnaire.
However, with the small amount of data so derived, a
more operational questionnaire was developed (ref. 2).
This latter questionnaire is the LIST labeling procedure
described here.

The  questions  posed  to  the  AI  to  be  answered 
from  the  PFC  imagery  products  (more  than  one 
product  may  be  available)  are  given  in  table  I.  The 
first  four  questions  are  answered  once  for  each  pixel, 
whereas  question  5 is  answered  four  times  for  each 
pixel  (once  for  each  acquisition  date).  Table  II  is  a 
list  of  the  questions  to  be  answered  in  an  automated 
fashion.  Each  question  is  answered  for  each  acquisi- 
tion with  the  exception  of  questions  3 and  6,  which 
are  multitemporal  trajectory  responses  (answered 
once  for  all  acquisitions). 

The AI questions are used to screen the pixels and
to allocate labels of “designated other” (DO),
“boundary pixel,” or “pure pixel.” The DO pixels are
automatically labeled “other,” and the boundary pixels
are not dealt with further; they are left with the
boundary pixel label. The pure pixels, however, are
labeled by the discriminant algorithm. Only automated
responses are used in the discriminator;
however, AI question 5 responses are used to answer
automated questions 5 and 6. The discriminator will
generate labels of either “small grains” or “other.”
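
The screening step can be sketched as below. This is a minimal illustration, not the LACIE implementation: the function names and the use of None for an "indeterminate" answer are assumptions, and every "STOP; pixel is not classifiable" outcome of table I is mapped here to the boundary label, which is one reading of the text.

```python
def screen_pixel(nonagricultural, registered, mixed, anomalous):
    """Screen one grid pixel from the analyst's answers to questions 1 to 4.

    Each argument is True, False, or None (None means "indeterminate").
    Returns "DO", "boundary", or "pure".
    """
    if nonagricultural is True:   # question 1: obviously nonagricultural
        return "DO"
    if registered is False:       # question 2: not registered
        return "boundary"
    if mixed is True:             # question 3: mixed pixel
        return "boundary"
    if anomalous is True:         # question 4: anomalous pixel
        return "boundary"
    return "pure"

def final_label(screen, discriminator_label=None):
    """Combine the screening result with the discriminator's label."""
    if screen == "DO":
        return "other"
    if screen == "boundary":
        return "boundary"
    return discriminator_label    # "small grains" or "other" for pure pixels

print(final_label(screen_pixel(True, None, None, None)))
print(final_label(screen_pixel(False, True, False, False), "small grains"))
```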

Any  of  several  discriminant  analysis  algorithms 
can  be  used  to  generate  the  labeling  classifier. 
However,  the  two  used  in  this  study  will  be  the  only 
algorithms  discussed  here.  Both  are  based  on 
minimizing  the  mean  square  error  (MSE).  The  first 
and  principal  algorithm  used  is  the  one  in  the  Statisti- 
cal Package  for  the  Social  Sciences  (ref.  3).  The  dis- 
crimination is  followed  by  a classification  of  mixture 
densities  based  on  Gaussian  distribution  assump- 
tions and  prior-category  probabilities  equal  to  the 
training  proportions.  This  classification  of  mixture 
densities  changed  very  few  labels  and  was  not  con- 
sidered a significant  attraction  (or  distraction)  from 
the discriminant algorithm. This algorithm is versatile
and contains many useful options; however, it
assumes  that  the  within-category  covariance 
matrices  are  equal  (traditional  Fisher  linear  dis- 
criminator) and  hence  can  be  improved. 
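
As an illustration of classification under Gaussian assumptions with a shared within-category variance and priors equal to the training proportions, a one-feature sketch follows. It is not the Statistical Package for the Social Sciences routine; the feature values, category names, and function names are hypothetical.

```python
import math

def fit(samples):
    """samples: dict mapping category name to a list of feature values."""
    n_total = sum(len(v) for v in samples.values())
    means = {k: sum(v) / len(v) for k, v in samples.items()}
    # Pooled within-category variance (the equal-covariance assumption).
    ss = sum((x - means[k]) ** 2 for k, v in samples.items() for x in v)
    variance = ss / (n_total - len(samples))
    # Prior-category probabilities equal to the training proportions.
    priors = {k: len(v) / n_total for k, v in samples.items()}
    return means, variance, priors

def classify(x, means, variance, priors):
    """Pick the category with the largest log posterior (up to a constant)."""
    def score(k):
        return math.log(priors[k]) - (x - means[k]) ** 2 / (2 * variance)
    return max(means, key=score)

training = {
    "small grains": [0.9, 1.1, 1.0, 1.2],       # hypothetical feature values
    "other": [0.1, 0.3, 0.2, 0.0, 0.4, 0.2],
}
model = fit(training)
print(classify(1.0, *model))
print(classify(0.25, *model))
```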

The second algorithm is a minimum MSE Bayesian
procedure known as the Patterson-Pitt
algorithm as implemented by Thadani (ref. 4) and
Ahlers (ref. 5). It uses a loss matrix and prior-category
probabilities equal to the training proportions. The
loss matrix used was the traditional zero-one loss
matrix.


Table I. LIST Questions for the Analyst

Question
      Response

1. Pixel is obviously in
      Nonagricultural area. STOP; pixel is “designated other” (DO).
      Agricultural area or indeterminate.

2. Is pixel registered with regard to analyst-chosen registration date (i.e., in the same category)?
      No. STOP; pixel is not classifiable.
      Yes or indeterminate.

3. Is pixel a mixed pixel (part of more than one field or boundary)?
      Yes. STOP; pixel is not classifiable.
      No or indeterminate.

4. Is this an anomalous pixel (not representative of most of the other pixels within the field)?
      Yes. STOP; pixel is not classifiable.
      No.

5. PFC vegetation canopy indication is (Use all available imagery film types.)
      (0) No vegetation canopy
      (1) Low-density green vegetation canopy
      (2) Medium-density green vegetation canopy
      (3) High-density vegetation canopy
      (4) Senescent (turning) vegetation canopy
      (5) Harvested canopy (stubble)


Table II. Automated LIST Questions for Classification of Small Grains

Question
      Response

(corrected to 60° incidence).

2. Is the green number of the pixel within the range for small grains?
      Yes
      No

3. Do the green numbers follow a small-grains trajectory?
      Yes
      No

5. Is the vegetation indication of the pixel valid for the Robertson biostage of wheat for the acquisition?
      Yes
      No

6. Does the pixel follow a small-grains vegetation development pattern?
      Yes
      No

The final LIST product is a set of pixel labels consisting
of “boundary,” “small grains,” and “other.”
The following section describes the techniques
used to answer the automated questions
that produce these labels.


LIST  KEYS 

It  has  been  shown  by  Abotteen  (ref.  6)  that  when 
the  components  of  the  first  greenness  image  eigen- 
vector are  plotted  as  a function  of  the  acquisition 
date,  the  shape  of  the  curve  is  similar  to  the  temporal 
wheat trajectory. It was also shown in the same work
that  high  values  in  the  first  principal  component 
greenness  (PCG)  band  correspond  to  small-grains 
pixels.  Hence,  the  PCG  statistic  is  introduced  here  as 
a feature  for  separating  small  grains  from  non-small- 
grains.  The  PCG  statistic  of  a dot  (pixel)  is  the  first 
PCG  band  value  for  that  dot.  It  can  be  calculated  by 
taking  the  inner  product  of  the  first  greenness  image 
eigenvector  with  the  green  number  vector  for  the 
pixel.  The  PCG  statistic  of  a pixel  answers  the  ques- 
tion: does  the  pixel’s  temporal  greenness  trajectory 
look  like  a small-grains  dot  greenness  trajectory? 
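
The inner product that defines the PCG statistic can be sketched as follows. The eigenvector components and green numbers below are hypothetical values chosen only to show the arithmetic; the function name is likewise illustrative.

```python
def pcg_statistic(eigenvector, green_numbers):
    """Inner product of the first greenness image eigenvector with the
    pixel's green number vector (one entry per acquisition)."""
    if len(eigenvector) != len(green_numbers):
        raise ValueError("one component is needed per acquisition")
    return sum(e * g for e, g in zip(eigenvector, green_numbers))

eigenvector = [0.3, 0.8, 0.5, 0.0]    # hypothetical components, 4 acquisitions
wheat_like = [8.0, 20.0, 15.0, 5.0]   # hypothetical green numbers
late_green = [5.0, 5.0, 5.0, 20.0]    # hypothetical non-small-grains pixel

print(pcg_statistic(eigenvector, wheat_like))
print(pcg_statistic(eigenvector, late_green))
```

A pixel whose green numbers rise and fall in step with the eigenvector components yields the larger statistic, which is the sense in which the statistic measures similarity to a small-grains trajectory.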


The calculation of the PCG statistic for a pixel requires
knowledge of the first greenness image eigenvector.
A model is developed here to estimate this
eigenvector given the Robertson biostage of wheat
(ref. 7). Two models are developed, one for winter
small grains and the other for spring small grains. A
plot of the components of the first greenness image
eigenvector as a function of the Robertson wheat
biostage for seven quadritemporal winter small-grains
LACIE segments acquired in the 1976 crop
year is shown in figure 1. The Robertson biostage
axis in figure 1 is divided into six increments. For every
increment, the eigenvector components corresponding
to Robertson biostage numbers falling in
the increment are recorded and averaged. The
Robertson biostage number range for every increment
and its corresponding average eigenvector component
are shown in table III. A plot of the average
first greenness eigenvector components as a function
of the Robertson biostage for winter small grains is
shown in figure 2. The Sun-angle-corrected (SAC)
eigenvector components are also shown in figure 2 as
the dotted line. The dotted line in figure 2 is not significantly
different from the solid line, which eliminates
the need for using SAC first greenness eigenvector
components. Instead, the green number vector
for a pixel is SAC when used with the estimated
first eigenvector for calculating the pixel's PCG
statistic. A digitized version of figure 2 is shown in
table IV, where Robertson biostage numbers are
tabulated in increments of 0.1.


FIGURE 1. First greenness image eigenvector components as a
function of Robertson biostages for seven winter small-grains
segments.


Table III. Robertson Biostage Number Range and
Corresponding Average First Greenness Eigenvector
Components for Winter Small Grains

Robertson            Average first greenness
biostage range       eigenvector component

2.0 to 2.5            0.09975
2.5 to 3.0             .58433
3.0 to 3.8             .763
3.8 to 4.5             .3845
4.5 to 5.0             .285
5.0 to 7.0            -.1165

The spring small-grains model that estimates first
greenness image eigenvector components is
developed in a fashion similar to that just described.
A plot of the first greenness eigenvector components
as a function of the Robertson biostage for six
LACIE quadritemporal spring small-grains segments
acquired in the 1976 crop year is shown in figure 3.
The line with dotted ends in figure 3 represents the
average first greenness eigenvector components for
spring small grains. A digitized version of this line is
shown in table V, where Robertson biostage numbers
are tabulated in increments of 0.1.
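
Tables IV and V can be read as lookup tables from Robertson biostage to an estimated eigenvector component. The sketch below assumes that reading; it carries only a handful of entries from each table, keys them in tenths of a biostage to avoid floating-point key comparisons, and uses hypothetical names throughout.

```python
# Keys are Robertson biostages in tenths (26 means 2.6).
WINTER_COMPONENTS = {26: 0.33, 34: 0.76, 41: 0.49, 60: -0.01}  # from table IV
SPRING_COMPONENTS = {20: 0.059, 30: 0.42, 60: 0.35}            # from table V

def estimate_eigenvector(biostages, table):
    """Look up one eigenvector component per acquisition biostage."""
    return [table[int(round(b * 10))] for b in biostages]

# Winter biostages 2.6, 3.4, 4.1, 6.0 (four acquisitions of one segment).
print(estimate_eigenvector([2.6, 3.4, 4.1, 6.0], WINTER_COMPONENTS))
# Spring biostages 2.0, 3.0, 6.0, 6.0.
print(estimate_eigenvector([2.0, 3.0, 6.0, 6.0], SPRING_COMPONENTS))
```

The resulting vector would then be combined with the pixel's SAC green number vector by inner product to give the PCG statistic.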


FIGURE 2. Average first greenness image eigenvector components
as a function of Robertson biostages for winter small
grains. (Vertical axis: average first greenness eigenvector
component; horizontal axis: Robertson biostage.)


Table IV. Principal Component Greenness Statistics Generation Table for Winter Small Grains in LIST

Robertson   First greenness     Robertson   First greenness     Robertson   First greenness
biostage    eigenvector         biostage    eigenvector         biostage    eigenvector
number      component           number      component           number      component

2.4          0.10               4.0          0.53               5.6          0.085
2.5           .24               4.1           .49               5.7           .06
2.6           .33               4.2           .45               5.8           .04
2.7           .42               4.3           .41               5.9           .01
2.8           .51               4.4           .38               6.0          -.01
2.9           .59               4.5           .35               6.1          -.04
3.0           .63               4.6           .33               6.2          -.06
3.1           .66               4.7           .30               6.3          -.08
3.2           .69               4.8           .28               6.4          -.11
3.3           .73               4.9           .25               6.5          -.13
3.4           .76               5.0           .23               6.6          -.16
3.5           .73               5.1           .21               6.7          -.18
3.6           .69               5.2           .18               6.8          -.2
3.7           .65               5.3           .16               6.9          -.23
3.8           .61               5.4           .13               7.0          -.25
3.9           .57               5.5           .11


FIGURE 3. First greenness image eigenvector components and
their averages as a function of Robertson biostages for spring
small grains.


The ground-truth-labeled small-grains grid intersection
dots on a LACIE segment image were used to
calculate a temporal average green number for small
grains. A total of 34 segments was used; each segment
had several acquisitions in the 1976 crop year.
The Robertson biostage number for each acquisition
was also obtained. When the temporal average green
number was computed, the standard deviation was
also computed for small grains. The segments contained
either winter, spring, or mixed small grains.
The small-grains green number range was developed
for winter and spring small grains. The segments
designated as mixed had two Robertson biostage
numbers and were used in both the winter and spring
small-grains green number range models.

The  Robertson  biostage  numbers  for  winter  and 
spring  small  grains  were  divided  into  several  sec- 
tions. For  each  range  of  Robertson  biostage  numbers 
representing  a certain  section,  the  average  small- 
grains  green  number  and  the  standard  deviation  cor- 
responding to  a Robertson  biostage  number  in  the 
section  are  recorded.  The  average  small-grains  green 
numbers  and  the  standard  deviations  within  a 
Robertson  biostage  number  range  are  averaged  to  ob- 
tain a grand  small-grains  green  number  mean  and  a 
grand  standard  deviation.  For  each  Robertson 
biostage  number  range,  the  corresponding  grand 
small-grains  green  number  mean  and  standard  devia- 
tion are  shown  in  table  VI  for  winter  and  spring 
small  grains.  The  winter  small-grains  green  number 
range  as  a function  of  the  Robertson  biostage  num- 
ber is  shown  in  figure  4 for  1 standard  deviation.  The 
lower  and  upper  bounds  of  the  green  number  in 
figure  4 are  computed  by  taking  the  grand  green 
number  mean  minus  and  then  plus  1 grand  standard 
deviation,  respectively.  Figure  5 shows  the  winter 
small-grains  green  number  range  as  a function  of  the 
Robertson  biostage  number  for  2 standard  devia- 
tions. The  spring  small-grains  green  number  range  as 
a function  of  the  Robertson  biostage  number  is 
shown  in  figure  6 for  1 and  2 standard  deviations. 
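
The range test behind automated question 2 can be sketched as a mean plus or minus k standard deviations check. The rows below are the winter entries of table VI that are legible in this copy; the handling of a biostage that falls exactly on a shared border is an assumption, as are the function names.

```python
# (lower biostage, upper biostage, grand mean, grand standard deviation),
# copied from the legible winter rows of table VI.
WINTER_RANGES = [
    (3.0, 3.5, 14.78, 5.63),
    (3.5, 4.0, 16.23, 6.75),
    (4.0, 4.5, 18.30, 7.65),
    (4.5, 5.5, 14.59, 5.85),
    (5.5, 7.0, 9.25, 4.15),
]

def green_number_bounds(biostage, k=1.0, ranges=WINTER_RANGES):
    """Return (lower, upper) = grand mean -/+ k grand standard deviations."""
    for low, high, mean, std in ranges:
        # Assumption: a biostage on a shared border goes to the later range,
        # except the final upper limit, which stays in the last range.
        if low <= biostage < high or biostage == high == ranges[-1][1]:
            return mean - k * std, mean + k * std
    raise ValueError("biostage outside the tabulated ranges")

def in_small_grains_range(biostage, green_number, k=1.0):
    lower, upper = green_number_bounds(biostage, k)
    return lower <= green_number <= upper

print(green_number_bounds(3.2))               # 1 standard deviation
print(in_small_grains_range(3.2, 25.0))       # outside the 1-sigma range
print(in_small_grains_range(3.2, 25.0, k=2))  # inside the 2-sigma range
```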


Table V. Principal Component Greenness Statistics Generation Table for Spring Small Grains in LIST

Robertson   First greenness     Robertson   First greenness     Robertson   First greenness
biostage    eigenvector         biostage    eigenvector         biostage    eigenvector
number      component           number      component           number      component

1.5         -0.125              3.4          0.57               5.3          0.618
1.6          -.09               3.5           .60               5.4           .595
1.7          -.05               3.6           .62               5.5           .575
1.8          -.018              3.7           .64               5.6           .53
1.9           .02               3.8           .66               5.7           .49
2.0           .059              3.9           .68               5.8           .44
2.1           .09               4.0           .695              5.9           .40
2.2           .13               4.1           .715              6.0           .35
2.3           .17               4.2           .73               6.1           .31
2.4           .205              4.3           .75               6.2           .27
2.5           .24               4.4           .77               6.3           .225
2.6           .28               4.5           .785              6.4           .18
2.7           .31               4.6           .765              6.5           .141
2.8           .35               4.7           .745              6.6           .10
2.9           .39               4.8           .72               6.7           .055
3.0           .42               4.9           .70               6.8           .01
3.1           .46               5.0           .68               6.9          -.03
3.2           .50               5.1           .66               7.0          -.07
3.3           .53               5.2           .64
Table VI. Grand Small-Grains Green Number Mean
and Standard Deviation

Robertson          Grand green      Grand standard
biostage range     number mean      deviation

Winter small grains

2.0 to 2.6          6.41             2.64
2.6 to 3.0         12.[?]            5.56
3.0 to 3.5         14.78             5.63
3.5 to 4.0         16.23             6.75
4.0 to 4.5         18.30             7.65
4.5 to 5.5         14.59             5.85
5.5 to 7.0          9.25             4.15

Spring small grains

1.0 to 2.0          5.12             2.68
2.0 to 2.5          9.28             3.93
2.5 to 3.0         17.03             8.13
3.0 to 3.5         18.8[?]           7.63
3.5 to 4.0         20.65             7.13
4.0 to 5.0         20.57             6.32
5.0 to 6.0         11.52             4.3
6.0 to 7.0          8.25             3.65


FIGURE 4. Winter small-grains green number range as a function
of Robertson biostages (1 standard deviation).

FIGURE 5. Winter small-grains green number range as a function
of Robertson biostages (2 standard deviations).


FIGURE 6. Spring small-grains green number range as a function
of Robertson biostages.


Figure 7, a chart of the Robertson biostage on the
horizontal axis and the vegetation canopy on the vertical
axis, describes an automation technique for the
vegetation canopy questions. (Table VII provides the
same information in tabular form.) For each acquisition,
a point is located in figure 7 with the horizontal
axis coordinate corresponding to the Robertson
biostage for wheat for the acquisition and the vertical
axis coordinate corresponding to the answer (for a
given pixel) to question 5 (table I). If the point is in
the blank or the dotted area, question 5 (table II) is
automatically answered with a yes. If the point is in
the shaded (barred) area, the answer is no (for that
pixel and acquisition). Vertical borders belong to the
class on the left.


FIGURE 7. Automation technique.

Question  6 (table  II)  is  answered  with  a yes  if  the 
points  corresponding  to  the  four  acquisitions  are  all 


in  the  blank  or  dotted  regions  and  if  at  least  two  are 
in  the  blank  region.  This  is  equivalent  to  the  rule  cor- 
responding to  table  VII:  all  four  acquisitions  must  be 
classified,  and  at  least  two  must  be  first-class 
responses  for  question  6 (table  II)  to  be  answered 
with  a yes.  Hence,  the  dotted  region  in  figure  7 (and 
the  second-class  response  in  table  VII)  is  used  as  a 
different  designation  from  the  blank  region  (first- 
class  region)  for  answering  question  6 (table  II)  only. 
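
The rules for automated questions 5 and 6 can be sketched directly from table VII. The encoding below is an illustration only: each row is keyed by the upper limit of its biostage range (so that a border value belongs to the range on its left), the canopy codes are those of question 5 in table I, and the function names are hypothetical.

```python
# (upper biostage limit, first-class canopy codes, second-class canopy codes)
KEY = [
    (2.0, {0}, {1, 2, 3}),
    (2.5, {0, 1, 2, 3}, set()),
    (3.0, {1, 2, 3}, {0}),
    (5.0, {1, 2, 3}, set()),
    (5.5, {1, 2, 3, 4}, {5}),
    (6.0, {4}, {1, 2, 3, 5}),
    (6.9, {4, 5}, set()),
    (7.0, {5}, {4}),
]

def response_class(biostage, canopy):
    """Return 1 (first class), 2 (second class), or 0 (not valid)."""
    if biostage < 1.0:
        raise ValueError("biostage below 1.0")
    for upper, first, second in KEY:
        if biostage <= upper:
            if canopy in first:
                return 1
            if canopy in second:
                return 2
            return 0
    raise ValueError("biostage above 7.0")

def question_5(biostage, canopy):
    """Yes if the canopy indication is valid for the biostage."""
    return response_class(biostage, canopy) != 0

def question_6(acquisitions):
    """acquisitions: list of (biostage, canopy) pairs, one per acquisition.
    Yes if every acquisition is classified and at least two are first class."""
    classes = [response_class(b, c) for b, c in acquisitions]
    return all(c != 0 for c in classes) and classes.count(1) >= 2

print(question_6([(2.6, 1), (3.4, 2), (4.1, 3), (6.0, 4)]))  # all first class
print(question_6([(2.6, 0), (3.4, 0), (4.1, 3), (6.0, 5)]))  # one invalid
```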

Figure  7 is  like  a crop  calendar  that  reflects  the 
growth  characteristics  of  small  grains.  This  chart 
may  need  to  be  revised  to  reflect  the  peculiarities  of  a 
particular  growing  season  or  a particular  region  of  the 
country. For example, it has been suggested that the
region corresponding to Robertson biostages 5.6 to
6.0 and vegetation canopy indication 1, 2, or 3 (parts
of  the  sixth  row  in  table  VII)  could  be  shaded 
(removed  from  the  second-class  response)  for  seg- 
ments from  winter  wheat  regions.  This  reflects  the 
necessity  for  margin  in  acceptable  vegetation  canopy 
indications  in  mixed  and  spring  wheat  areas  and  for 
more  uniformity  in  winter  wheat  areas.  The  present 
key  (table  VII)  is  intended  to  be  quite  general  and 
may  be  used  in  this  state. 


Table VII. Automation Technique in Tabular Form

Robertson
biostage range   First-class response                                   Second-class response

1.0 to 2.0       No vegetation (0)                                      Green vegetation (1, 2, 3)
2.1 to 2.5       No vegetation or green vegetation (0, 1, 2, 3)
2.6 to 3.0       Green vegetation (1, 2, 3)                             No vegetation (0)
3.1 to 5.0       Green vegetation (1, 2, 3)
5.1 to 5.5       Green vegetation or turning (1, 2, 3, 4)               Harvested (5)
5.6 to 6.0       Turning (4)                                            Green vegetation or harvested (1, 2, 3, 5)
6.1 to 6.9       Turning or harvested (4, 5)
7.0              Harvested (5)                                          Turning (4)


TEST RESULTS

Four AI's were used to test the quality of the questions
for discriminating small grains (agricultural
crops) from non-small-grains. Each AI analyzed 16
segments and took approximately 2.5 hours per segment.
Earlier studies, such as the study of Register
and Hocutt (ref. 8), have indicated that interpixel
correlations decrease with distance and that a distance
of 10 pixel widths corresponds to negligible
correlation. Hence, dot grids are assumed to be independent
samples with respect to crop types.

Separate analyses were performed for the 1976
winter wheat and spring wheat sites (eight of each).
All Kansas blind sites in new LACIE stratum 11
with available ground truth were chosen as the
winter test sites. The eight spring sites were chosen
from the blind sites in new LACIE stratum 21 (see fig.
8 for strata locations). Since ground truth was required
in stratum 21, segments were chosen to be
representative of the three-state coverage of the
stratum. The data within each stratum were further
partitioned into four training segments and four test
segments. Table VIII describes this breakdown.

For  each  segment,  four  acquisition  dates  were 
chosen  arbitrarily  (without  respect  to  special  area 
agromet  conditions  or  cloud  cover)  to  cover  the 
1975-76  growing  season  of  wheat.  Table  IX  gives 
these  dates  and  the  respective  Robertson  biostages 
for  winter  wheat  (WW)  and  spring  wheat  (SW). 
Three types of PFC products were generated: Product 1,
Product 2, and the Kraus product (reference 9
describes these films). The films were made into
research,  test,  and  evaluation  packets  (separate  from 
LACIE  operational  packets)  to  maintain  a restricted 
experimental  environment  of  labeling  without  full- 
frame  imagery  (185-kilometer  square  area  of  land)  of 
the  broad  area  of  interest  and  without  ancillary 
agromet  information.  Hence,  accuracies  should  be 
below  those  experienced  in  an  operational  labeling 
system.  The  discriminants  were  determined  using 
ground  truth  for  the  four  training  segments  and  ac- 
curacy was  determined  by  using  the  discriminant 
function  to  classify  the  four  test  segments.  Percen- 
tages of  pixels  correctly  labeled  were  calculated  from 
contingency  tables  of  ground  truth  by  LIST  labels. 
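
The accuracy calculation can be sketched as a contingency table of ground truth by LIST label, with the percentage correctly labeled taken as the diagonal share. The labels below are hypothetical data, not results from this study, and the sketch assumes boundary pixels have already been set aside.

```python
from collections import Counter

def contingency(truth, labels):
    """Count pixels for every (ground truth, LIST label) pair."""
    return Counter(zip(truth, labels))

def percent_correct(truth, labels):
    """Percentage of pixels whose LIST label matches the ground truth."""
    table = contingency(truth, labels)
    total = sum(table.values())
    correct = sum(n for (t, l), n in table.items() if t == l)
    return 100.0 * correct / total

# Hypothetical labels for ten pure pixels.
truth = ["small grains"] * 4 + ["other"] * 6
labels = (["small grains"] * 3 + ["other"]
          + ["other"] * 5 + ["small grains"])

print(contingency(truth, labels))
print(percent_correct(truth, labels))
```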

The  particular  variables  that  a stepwise  discrimi- 
nant procedure  admits  are  a function  of  the  number 
of  training  samples,  the  variability  of  the  particular 
area  sampled,  and  the  acquisition  dates.  Certainly, 
implementation  in  LACIE  of  a training  sample  of 
the  size  used  here  is  not  recommended;  hence,  dis- 
criminant vectors  and  tests  for  category  mean 
differences  are  not  presented.  Instead,  tables  for  test 
accuracy  (on  segments  not  used  in  training)  are  pre- 
sented. Figure  9 is  a key  for  these  contingency  tables. 


FIGURE 8. Crop reporting districts and new LACIE strata in
the U.S. Great Plains.


Table VIII. LIST Data Set

Stratum   LACIE number   Type     Purpose   County, state

11        1019           Winter   Train     Norton, Kans.
11        1035           Winter   Train     Ford, Kans.
11        1855           Winter   Train     Trego, Kans.
11        1865           Winter   Train     Stevens, Kans.
11        1020           Winter   Test      Rawlins, Kans.
11        1852           Winter   Test      Lane, Kans.
11        1860           Winter   Test      Hodgeman, Kans.
11        1880           Winter   Test      Ellis, Kans.
21        1542           Spring   Train     Roosevelt, Mont.
21        1650           Spring   Train     Hettinger, N. Dak.
21        1651           Spring   Train     Bowman, N. Dak.
21        1667           Spring   Train     Harding, S. Dak.
21        1530           Spring   Test      Phillips, Mont.
21        1656           Spring   Test      Morton, N. Dak.
21        1660           Spring   Test      Logan, N. Dak.
21        1668           Spring   Test      Perkins, S. Dak.


Table IX. LIST Data Acquisition Dates (1976)

Segment   County               Dates (WW biostage, SW biostage, if applicable)

1019      Norton, Kans.        Jan. 19 (2.4); Feb. 6 (2.5); June 12 (4.6); June 30 (5.4)
1020      Rawlins, Kans.       Feb. 2[?] (2.5); Apr. 10 (2.7); June 3 (3.7); July 11 (6.0)
1035      Ford, Kans.          Mar. 13 (2.6); May 6 (3.4); June 1 (4.1); July 1 (6.0)
1530      Phillips, Mont.      June 1 (3.5, 3.1); June [?] (3.9, 4.0); July 7 (5.5, 5.0); Aug. 12 (7.0, 6.0)
1542      Roosevelt, Mont.     Apr. 25 (2.5, 1.1); June 11 (4.3, 3.4); July 6 (5.7, 5.0); July 24 (6.0, 6.0)
1650      Hettinger, N. Dak.   May 9 (3.2, 2.0); May 27 (3.1, 3.0); Aug. 7 (6.0, 6.0); Aug. 25 (6.0, 6.0)
1651      Bowman, N. Dak.      May 10 (3.3, 2.2); May 29 (4.0, 3.0); July 2[?] (6.0, 6.0); Aug. 1 (6.0, 6.0)
1656      Morton, N. Dak.      May 9 (3.0, 2.0); July 2 (6.0, 4.4); July 20 (7.0, 6.0); Aug. 7 (7.0, 7.0)
1660      Logan, N. Dak.       May 7 (3.1, 2.0); June 12 (4.2, 3.7); Aug. 6 (6.0, 6.0); Aug. 23 (6.0, 6.0)
1667      Harding, S. Dak.     May 10 (3.4, 2.3); May 29 (4.3, 3.2); July 21 (5.9, 6.0); Aug. 1 (6.0, 6.0)
1668      Perkins, S. Dak.     Apr. 22 (2.6, 1.7); May 9 (3.3, 2.3); May 21 (4.0, 3.1); Aug. 7 (6.0, 6.0)
1852      Lane, Kans.          Mar. 31 (2.6); May 7 (3.2); June 20 (5.[?]); July 17 (6.0)
1855      Trego, Kans.         Mar. 13 (2.6); Apr. 11 (3.0); June 20 (5.7); July 17 (6.0)
1860      Hodgeman, Kans.      Mar. 13 (2.5); May 6 (3.3); June 2 (4.1); July 1 (6.0)
1865      Stevens, Kans.       Feb. 7 (2.4); May 15 (3.6); June 20 (5.[?]); July 1 (6.0)
1880      Ellis, Kans.         Mar. 13 (2.6); May 6 (3.2); June 10 (4.9); July 16 (6.0)
Four analyses were performed on the winter segments:
two using the quadratic discriminator, one
using the stepwise discriminant, and one using the
AI labels. These results and the variables used are
given in figure 10 for all four AI's, each responding to
the  four  winter  test  segments.  (The  appendix  gives 
variable  definitions  for  all  analyses.)  As  presently 
programed,  the  quadratic  discriminator  was  deter- 
mined to  accrue  numerical  analysis  errors  of  com- 
putation at  an  unacceptable  rate  and  was  not  used  in 
the  spring  site  analyses. 

All  spring  sites  were  treated  as  mixed  wheat  sites, 
even  when  winter  wheat  analysis  was  patently  un- 
necessary. The  mixed  wheat  philosophy  was  to  give 
positive  responses  automatically  where  indicated  for 
either  spring  or  winter  wheat.  For  example,  if  the 
canopy  trajectory  for  a pixel  was  similar  to  a winter 
wheat  trajectory  (SUM  is  high  for  winter  biostage 
numbers)  but  dissimilar  to  a spring  wheat  trajectory 
(SUM  is  low  for  spring  biostage  numbers),  then 
KEYS  and  SUM  were  based  on  winter  biostages  for 
that  pixel.  The  results  for  the  spring  sites  are  given  in 
figure  11. 


The  AI  percentage  of  small  grains  and  the  LIST 
percentage  of  small  grains  were  consistently  below 
the  ground-truth  percentage  of  small  grains  (m  < I in 
fig.  9)  regardless  of  the  type  of  discriminant  used. 
This  is  partly  attributed  to  the  facts  that  omission 
rates  are  apparently  always  less  than  commission 
rates (b < c in fig. 9) and that there is a fairly consistent
tendency for nearly 4 percent of the DO pixels
to be small grains (e/(e + f) ≈ 0.038).

Although  midseason  estimation  cannot  be  effec- 
tively analyzed  since  acquisition  date  selection  for 
end-of-season  estimation  is  usually  inappropriate  for 
midseason  estimation  and  specialized  midseason 
questions  (e.g.,  automated  prototype  green  number 
trajectories)  have  not  been  developed,  such  an 
analysis  is  presented  here,  recognizing  that  lower 
than  realistic  accuracy  is  expected.  Such  an  analysis 
indicates  the  efficacy  of  present  keys  and  may  be 
heuristically  valuable  in  pointing  to  new  develop- 
ments. A rather  high  accuracy  (PCL  in  the  ter- 
minology of  fig.  9)  and  a moderate  decrease  in  the 
percentage  of  small  grains  reported  (m  < I in  the  ter- 
minology of  fig.  9)  is  demonstrated  in  figure  12. 
FIGURE 10. LIST test accuracy on winter sites. (a) AI labels. (b) Linear discriminant. (c) Q with B and G only. (d) Q17.

FIGURE 11. LIST test accuracy on spring sites. (a) AI labels. (b) Linear discriminant. (c) Linear with B-G-BIO step. (d) Linear with
B-G-BIO direct.

FIGURE 12. Midseason test accuracy. (a) Winter sites. (b) Spring sites.


EVALUATIONS  AND  CONCLUSIONS 

The  phenomenon  of  a nearly  4-percent  DO  being 
small  grains  constitutes  a source  of  bias  that  is  ap- 
parently consistent  over  diverse  geographic  regions 
and  that  is  readily  measurable.  The  unexpectedly 
high PCL (high means close to AI label accuracy) in
the  “undeveloped  discriminator"  for  midseason 


labeling  analyses  suggests  that  a directed  develop- 
ment of  a midseason  LIST  labeler  (as  opposed  to  a 
casual  byproduct  of  an  end-of-season  LIST  labeler) 
would  yield  a highly  accurate  operational  labeling 
system. 

The  present  Classification  and  Mensuration  Sub- 
system procedural  philosophy  is  for  the  Al  to  select  a 
reference  acquisition  date  (film)  and  to  mentally  ad- 
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just  registration  discrepancies  of  other  acquisitions  to 
give  label  accuracy  to  the  “real  estate"  represented  in 
the  reference  film.  It  is  becoming  increasingly  evi- 
dent that  LIST,  and  in  fact  any  labeling  procedure 
that relies on spectral aids (e.g., trajectories), is
inherently based on a different philosophy. Since
acquisitions are usually not identically registered,
spectral values for a pixel, across several acquisitions,
therefore represent the area about the “real
estate” and not a precise pixel of one date. Boundary
pixels  and  mixed  pixels  (across  a boundary)  have 
spurious  spectral  trajectories;  i.e.,  the  trajectory  is 
not  sampled  from  a single  category  of  interest  but 
rather  is  switched  from  one  category  to  another. 
Such  trajectories  tend  to  confuse  the  labeling  process 
and reflect a basic modeling error in image
interpretation. LIST, on the other hand, labels what is
represented by the spectral trajectory (in this case,
the grid dot intersection on the PFC (film) product).
To make this more meaningful, LIST first filters out
the  boundary  (and  mixed)  pixels  and  treats  these 
pixels  as  a nonlabelable  class  to  be  proportioned.  In 
summary,  LIST  does  not  label  real  estate;  it  does 
label  film  grid  intersection  pixels.  This  philosophical 
change  is  implied  by  the  increased  reliance  on 
spectral  trajectories. 

The  high  accuracies  demonstrate  that  the  concept 
of  a programed  statistical  discrimination  approach  to 
pixel  labeling  is  valid  and,  in  particular,  that  the  LIST 
procedure  performed  comparably  with  the  AI's  in 
the  restrictive  environment  of  these  test  conditions. 
This  is  a highly  successful  result  that  confirms  the 
efficacy  of  the  LIST  questionnaire.  However,  it  can 
be  easily  and  obviously  improved  through  the 
further  development  and  training  of  the  automated 
keys,  particularly  green  number  ranges  and  trajecto- 
ries. 


Appendix

Variable Definitions for Analyses*

Variables                    Definition

BIO1, BIO2, BIO3, BIO4       Winter wheat Robertson biostages for the
  or WBIO1 through WBIO4     respective acquisitions
SBIO1 through SBIO4          Spring wheat biostages
G1, G2, G3, G4               Green numbers
B1, B2, B3, B4               Brightness numbers
GREEN1 through GREEN4        Yes/no answer: Is green number in the
                             small-grains range?
KEY1 through KEY4            Yes/no answer: Is canopy in the small-grains
                             range?
Canopy trajectory            Yes/no answer: Is canopy trajectory
                             acceptable for small grains?
PCGW, PCGS                   PCG statistic for winter and spring wheat,
                             respectively
GW1 through GW4              The products of Gi x WBIOi for i = 1,2,3,4
GS1 through GS4              The products of Gi x SBIOi for i = 1,2,3,4
BW1 through BW4              The products of Bi x WBIOi for i = 1,2,3,4
BS1 through BS4              The products of Bi x SBIOi for i = 1,2,3,4

*See reference 2 for the numerical derivations.
Status of Yield Estimation Technology: A Review of
Second-Generation Model Development and
Evaluation

R. G. Stuff,a T. L. Barnett,a G. O. Boatwright,a D. E. Phinney,b and V. S. Whiteheada


SUMMARY 

Multiple  regression  models  were  selected  as  the 
LACIE  yield  estimation  baseline,  primarily  on  the 
basis  of  experience  and  expediency.  Their  require- 
ments for  long,  region-specific  historical  records  of 
yield  and  weather  data  and  inherently  damped 
responses  were  recognized  as  a priori  limitations  rela- 
tive to  LACIE  objectives.  These  limitations  and  the 
potential improvements claimed for other approaches
were the principal motives for initiating a
Research, Test, and Evaluation (RT&E) program to
evaluate and develop more amenable models for the
agricultural-meteorological (agromet) estimation of
yield.

Candidate  alternatives  were  labeled  as  second  or 
third  generation,  based  on  their  progressively  more 
detailed  resolution  elements  and  the  effort  needed  to 
make  them  operational.  The  objectives  of  research  in 
second-generation  models  were  to  obtain  (1)  yield 
estimation  capability  for  any  arbitrary  unit  of  area 
and  (2)  greater  responsiveness  and  accuracy  in  yield 
estimates through the use of additional data sources
applied  at  smaller  spatial  and  temporal  scales.  Also, 
candidate  second-generation  models  were  to  be 
evaluated  in  a research  mode  by  characterizing  their 
expected  performance  relative  to  the  multiple  regres- 
sion (first-generation)  models. 

Limits  on  independent  data  available  to  test  candi- 
date models  required  the  use  of  different  evaluation 
procedures.  Test  runs  of  the  Baier  model  (ref.  1)  dis- 
played inadequate  results  when  applied  outside  the 
Canadian  spring  wheat  area,  and  major  revisions 
would  be  required  to  adapt  it  to  winter  wheat. 


aNASA Johnson Space Center, Houston, Texas.
bLockheed Electronics Company, Inc., Houston, Texas.


Historically  based  time-series  estimates  for  in- 
dividual areas  were  used  when  replications  over 
years or other accuracy statistics were not available.
Time-series projections were found to be better predictors
of the 1975 distribution of spring wheat yields
across counties and districts in four states than predictions
by the Earth Satellite Corporation
(EarthSat)  moisture  stress  model  (ref.  2).  The 
Feyerherm  model  (see  the  paper  by  Feyerherm  and 
Paulsen  entitled  “A  Universal  Model  for  Estimating 
Wheat  Yields"),  which  was  developed  as  a follow-on 
to  the  Baier  model  test,  was  compared  to  the  regres- 
sion models  through  10  years  of  test  predictions  in 
the  U.S.  Great  Plains  states.  Performances  were  esti- 
mated to  be  equivalent  for  spring  wheat  but  not  as 
good  as  the  first-generation  models  for  winter  wheat. 
Additional test predictions for states in the U.S. Corn
Belt  and  the  U.S.  Pacific  Northwest,  for  India,  and 
for  an  oblast  in  the  U.S.S.R.  were  used  to  verify  the 
quasi-universal  applicability  of  the  Feyerherm 
model.  Other  evidence  for  universality  was  found  in 
development  of  the  Haun,  Cate-Liebig,  and  Center 
for  Climatic  and  Environmental  Assessment 
(CCEA)  II  models  (see  reference  3,  the  paper  by 
Cate  et  al.  entitled  "The  Law  of  the  Minimum  and  an 
Application  to  Wheat  Yield  Estimation,"  and  the 
paper by LeDuc entitled "CCEA Second-Generation
Wheat Yield Model for Hard Red Wheat in North
Dakota").

Candidate  models  which  use  the  Landsat-derived 
leaf  area  index  (LAI)  in  transpiration  and  growth- 
based  yield  models  were  developed  by  Kanemasu 
(ref.  4),  and  other  basic  yield-Landsat  relationships 
were  investigated.  It  was  concluded  that  data  base  in- 
adequacy was  the  factor  limiting  performance  in  all 
the  second-generation  models  considered  and  that 
each  of  the  models  has  more  yield-predicting 
capability  than  was  reached  during  LACIE. 


INTRODUCTION

The  basic  objectives  of  LACIE  were  to  evaluate 
and  demonstrate  remote  crop  production  estimation 
technology in a practical application. Also, a supporting
Research, Test, and Evaluation (RT&E) function
was established (ref. 5) to develop and/or validate
improvements for the Applications Evaluation
System.

For  remote  estimation  of  wheat  yields,  multiple 
regression  equations  derived  from  historical  crop 
and  weather  data  were  considered  the  most  expe- 
dient approach  (see  the  paper  by  McCrary  et  al.  en- 
titled “Operation  of  the  Yield  Estimation  Sub- 
system”). These  models  had  been  developed  since 
the early 1900's (ref. 6) to analyze historical yield
variability  for  individual  states,  but  their  prediction 
accuracy  for  large  areas  was  not  known.  Several  in- 
herent weaknesses  in  the  regression  models  were 
identified  initially;  the  key  weaknesses  of  concern 
were  as  follows: 

1.  Restriction  of  applicability  to  areas  with  long 
historical  records 

2.  Insensitivity  because  of  (a)  averaging  “out” 
effects  of  local  and  short-duration  phenomena  by 
state  and  monthly  variables,  (b)  limited  numbers  of 
parameters which could be estimated independently
for  any  given  length  of  record,  (c)  lack  of  attention  to 
crop  calendar  changes,  and  (d)  use  of  surrogate  varia- 
bles not  directly  related  to  crop  functioning  (such  as 
precipitation  for  soil  or  crop  moisture  and  year  for 
technological  trend) 

Since it would require at least LACIE Phase I
(1974-75)  to  determine  whether  the  regression 
models  would  support  the  LACIE  accuracy  goal 
(estimates  of  regional  production  within  10  percent 
of  the  true  value  90  percent  of  the  time),  the 
feasibility  of  alternative  yield  models  was  an  issue 
for  RT&E  from  the  beginning. 
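
The accuracy goal quoted above (estimates within 10 percent of the true value 90 percent of the time) can be written as a simple check over a set of paired estimates. The following is a minimal sketch and not a LACIE procedure; the function name and the sample figures are hypothetical and serve only to illustrate the criterion.

```python
def meets_accuracy_goal(estimates, true_values,
                        tolerance=0.10, required_fraction=0.90):
    """Return (fraction_within_tolerance, goal_met) for paired estimates."""
    if len(estimates) != len(true_values) or not estimates:
        raise ValueError("need equal-length, non-empty sequences")
    # Count estimates whose relative error is no larger than the tolerance.
    within = sum(
        abs(est - true) <= tolerance * abs(true)
        for est, true in zip(estimates, true_values)
    )
    fraction = within / len(estimates)
    return fraction, fraction >= required_fraction


# Hypothetical regional production figures (arbitrary units).
true_values = [100.0] * 10
estimates = [95.0, 104.0, 109.0, 91.0, 100.0,
             98.0, 112.0, 103.0, 97.0, 101.0]
fraction, goal_met = meets_accuracy_goal(estimates, true_values)
print(fraction, goal_met)
```

In this illustration, nine of the ten hypothetical estimates fall within 10 percent of the true value, so the criterion is just met.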

The  use  of  analog  areas  or  the  acquisition  of  a 
universal  model  were  the  options  considered  for 
overcoming  the  first  (area  specificity)  limitation. 
Theoretical  claims  and  empirical  evidence  to  support 
each  option  were  available,  but  neither  had  been 
tested.  In  support  of  yield,  RT&E  was  designed  to 
address  both  problems.  Descriptions  and  examples 
of  research  in  the  analog  region  method  are  given 
elsewhere (see reference 7 and the paper by Strommen
et al. entitled “Development of LACIE CCEA-I
Weather/Wheat Yield Models”); this paper will summarize
the research and evaluation of models
designed  to  be  applicable  to  any  given  region. 


A special  conference  was  held  at  the  beginning  of 
LACIE  to  review  the  state  of  the  art  in  wheat  yield 
models  (ref.  8).  Haun  (ref.  9),  Robertson  (ref.  10), 
and  others  presented  evidence  that  models  with 
universally  applicable  yield-weather  coefficients 
were  possible  if  they  accounted  for  factors  such  as 
the  following. 

1.  Defining  environmental  variables  in  biological 
time  rather  than  in  calendar  time 

2.  Using  soil  moisture  rather  than  precipitation  as 
the  moisture  supply  variable 

3.  Use  of  varieties,  fertilizer  application,  irriga- 
tion, etc.,  to  explicitly  explain  yield  trends 

4.  Natural  differences  in  soil  fertility,  water-hold- 
ing capacity,  etc. 

5.  Variable  representation  of  daily  to  weekly 
weather  and  soil  series  to  family  levels  of  detail 

Models with these characteristics were called second
generation in contrast to the less-detailed area-specific
but senior first-generation models. A discussion
of second-generation models and how they compare
to other models is given in the appendix.

The  more  detailed  and  theoretical  system  (third- 
generation)  models  of  crop  growth  mechanisms  were 
considered, but their expansion to predict yields with
conventionally  reported  agricultural  and 
meteorological  data  was  not  practical  within  the 
scope  of  LACIE.  Existing  third-generation  models 
(see  appendix)  were  designed  primarily  for  research 
rather than for operational yield prediction.
Especially  lacking  were  the  submodels  necessary  for 
the  operation  of  these  models  with  conventional  me- 
teorological data  (air  temperatures  and  precipitation) 
and the extension of grain yields to aggregate units
of  area.  One  source  estimated  the  cost  of  developing 
a third-generation  model  to  be  $25  million. 

Another  initial  proposal  was  that  crop  appearance 
as  observable  in  satellite  data  could  readily  con- 
tribute much  to  yield  estimation  (ref.  8).  The  basic 
rationale  for  using  appearance  variables  comes  from 
the  fact  that  mathematical  models  cannot  account 
for  the  number  and  complexity  of  environmental 
and  agricultural  factors  which  are  integrated  into 
crop  yields.  Appearance  could  provide  an  estimate  of 
integrated  results  at  any  point  in  time  if  correctly  in- 
terfaced with  agromet  information.  Since  spectral- 
yield  relationships  should  change  with  stage  of  crop 
development,  growth  history,  and  cultural  practices, 
the  agromet  components  were  assessed  a leading  or 
host  role  in  a combined  model.  Also,  early  in  the 
season and under cloudy conditions, only agromet
data may be available. Thus, research to derive yield
information from Landsat data was conducted concurrently
with investigations using meteorological
and  other  agricultural  data. 

The  overall  objective  of  supporting  research  in 
second-generation  yield  models  was  to  attain  a step 
improvement  over  the  LACIE  baseline  yield  estima- 
tion technology.  Specific  objectives  were  as  follows. 

1.  To  obtain  a yield  estimation  capability  for  any 
arbitrary unit of area such as the 5- by 6-nautical-mile
LACIE  sample  segment,  the  LACIE  agrophysical 
stratum,  or  a foreign  region  without  sufficient 
historical  data  to  develop  adequate  multiple  regres- 
sion models. 

2.  To  acquire  a model  that  can  be  readily  ex- 
panded to  use  additional  data  (temporal  and  spatial 
resolution)  and  data  sources  as  the  corresponding 
functional  relationships  to  yield  are  developed.  Ex- 
amples of  data  and  information  sources  not  used  in 
the  baseline  models  were  Landsat,  soil  survey,  soil 
moisture  models,  pest  models,  crop  calendar  models, 
and  nitrogen  use  models. 

3.  To  obtain  a model  that  uses  variables  more 
directly  related  to  yields  than  the  ones  used  in  the 
baseline  models  to  provide  estimates  that  are  more 
responsive  (and  correspondingly  more  accurate)  to 
actual  yield  fluctuations. 

The  objective  of  comparative  testing  and  evalua- 
tion of  yield  models  was  to  characterize  the  probable 
performance  of  candidate  models  relative  to  the 
LACIE  baseline.  The  minimum  requirements  to 
meet  this  objective  were  considered  to  be  the  follow- 
ing. 

1.  Comparison  of  overall  accuracies  of  yield  pre- 
dictions for  an  independent  set  of  test  data 

2.  Validation  of  model-generated  prediction  er- 
rors (variance) 

3. Identification of model strengths and weaknesses
relative to the specific objectives given previously

Also,  it  was  considered  desirable  to  periodically 
examine  model  development  to  assess  additional 
research  requirements  and  likelihood  of  success. 
This  latter  objective  was  one  of  the  responsibilities  of 
the  yield  procedures  advisory  group  (see  the  paper 
by McCrary et al.).


TECHNICAL  APPROACH 

The  overall  approach  was  to  first  evaluate  the  ex- 
isting yield  models  that  could  potentially  meet  the 
research objectives. The candidate models initially
found were developed by Baier (ref. 1) and Haun
(ref. 3), and contracts were awarded to Kansas State
University (KSU, Feyerherm) and Clemson University
(Haun) during Phase I for their evaluation and
adaptation to LACIE requirements. Additional
research for the development of Landsat-spectral-yield
relationships was begun at Texas A & M
University (TAMU, Harlan, ref. 11), the Environmental
Research Institute of Michigan (ERIM, Colwell
and Suits, ref. 12), and KSU (Kanemasu, ref. 4).
As  new  problems  appeared,  other  research  or  evalua- 
tion was  initiated.  The  major  efforts  are  described  in 
separate  symposium  papers  and  other  publications; 
only  a summary  of  the  individual  approaches  is 
given  here. 

The  Baier  model  consists  of  a product  of  regres- 
sion functions  for  three  meteorological  variables 
with  each  function  containing  12  coefficients  to  ac- 
commodate polynomial  weights  for  stage  of  develop- 
ment. Fitting  to  experimental  yields  was  ac- 
complished with  an  iterative  procedure,  and  the 
coefficients  and  model  software  were  supplied  by 
Baier.  After  it  was  found  that  the  model  would  have 
to  be  rederived  (see  results  section)  and  difficulties 
with  the  fitting  algorithms  were  encountered,  the 
model  was  abandoned.  The  yield  equation  eventually 
reached  a linear  regression  form  and  became  known 
as the Feyerherm model (see the paper by
Feyerherm  and  Paulsen).  An  overview  of  the  funda- 
mental model  components,  as  described  in  the  ap- 
pendix, is  given  in  figure  1. 

The  unique  feature  of  the  Haun  model  is  the 
submodel  of  a growth-development  index  based  on 
daily  observations  of  relative  leaf  size  and  leaf  num- 
bers for  plants  in  experimental  plots.  Maximum  and 
minimum  temperatures  and  a Thornthwaite  esti- 
mated soil  moisture  parameter  are  then  integrated 
through  models  for  predicting  the  indexes.  The  in- 
dexes, along  with  preseason  precipitation,  formed  in- 
dependent variables  for  regression  against  county 
yields. 

EarthSat (ref. 2) used submodels to estimate daily
precipitation, modified Penman potential
evapotranspiration (PET), soil moisture, actual
evapotranspiration (ET), and crop calendars at 22.5-
by 22.5-kilometer (12.5- by 12.5-nautical-mile) cells
from  first-order  stations  and  meteorological  satellite 
inputs.  Spring  wheat  yield  for  each  cell  was  predicted 
with  a linear  regression  equation  containing  a linear 
trend  term  (year  number)  and  a squared  moisture 
stress  term  (1  - ET/PET)  averaged  from  planting  to 
ripening.  Coefficients  in  the  equation  were  estimated 
from 22 years of historical data from Williams,
Burleigh,  and  Cass  Counties,  North  Dakota.  Stratum 
(county,  crop  district,  and  state)  yield  values  were 
obtained  by  direct  averaging  of  cell  yields. 
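
As an illustration of the regression form described for the EarthSat model, the sketch below computes a cell yield from a linear trend term and a squared moisture stress term and then averages cell yields into a stratum value. The coefficient values, the function names, and the choice to square the daily stress before averaging are assumptions made for this example; the text does not give them.

```python
def mean_squared_stress(et, pet):
    """Average of (1 - ET/PET)**2 over the days from planting to ripening.

    Assumption: the stress is squared day by day and then averaged.
    """
    terms = [(1.0 - actual / potential) ** 2
             for actual, potential in zip(et, pet)]
    return sum(terms) / len(terms)


def cell_yield(year_number, et, pet, b0, b_trend, b_stress):
    """Linear regression: intercept + trend term + squared-stress term."""
    return b0 + b_trend * year_number + b_stress * mean_squared_stress(et, pet)


def stratum_yield(cell_yields):
    """Direct (unweighted) average of cell yields, as stated in the text."""
    return sum(cell_yields) / len(cell_yields)


# Hypothetical daily ET and PET values and hypothetical coefficients.
et = [2.0, 3.0]
pet = [4.0, 4.0]
y_cell = cell_yield(10, et, pet, b0=20.0, b_trend=0.5, b_stress=-16.0)
print(y_cell, stratum_yield([y_cell, 25.5]))
```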

During  Phase  II,  developmental  efforts  were  con- 
centrated on  the  Feyerherm  model.  As  it  became  evi- 
dent that  additional  revision  of  the  Feyerherm 
model  was  necessary,  new  exploratory  efforts  were 
initiated  in  each  of  the  cooperating  agencies  during 
Phase  III.  Also,  an  additional  effort  was  made  to 
develop  error  propagation  procedures  for  the 
Feyerherm  model,  since  that  requirement  was  not 
included  in  the  model  development  contract. 

The  approach  explored  by  the  National  Oceanic 
and  Atmospheric  Administration  (NOAA)  was  to 
use  crop-district-level  historical  data  and  to  define 
weekly  weather  variables  on  a crop  time  scale  for  use 
in  a regression  model  (see  the  paper  by  LeDuc). 
Palmer  water  balance  functions  were  used  to  gener- 
ate the  moisture  supply  variable.  Cultural  practices, 
such  as  irrigation,  fallowing,  varieties,  and  fertiliza- 
tion, were  considered  as  variables  to  explain  trends. 
North  Dakota  was  used  to  evaluate  the  feasibility  of 
this  approach. 


A wheat  yield  modeling  team  was  formed  in  the 
Science  and  Education  Administration  (SEA)  of  the 
U.S.  Department  of  Agriculture  (USDA).  The  pro- 
posed approach  would  ultimately  equate  yield  to  its 
morphological  components  (heads  per  acre  times 
kernels  per  head  times  weight  per  kernel).  Each  com- 
ponent would  contain  empirical  functions  of 
weather, agronomic data, spectral reflections (Landsat),
and other factors derived by submodels. The
model  development  was  designed  to  project  beyond 
the  time  frame  of  LACIE  and  include  the  collection 
of  detailed  experimental  field  data.  Initially,  a field 
study  on  winterkill  was  conducted  to  support  LACIE 
and  model  development. 
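
The component identity named above (heads per acre times kernels per head times weight per kernel) is shown below as a short sketch. The function name and the numerical values are hypothetical; in the proposed approach each component would itself be the output of an empirical submodel, which is not represented here.

```python
def yield_from_components(heads_per_acre, kernels_per_head, grams_per_kernel):
    """Grain yield in grams per acre as the product of three components."""
    return heads_per_acre * kernels_per_head * grams_per_kernel


# Hypothetical component values, for illustration only.
grams_per_acre = yield_from_components(
    heads_per_acre=1_200_000, kernels_per_head=25, grams_per_kernel=0.03
)
print(grams_per_acre)
```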

Cate  et  al.  proposed  during  Phase  III  that  the  Law 
of  the  Minimum  (Liebig)  be  used  to  relate  the  effects 
of  some  variables  to  crop  yields  and  tested 
algorithms  for  obtaining  such  functional  relation- 
ships (see  the  paper  by  Cate  et  al.).  The  basic  theory 
is  that  yield  is  determined  separately  (without 
substitution)  by  the  value  of  the  individual  variable 
in  the  involved  set  which  is  most  limiting.  The 
capability  to  add  another  variable,  which  has  a 
known  relationship  to  yield  when  it  is  the  limiting 
factor, without changing existing functions is considered
a major advantage.
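
The Law of the Minimum as described here can be sketched as taking the smallest of several single-variable yield responses. The linear-plateau response shape, the variable names, and all numbers below are assumptions chosen for illustration; they are not the functions derived by Cate et al.

```python
def linear_plateau(slope, plateau):
    """Hypothetical response: yield rises linearly, then levels off."""
    return lambda x: min(slope * x, plateau)


def liebig_yield(responses, inputs):
    """Yield is set by the most limiting variable (no substitution)."""
    return min(func(inputs[name]) for name, func in responses.items())


responses = {
    "soil_moisture": linear_plateau(0.25, 40.0),
    "nitrogen": linear_plateau(0.5, 45.0),
}
inputs = {"soil_moisture": 120.0, "nitrogen": 70.0}
two_variable_yield = liebig_yield(responses, inputs)

# Adding a variable leaves the existing response functions unchanged.
responses["temperature"] = linear_plateau(1.0, 50.0)
inputs["temperature"] = 25.0
three_variable_yield = liebig_yield(responses, inputs)
print(two_variable_yield, three_variable_yield)
```

The second call shows the property cited as the major advantage: the new variable lowers the estimate only because it is the limiting factor, and the two original functions are untouched.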

Through  analysis  of  many  data  sets,  including 
those  used  by  CCEA  and  Feyerherm,  the  proposed 
Law  of  the  Minimum  model  currently  exists  in 
terms  of  submodeled  variables  corresponding  to  syn- 
thesis and  loss  of  yield  matter.  To  demonstrate  the 
concept,  a model  with  synthesis  based  on  a single 
variable  and  with  coefficients  derived  from  research 
results  was  used  to  generate  a set  of  spring  wheat  test 
predictions.  With  this  approach,  a successful  model 
could  be  considered  akin  to  third-generation  models. 

Attempts  to  determine  which  yield-related  crop 
parameters are “viewed” in Landsat data were made
in  both  theoretical  and  field  indicator  studies.  At 
ERIM,  crop  growth  and  bidirectional  reflectance 
simulation  models  were  used  to  infer  spectral-yield 
relationships  (ref.  12).  TAMU  acquired  helicopter- 
based  field  spectrometer  measurements  in  the 
LACIE  intensive  test  sites  and  subsequently 
analyzed  the  data  for  yield  information  (ref.  11). 
Other  correlation  studies  between  intensive  test  site 
yields  and  Landsat  data  were  performed  at  the 
NASA  Johnson  Space  Center  (JSC). 

Two  approaches  for  using  Landsat-based  predic- 
tions of  leaf  area  were  investigated  by  Kanemasu 
(ref.  4).  One  was  an  attempt  to  improve  on  the  tradi- 
tional evapotranspiration  yield  models  by  separating 
transpiration  from  soil  evaporation  with  the  aid  of 
satellite-estimated  leaf  areas.  Secondly,  leaf  area  esti- 
mates were  used  to  compute  light  interception  by  the 
crop  and  corresponding  growth  (net  carbon  ex- 
change from  photosynthesis  and  respiration).  Ad- 
justment factors  applied  to  grain  yield  (head  weight) 
could  then  give  yield. 

Comparative  evaluations  were  based  primarily  on 
the  statistical  analyses  of  independent  test  predic- 
tions by  the  yield  models  (candidate  and  baseline) 
and their departures from USDA values. Predictions
for  10  or  more  independent  years  were  the  preferred 
test  data  since  year-to-year  yield  variability  is  the 
target  of  interest.  Mean  differences  and  mean 
squared  differences  were  subjected  to  the  paired 
t-test  for  bias  and  to  the  Wilcoxon  paired  rank  test 
for  relative  accuracies.  Ratios  of  modeled  to  ob- 
served prediction  errors  (variances)  were  compared 
with  the  standard  F-test.  Details  of  the  Wilcoxon 
paired  rank  test  are  given  by  Snedecor  and  Cochran 
(ref. 13) and by Seeley et al. (ref. 14).
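
The two paired comparisons named above can be sketched with the Python standard library. The paired t statistic is applied to error differences (bias), and a signed-rank statistic with a normal approximation is applied to squared errors (relative accuracy). This is a textbook form assumed for illustration; it omits the tie correction to the variance, and the exact procedure used in the evaluation is the one given in references 13 and 14. All error values are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(differences):
    """t statistic for the hypothesis that the mean difference is zero."""
    n = len(differences)
    return mean(differences) / (stdev(differences) / math.sqrt(n))


def wilcoxon_signed_rank_z(x, y):
    """Normal-approximation z for the Wilcoxon paired (signed-rank) test."""
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(d)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
    mu = n * (n + 1) / 4.0
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return (w_plus - mu) / sigma


# Hypothetical prediction errors (predicted minus reported yield).
candidate = [1.5, -0.5, 2.0, -1.0, 0.5]
baseline = [1.0, -1.5, 0.5, -2.0, 1.5]
print(paired_t([c - b for c, b in zip(candidate, baseline)]))
print(wilcoxon_signed_rank_z([c * c for c in candidate],
                             [b * b for b in baseline]))
```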

To  evaluate  models  which  could  not  be  operated 
retroactively  to  obtain  test  replications  over  years, 
time-series-based yield estimates for separate area
units were used as an alternative baseline. For example,
in order to compare the prediction to the corresponding
values from the LACIE models in a case
with  only  1 year  of  test  predictions,  it  would  have  to 
be  assumed  that  the  model-propagated  error  (ac- 
curacy) estimates  are  valid  and  are  independent  be- 
tween regions.  Conversely,  comparing  yield  predic- 
tions from  a universal  agromet  model  to  those  from 
area-specific  time-series  estimates  is  valid  only  out- 
side areas  used  to  derive  coefficients  in  the  agromet 
model.  Time-series  estimates  are  defined  as  projec- 
tions on  trend  lines  fitted  to  yields  reported  by  the 
USDA for years prior to the test case. Additional
tests  to  evaluate  error  characteristics  and  model 
responsiveness  based  on  trend  projections  are  being 
documented  by  Stuff  and  Houston. 
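
A time-series estimate as defined above (a projection on a trend line fitted to yields from years prior to the test case) can be sketched with an ordinary least-squares line. The straight-line form, the function name, and the sample yields are assumptions for illustration only.

```python
def trend_projection(years, yields, test_year):
    """Fit a least-squares line to prior years and project to test_year."""
    if len(years) != len(yields) or len(years) < 2:
        raise ValueError("need at least two paired observations")
    if any(year >= test_year for year in years):
        raise ValueError("use only years prior to the test case")
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(yields) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, yields))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * test_year


# Hypothetical reported yields for five prior years.
years = [1970, 1971, 1972, 1973, 1974]
yields = [20.0, 21.0, 22.0, 23.0, 24.0]
projection = trend_projection(years, yields, 1975)
print(projection)
```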

Various  other  analytical  and  sensitivity  analyses 
were used separately or in conjunction with the previously
described tests to evaluate a model's responsiveness
to actual inputs as well as to systematic and
random errors in the inputs. A sensitivity analysis
was done by Hildreth (ref. 15) using the Feyerherm
model.


RESULTS 

Test runs of the Baier model using U.S. data produced
major divergences between predicted and actual
(plot) yields or unrealistic daily contributions to
yields.  Especially  erratic  results  were  obtained  when 
winter  wheat  yields  were  estimated  using  data  from 
Kansas.  The  erratic  results  were  first  attributed  to  the 
fact  that  input  data  were  outside  the  range  of  that 
used  to  develop  the  model;  however,  censoring  in- 
puts to  the  developmental  ranges  provided  little  im- 
provement. Separate  evaluation  of  the  soil  moisture 
submodel  gave  acceptable  results;  however,  the  crop 
calendar  submodel  was  found  inadequate  for  winter 
wheat.  After  collecting  the  data  set  for  rederivation 
of  model  coefficients,  it  was  decided  that  the  func- 
tional form  of  the  yield  model  could  be  improved; 
thus,  the  Feyerherm  model  evolved. 

Efforts  to  adapt  and  upgrade  the  Haun  model  indi- 
cated that  several  improvements  were  necessary  or 
possible. Variables representing effects of postheading
conditions, submodels for predicting planting
and emergence dates, and variables representing
technological trends were considered the major
weaknesses and limitations. Also, the model had not
been developed or calibrated for winter wheat. A
plan to collect new data from experimental fields
over a wide variety of wheat-growing areas was initiated
by Haun.

The EarthSat model for spring wheat was evaluated
during Phase II. Because of the model's requirement
for meteorological satellite data, only a current
year (1975) of test predictions could be generated,
and  no  model-propagated  values  of  prediction  ac- 
curacy were  formed.  Thus,  predictions  for  Montana, 
North  Dakota,  South  Dakota,  Minnesota,  and  their 
counties  and  crop  districts  were  compared  to  time- 
series  models  as  an  alternative  predictor  of  the  actual 
USDA  (preliminary)  values.  Linear  trends  based  on 
1948-74  crop  district  data  were  added  to  the  base 
yields  per  county  given  by  Larson  and  Thompson 
(ref.  16)  to  derive  the  time-series  estimates.  Ratios  of 
root  mean  squared  error  (RMSE)  indicated  that  the 
time-series projections were 57 percent more accurate
than the model at the county level and 77 percent
more accurate at the district level (table I).
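
The RMSE ratio used for this comparison can be sketched as follows. The text does not state how the quoted percentages were derived from the ratios, so the example reports only the ratio itself; the function names and sample values are hypothetical.

```python
import math


def rmse(predicted, reported):
    """Root mean squared error of predictions against reported yields."""
    squared = [(p - r) ** 2 for p, r in zip(predicted, reported)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_ratio(model_pred, baseline_pred, reported):
    """Ratio above 1 means the baseline is closer to the reported values."""
    return rmse(model_pred, reported) / rmse(baseline_pred, reported)


# Hypothetical county yields.
reported = [20.0, 22.0, 24.0]
model_pred = [24.0, 26.0, 28.0]        # each off by 4
time_series_pred = [22.0, 24.0, 26.0]  # each off by 2
ratio = rmse_ratio(model_pred, time_series_pred, reported)
print(ratio)
```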

A map of relative errors for the EarthSat county
predictions ($(\hat{Y} - Y_{USDA})/Y_{USDA}$, fig. 2) shows that
errors  became  larger  and  more  erratic  with  distance 
from  the  counties  used  to  calibrate  the  model.  Simple 
least  squares  comparisons  between  predicted  and  re- 
ported yields  (table  I and  fig.  3)  indicated  that  the 
predictions  were  independent  of  the  USDA  values. 
(The hypothesis that slope is equal to 0 is not rejected
at the 0.05 level of probability.)

Clearly, the EarthSat model did not meet the second-generation
model accuracy objective. By comparison,
the LACIE models predicted about 70 percent
more accurately than the time-series models
(i.e., $RMSE_{LACIE} = 2.0$ for these four states in
1975). Some of the critical deficiencies were considered
to be allowances for the differential effects of
stress during the season, temperature effects, and soil
fertility variability. The preparation of data for some
of these improvements was begun by EarthSat
before completion of the evaluation.

The lack of response in the EarthSat model at the
county  and  district  levels  provides  an  example  of  im- 
balance between  detail  in  the  model  and  its  applica- 
tion scales  as  discussed  in  the  appendix.  Also,  a sepa- 
rate analysis  of  predictions  of  precipitation  showed 
that  the  meteorological  submodel  was  not  as  accurate 
as  the  first-order  stations  for  predicting  precipitation 
at  cooperative  meteorological  stations. 


The Feyerherm model was compared to the
LACIE baseline during Phase III. Ten years of model
predictions were made at the crop district level and
were aggregated to LACIE pseudozones using USDA
acreages (see the paper by McCrary et al.). Statistics
for individual pseudozones and aggregation by wheat
types are given in table II. The Wilcoxon statistic
calculated for crop types was 0.05 and 1.78 for spring
and winter wheat, respectively. The nonsignificance
at the 0.05 level of probability does not reject the hypothesis
that the accuracy of the two spring wheat
models was equal.
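
The aggregation of crop district predictions to pseudozones using USDA acreages can be sketched as an acreage-weighted mean yield. The weighting formula is an assumption (the aggregation procedure itself is described in the paper by McCrary et al.), and the function name and numbers are hypothetical.

```python
def aggregate_yield(district_yields, district_acres):
    """Acreage-weighted mean yield for a group of crop districts."""
    if len(district_yields) != len(district_acres) or not district_yields:
        raise ValueError("need equal-length, non-empty sequences")
    total_acres = sum(district_acres)
    production = sum(y * a for y, a in zip(district_yields, district_acres))
    return production / total_acres


# Hypothetical district yields (per acre) and acreages.
zone_yield = aggregate_yield([20.0, 30.0], [100.0, 300.0])
print(zone_yield)
```

Weighting by acreage keeps the aggregated yield consistent with total production divided by total area, which a simple mean of district yields would not.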

A restriction  on  the  equality  of  the  test  statistics 
because  of  different  meteorological  station  densities 
should also be noted. The use of the denser cooperative
station network in the calculation of test yields
for the CCEA model can be expected to significantly
increase estimation precision (lower RMSE) in some
cases. Since the cooperative station data are not
available on a real-time basis, the Feyerherm accuracy
more correctly simulates operational conditions.

The  approximate  equivalent  accuracy  obtained  by 
the  Feyerherm  model  was  considered  a significant 
technical  demonstration  of  the  concept  of  univer- 
sality in  second-generation  models.  To  evaluate  the 
extendability  of  the  models  to  a greater  extent,  test 
runs  were  made  for  several  areas  outside  the  U.S. 
Great  Plains.  Comparisons  of  these  predictions  with 
USDA values are given in table III and figure 4. The
results are comparable to those obtained for the U.S.
Great  Plains.  However,  since  4 to  10  years  of  histori- 
cal data  are  used  to  adjust  the  model  in  each  region, 
the  model  should  be  considered  only  quasi-universal. 

The sensitivity analyses by Hildreth (ref. 15)
showed that yield predictions by the Feyerherm
model were stable with respect to all variables in the
yield equations. A large temperature sensitivity in
the biocalendar estimates of stage 2.5 for winter
wheat suggested that another stage may be more appropriate
for defining yield variables. The temperature
and precipitation responses were reasonable.

TABLE III.— Preliminary Tests on Extending the
Feyerherm Winter Wheat Model

Deficiencies  in  the  modeling  data  base  appeared  to 
be a major limiting factor in the performance of the
Feyerherm  model.  In  particular,  the  data  contained 
few  observations  of  actual  crop  calendars  and  no 
measurements  of  soil  moisture  or  other  soil 
parameters.  The  use  of  submodels  to  generate  these 
data  (fig.  1)  can  introduce  large  errors  in  any  given 
case.  Also,  in  some  cases,  the  meteorological  stations 
were  considerable  distances  from  the  yield  plots; 
therefore,  considerable  “noise"  would  be  introduced 
in  the  precipitation  values  and  soil  moistures 
assigned  to  the  plots. 

The Cate-Liebig exploratory spring wheat model
was run for 10 years of independent test cases. The
mean squared errors (see the paper by Cate et al.)
were generally smaller than those obtained with the
LACIE or Feyerherm models, but the difference was
not statistically significant according to the Wilcoxon
test (z = 1.17 versus z(0.05) = 1.645). The practical
significance of the results was the predictive power
illustrated by the two variables, the equivalence between
a model coefficient and experimental results,
and the applicability of the model to an extended
region.
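
The comparison of a test statistic with the 0.05 critical value can be illustrated with a minimal sketch. It assumes that the reported statistic is a standard normal deviate tested one-sidedly, which is consistent with the quoted critical value of 1.645 but is not stated explicitly in the text; the function names are hypothetical.

```python
from statistics import NormalDist


def one_sided_critical_value(alpha):
    """Upper-tail critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha)


def is_significant(statistic, alpha=0.05):
    """True if the statistic exceeds the one-sided critical value (assumed test form)."""
    return statistic > one_sided_critical_value(alpha)


# Statistic quoted above for the Cate-Liebig comparison.
print(round(one_sided_critical_value(0.05), 3))
print(is_significant(1.17))
```

Under this assumption the statistic 1.17 falls below the critical value, in agreement with the conclusion that the difference was not statistically significant.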

The CCEA-II North Dakota prototype model was
evaluated initially by examining test predictions for
years in which the LACIE model registered major errors.
Comparisons indicated that these errors with
the CCEA-II model were smaller than with the
LACIE models (see the paper by LeDuc).

Examples of simulated leaf areas and bidirectional
spectral reflectance in the infrared (IR) and red
Landsat channels are given in figure 5. The theoretical
grain yields were based on the assumption that
yield is proportional to the duration of green leaf area
after crop heading (ref. 12). Also, the yield relationships
were constrained by other assumptions necessary
for the crop-growth model and the lack of data
to test underlying assumptions or submodels.

In practice, it was found that Landsat-predicted
leaf areas and measured leaf areas correlated with
coefficients between 0.5 and 0.8 when ranges over
the entire season were involved. Comparisons of
various transformations and channel combinations
of Landsat data gave approximately the same results.
Sample models are given by Kanemasu (ref. 4) and
results are reproduced in figure 6.

Tests performed in conjunction with the development
of the Kanemasu transpiration model showed
significant correlation between actual and predicted
yields on a field basis. No aggregated results were
derived to compare the procedure to a time-series or
other model. Exploratory versions of the Landsat
leaf area growth model indicated that a considerable
amount of work remains.

FIGURE 5.— Simulated crop and spectral values using ERIM
growth and reflectance models. (Horizontal axis: day of the year.)

Other indicator studies based on correlations between
Landsat and crop data showed that reflectances
correlated as well with yield as did other crop
parameters. Sample correlations taken from
Thompson (ref. 17) are given in table IV. The
typically lower correlations between the intermediate
crop parameters (ground cover in this case) and
yield have been interpreted to indicate that Landsat
data measure multiple yield-related factors (ref. 18).

FIGURE 6.— Comparison of observed and predicted LAI using
models developed by Kanemasu (ref. 4).


CONCLUSIONS 

The most notable overall result of second-generation
yield model RT&E was that none of the models
evaluated performed significantly better than the
LACIE baseline models. Because of the limitations
of the regression models described in the introduction,
it could be argued that equal accuracy by the
Feyerherm or Cate-Liebig models in the U.S. Great
Plains would represent more accuracy than the
regression models in foreign areas. However, the objective
of significantly better performance than the
LACIE models in the U.S. Great Plains “yardstick”
region is valid because of the uncertainties associated
with foreign yield data used in comparative testing.

Model-building data inadequacies could be identified
as a limiting factor in each of the models evaluated.
The scope and quality of the data bases appeared
to be more important in second-generation
than in first-generation models. Certainly, any of the
models could be improved with a better data base,
but the relative accuracy of the resulting models still
cannot be projected.

The  Feyerherm  and  Cate-Liebig  models  provided 
estimates  of  the  degree  of  universality  that  may  be 
possible  with  second-generation  models.  Although 
they  provide  evidence  for  universal  weather-yield 
relationships,  they  indicate  that  a few  (4  to  10)  years 
of  historical  data  may  be  required  to  adjust  the  rela- 
tionships to  local  conditions. 

If the second-generation yield models were compared
to biological entities, estimates of their relative
development within the LACIE conditions and time
span would be as summarized in figure 7. Although
the Feyerherm model reached the most advanced
status, it is likely that it would be rated only at an
infantile or juvenile level for an ideal data base. In
terms of evolutionary potential, each model is considered
embryonic.

In addition to the adequacy of data bases for building
second-generation models, other technical issues
remain unsolved. For example, the optimum area
representation has not been determined for any particular
model. The extent to which weather information
should be averaged or sample sizes assigned to
data from weather stations or satellites should also
be addressed. The potential for early-season yield
predictions using serial correlations (both yield and
related factors in space and time), economic factors,
and preseason conditions was not assessed relative to
weather uncertainties in different parts of the crop
season. The relationships between expected yields,
areas not harvested, and classification errors are factors
to be investigated. Certainly some direct accounting
should be made for biological plagues (diseases,
insects, and weeds) in second-generation
models, even though eventually they may be
assessed mainly by satellite variables.

It  is  strongly  recommended  that  a set  of  criteria  or 
prerequisites  be  developed  for  screening  models  in 
LACIE  follow-on  research  or  test  programs.  Partially 
as  a result  of  LACIE,  there  are  now  more  candidate 
models  and  several  aspects  should  be  considered  in 
their  screening.  The  scope  and  quality  of  the  data 
base  should  be  reviewed,  as  well  as  factors  in  the 
model.  Internal  procedures  for  propagating  errors  or 
variances  for  the  predictions  should  be  included  in 
the  model  development,  and  basic  indications  of 
model  competence  should  be  provided. 


FIGURE  7. — Schematic  diagram  of  yield  model  development 
and  relative  status  for  LACIE  data  sets. 
Appendix

Description  and  Relation  of  Second-Generation  to  Other 
Yield  Model  Terminology  and  Concepts 


When considering the universe of yield-determining
factors (from the within-field variability of soil,
chemical, and physical properties; the genetic
variability in a crop species; and the cultural and
management options employable by each different
farmer to all the possible combinations of daily
weather; the dynamics of insect, weed, and disease
populations; and the variable and unforeseen
capabilities of particular fields of plants to adapt),
the number is, for all practical purposes, infinite. A
first-order breakdown of these factors is illustrated in
figure 8. To mathematically relate observations of a
finite number of variables to yield is by definition a
statistical problem or abstraction. Even yield-determining
biochemical processes that can be described
by deterministic equations require stochastic
parameters to appropriately quantify their rates, inputs,
or outputs. Consequently, numerous approaches
have been proposed for estimating crop
yields from information on one or more of the related
factors at different levels of biological detail,
functioning, and scope.

Crop prediction models have been classified according
to various criteria, but a standardized taxonomy
is not apparent. Stanhill (ref. 19) divided
general approaches into statistical or experimental
(controlled environments, simulation, etc.), and
Baier (ref. 20) added an intermediate category which
he called weather analysis models. With more basic
criteria such as the nature of functional relations between
variables, models are commonly classified as
empirical or mechanistic (ref. 21). Other divisions
used jointly or sequentially with the above are
stochastic or deterministic (influence of probability
parameters on inputs or outputs), analytical or
numerical (equation solving methods), continuous
or discrete (degree of continuity in possible variable
quantities), and dynamic or static (dependence of
model components on time coordinates).

Models frequently are described as physiological
or phenomenological and by other mainly subjective
terms used to indicate the physical abstractness
and/or biological hierarchy involved, not model
validity. A mechanistic simulation model may contain
environmental variables “controlling” rates of
photosynthesis, respiration, etc., but be
physiologically less valid than an empirical statistical
model. An example is found in moisture functions.
Three empirical growth-moisture relationships are
given in figure 9. When water concentration of the
root environment is unconditionally increased, detrimental
levels are reached because of the exclusion of
oxygen to the roots (rice is an exception since it can
internally transport oxygen to the roots from the atmosphere);
thus, curve B should be observed in cases
where precipitation exceeds soil holding capacities.
Two separate functions representing the positive
effect of increased moisture and the negative effect
of root asphyxiation would be the most
physiologically correct. Yet, curve-A-type functions
are frequently used in mechanistic simulation
models (ref. 22), and those illustrated in figure 10
commonly occur in empirical statistical models.
Clearly, the empirical statistical model using the
more abstract precipitation variable (more removed
from plant moisture than soil moisture) may be
physiologically more valid than the mechanistic
simulation model that does not account for asphyxiation
or other negative effects of excessive moisture
(such as associated disease, hail, and lodging
damage). It is safe to assume that a physical/physiological
explanation can be found for any
empirical yield model which has predicting
capability; and, conversely, a mechanistic simulation
model completely designed around direct physiological
processes may not have any yield-predicting
capability.

FIGURE 8.— First subdivision of yield factors.

In LACIE, yield models were classified as first,
second, or third generation according to their readiness
for implementation or resources necessary to
achieve readiness. Since readiness depended on
status of model development, availability of input requirements,
comparative testing, etc., the LACIE
classifications closely paralleled the three categories
defined by Baier (ref. 20). Some general characteristics
associated with the three classes of models
are summarized in table V.

FIGURE 9.— Characteristic plant responses to moisture supply
observed under different experimental conditions.

A key factor in development and operational costs
is the amount or level of detail that a model is
designed to capture. Corresponding to the increased
level of detail included in second- and third-generation
models are general differences in the spatial and
temporal scales that their inputs and outputs
typically represent. The general scale ranges covered
by models in each category are sketched in figure 11.
The most cost-effective scales for a first-generation
model should be the largest level of aggregation at
which the temporal and spatial variations of acreages
and yield-determining factors do not cause model
predictions to exceed the desired tolerance limit. If a
sufficient amount of the year-to-year yield variability
is caused by the factors operating at smaller scales,
success in yield estimation will depend on having
such a level of detail built into a model. Given a
detailed third-generation model of known precision
for fields or groups of fields, the application scale
issue is one of determining the tolerable sampling error
and feasibility of the required sampling.

FIGURE 10.— Examples of nonlinear moisture effects found in
LACIE Phase II yield models.

A third axis could be added to figure 11 depicting
the level of biological detail from a total crop
ecosystem to molecular levels with approximately
the same distribution of model types. Stanhill (ref.
19) related meteorological phenomena to time and
space scales. Likewise, a hierarchy of soil classification
detail could be identified for the spatial scale.

FIGURE 11.— Relationship of yield model types to input/output
scales.

FIGURE 12.— Fundamental components in yield modeling for production estimation.

The fundamental components involved in models
for yield estimation are relationally illustrated in
figure 12. There may be salient or subtle differences
in the nature of these components for each model
type which could be used as a classification criterion.
Corresponding descriptors for the three classes of
models are summarized in table VI. For example,
second- and third-generation models use
progressively more submodels to generate variables
that are more directly related to yield-forming processes
than first-generation models. A third-generation
model designed to use conventional data should
have submodels to estimate canopy structure; to
derive subcanopy values of environmental variables;
and to estimate nitrogen application, weed, insect,
and disease development, photosynthesis, respiration,
and translocation for any given field or sample
“points” in a crop stratum.

Generally, in second-generation models, a
heuristic approach which combines biological theory
with empirical results is used to define the cause-effect
relationships between yields and available information.
The LACIE objective of universality
allows the models to be less dependent on historical
data; however, some actual data are still assumed to
be necessary for the local conversion from
“modeled” to normally harvested yields.


TABLE VI.— Descriptions of Fundamental Components for Three Types of Models(a)

Simple algebraic to extended series of differential equations

Strata adjustments:
    Dependent on data base
    Adjustments for various soils, management, or other unique strata influences required
    Theoretically not required

(a) Numbers are cross-referenced to figure 12.
A Universal Model for Estimating Wheat Yields*


A. M. Feyerherm(a) and G. M. Paulsen(a)


INTRODUCTION 

The development of universal wheat yield models
to show separate and joint effects of weather and
culture on yields applicable to fall- and spring-planted
wheat on a global basis is discussed. The
model was built with the restriction of using only
weather-related variables (WRV's) based on
meteorological variables currently observable
globally. Therefore, only daily minimum and maximum
temperatures and precipitation were used.

Early in the research, it was decided to use
varietal-trial-plot-yield data from state experiment
stations in a wide range of climates in the U.S. Great
Plains to build basic relationships among yields,
weather, and culture. Discussed here are steps taken
to develop the model; its application on a
macroclimatic scale in the United States, the
U.S.S.R., and India; and potential improvements.
Added details of the work described here can be
found in final reports of contract work (refs. 1 to 3).
The work of Robertson and Baier with crop calendars
(ref. 4), moisture budgets (ref. 5), and
yield/weather modeling (ref. 6) had the greatest influence
on the research efforts.


MODEL  DEVELOPMENT 


Model  Form 

*Contribution No. 78-268-A, Department of Statistics and
Statistical Laboratory and Department of Agronomy, Kansas
Agricultural Experiment Station, Manhattan, Kansas.
(a) Kansas State University, Manhattan, Kansas.

Consider the problem of estimating wheat yield
(production per unit area) in a given region such as a
county or crop reporting district (CRD) with
weather information from a station (S) and
knowledge of specified cultural practices applied in
region (R). The estimate of yield for year y is designated
by MODy(R,S) and is calculated as follows.


\[ MOD_y(R,S) = A(R,S) + B * WAC_y(R,S) \qquad (1) \]

where
$MOD_y(R,S)$ = model-estimated yield in year y for region (R)
    with weather at station (S)
$A(R,S)$ = a constant, calculated from historical yield and cultural
    data for region (R) and weather at station (S)
$B$ = a universal constant (0.75 for winter wheat; 0.50 for spring
    wheat)
$*$ = multiplication sign
$WAC_y(R,S)$ = contribution to yield of weather and culture (WAC)

More specifically,

\[ WAC_y(R,S) = \sum_{j=1}^{3} P_{jy}(R) * VYA_y(R) * \left[ W_{jy}(S) + W_0(S) * NI_{jy}(R) \right] \qquad (2) \]

where, for region (R) and station (S) in year y,
$VYA_y(R)$ = a varietal yielding ability (VYA) component, which is an
    average of VYA values for varieties planted in year y
$P_{jy}(R)$ = proportion of wheat under cropping practice j
    (j = 1, continuous; j = 2, fallow; j = 3, irrigated)
$NI_{jy}(R)$ = amount of elemental nitrogen applied for cropping
    practice j (j = 1,2,3)
$W_{jy}(S)$ = weather-generated component of yield for wheat under
    cropping practice j (j = 1,2,3)
$W_0(S)$ = weather-generated coefficient of NI

The last two quantities are mathematical functions
of WRV's calculated from daily readings of
precipitation and minimum and maximum temperatures.
A major part of model development involved
determining the relationship between wheat
yields and WRV's to generate values of $W_j(S)$ and
$W_0(S)$, which is discussed in the following sections.
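
Equations (1) and (2) can be illustrated by a minimal sketch. The function names and all numerical inputs below are hypothetical and are chosen only to show how the terms combine; they are not values from the model-building data base.

```python
def wac(vya, proportions, w, w0, nitrogen):
    """Equation (2): sum over cropping practices j of
    P_j * VYA * (W_j + W_0 * NI_j)."""
    return sum(
        p * vya * (w_j + w0 * ni_j)
        for p, w_j, ni_j in zip(proportions, w, nitrogen)
    )


def mod_yield(a, b, wac_value):
    """Equation (1): regional constant plus universal constant times WAC."""
    return a + b * wac_value


# Hypothetical region: half continuous, half fallow, no irrigated wheat.
example_wac = wac(
    vya=1.0,
    proportions=[0.5, 0.5, 0.0],   # P_1, P_2, P_3
    w=[20.0, 30.0, 40.0],          # W_1, W_2, W_3 (bushels per acre)
    w0=0.09,                       # coefficient of NI quoted for spring wheat
    nitrogen=[10.0, 20.0, 0.0],    # NI_1, NI_2, NI_3 (pounds per acre)
)
example_yield = mod_yield(a=5.0, b=0.50, wac_value=example_wac)
print(example_wac, example_yield)
```

The regional constant a = 5.0 in the example is arbitrary; in the model it is derived from historical data for each region-station pair.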


Data Base

The weather-related yield components were
assumed to be linear functions of WRV's, designated
by X. Thus,

\[ W_j(S) = B_0 X_0 + B_1 X_{1j} + \cdots, \qquad j = 1,2,3 \qquad (3) \]

The coefficients denoted B in equation (3) are
universal constants with separate sets for fall-planted
(winter) and spring-planted (spring and durum)
wheat. Likewise, $W_0(S)$ was assumed to be a linear
function of WRV.

Plot data from intrastate and regional nurseries
for varietal trials and meteorological data (daily precipitation
and minimum and maximum temperatures)
from nearby weather stations provided
the basic data to estimate the values of B in equation
(3) and in $W_0(S)$ and to calculate VYA values for
specific varieties. Plot data included yield by variety,
cropping practice (continuous or fallowed), and
amount of elemental nitrogen applied. In addition,
planting and heading dates from the plots were used
to calibrate crop calendars.

Plot data were secured from 64 state agricultural
experiment stations in the major wheat-producing
states of the U.S. Great Plains and the Eastern Great
Plains. Estimates of the winter wheat coefficients
(B-values) in equation (3) were based on 1034 location-years;
those for spring wheat, on 306.

Once the universal constants (coefficients) in
equation (3) and $W_0(S)$ were estimated, it was possible
to generate historical values of $W_j(S)$ (j = 1,2,3)
at any weather station. The $W_j(S)$ values were combined
with historical data of the U.S. Department of
Agriculture (USDA) Statistical Reporting Service
(SRS) and Economic Research Service (ERS) on
varieties planted, proportion of wheat under the
three cropping practices, and amount of nitrogen applied
to generate WAC(R,S) values, by year, for
CRD's in the U.S. Great Plains. The historical WAC
values were combined with USDA SRS estimates of
yield, first to estimate the universal constant B and
then to estimate A(R,S) values in equation (1).


Standardization of Yields
and Crop Calendars

To relate yield variation using many varieties to
weather variation using data over a wide range of climates,
Kansas State University (KSU) had to adjust
yields to a “standard” variety and to calculate
WRV's over common phenological phases (e.g.,
jointing to heading) rather than coincident calendar
days (e.g., April).

To accomplish the first task, KSU developed
VYA values for varieties that became popular with
producers. The VYA values were computed by first
comparing yields of each variety with every other
variety over all location-years for which data were
available within regions of varietal adaptability. The
VYA value for a variety (v) was an expression of the
yield capability of variety (v) to that of a standard
variety (s) as a ratio. The final value assigned to
variety (v) incorporated not only the direct comparison
of (v) and (s) but also indirect comparisons
through application of a chain rule with other
varieties as intermediaries. Some representative
VYA values are shown in table I.
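
The chain-rule idea can be sketched as follows. This is an illustration only: the text does not state how direct and indirect comparisons were weighted, so the sketch assumes a simple average of the direct ratio and of every one-intermediary product, and the function names and yield figures are hypothetical.

```python
from statistics import mean


def pair_ratio(yields, a, b):
    """Ratio of mean yields of varieties a and b over the location-years
    in which both were grown; None if they never occur together."""
    shared = [ly for ly in yields[a] if ly in yields[b]]
    if not shared:
        return None
    return mean(yields[a][ly] for ly in shared) / mean(yields[b][ly] for ly in shared)


def vya(yields, v, s):
    """Yield of variety v relative to standard s, combining the direct
    ratio with chained ratios (v/u) * (u/s) through intermediaries u.
    Equal weighting of the estimates is an assumption of this sketch."""
    estimates = []
    direct = pair_ratio(yields, v, s)
    if direct is not None:
        estimates.append(direct)
    for u in yields:
        if u in (v, s):
            continue
        v_over_u = pair_ratio(yields, v, u)
        u_over_s = pair_ratio(yields, u, s)
        if v_over_u is not None and u_over_s is not None:
            estimates.append(v_over_u * u_over_s)
    return mean(estimates) if estimates else None


# Hypothetical plot yields keyed by (location, year); V and S never share a plot.
plot_yields = {
    "S": {("A", 1960): 20.0, ("B", 1961): 30.0},
    "U": {("B", 1961): 33.0, ("C", 1962): 22.0},
    "V": {("C", 1962): 24.2},
}
print(vya(plot_yields, "V", "S"))
```

In the example the new variety V is never grown alongside the standard S, so its value is obtained entirely through the intermediary U.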

The need to identify common phenological phases
in different climates was satisfied by Robertson's
biometeorological time scale (BMTS) (ref. 4) for
spring wheat and an adjusted form of the BMTS (ref.
1) for winter wheat. Correspondence of points on the
BMTS to phenological stages is as follows.


BMTS   Stage             BMTS   Stage

0.0    P— Planting       3.0    H— Heading
1.0    E— Emergence      3.5    M— Milk
1.5    T— Tillering      4.0    D— Dough
2.0    J— Jointing       5.0    R— Ripe
2.5    F— Flag leaf

(Robertson's BMTS included stages with whole
numbers; KSU added names to 1.5, 2.5, and 3.5 to
facilitate the discussion.)
A Soil Moisture Budget

Soil moisture conditions are better indicators of
water supply or stress for plants than is precipitation.
Baier and Robertson's versatile soil moisture budget
(VSMB) (ref. 5) was used by KSU to simulate soil
moisture conditions. The VSMB needs only daily
precipitation and temperature extremes to operate
and provides a tool to reconstruct historical soil
moisture conditions from data available worldwide.
In application, KSU used a plant-available water
capacity of 10 inches for all seasons and locations.


TABLE I.— VYA Values of Some Representative
Varieties

U.S. Great Plains
region              Variety        VYA     Release date

Winter wheat

Northern            Kharkof        1.00    1900
                    Cheyenne       1.04    1933
                    Winalta        1.13    1961
                    Scout          1.25    1963

Central             Turkey         0.85    1875
                    Kharkof        0.88    1900
                    Comanche       1.00    1942
                    Bison          1.07    1956
                    Scout          1.22    1963
                    Sage           1.23    1973

Southern            Kharkof        0.81    1900
                    Comanche       1.00    1942
                    Triumph        1.07    1940
                    Concho         1.12    1954
                    Triumph 64     1.21    1964

Eastern             Trumbull       0.93    1916
                    Pawnee         1.00    1943
                    Butler         1.09    1947
                    Ben Hur        1.17    1966
                    Arthur         1.35    1968

Spring and durum wheat

Northern            Marquis        0.85    1907
                    Thatcher       1.00    1934
                    Canthatch      1.09    1959
                    Crim           1.15    1963
                    Wells          1.19    1960
                    Era            1.42    1970


Estimation of Universal Constants

Values for the constant B in equation (3) were
estimated by regressing standardized plot yields
(yields divided by VYA) on the WRV's (X's).
Definitions for the WRV's are as follows.

$AE(a,b)$ = VSMB simulated evapotranspiration from stage a to
    stage b
$PE(a,b)$ = VSMB simulated potential evapotranspiration from
    stage a to stage b
$RE(a,b)$ = $AE(a,b)/PE(a,b)$
$SM(a,b{:}\alpha)$ = $[1 - RE(a,b)/\alpha]^+$, a soil moisture stress term
$CNT(a)$ = contents of zones 4 and 5 in the VSMB at stage a
$SSM(a{:}\alpha)$ = $[1 - CNT(a)/\alpha]^+$, a subsoil moisture stress term
$PR(a,b)$ = precipitation from stage a to stage b
$XPR(a,b{:}\alpha)$ = $[PR(a,b) - \alpha]^+$, an excess precipitation term
$TP(a,b{:}\alpha)$ = $PR(a,b)$ if $PR(a,b) \le \alpha$;
    = $\alpha$ if $PR(a,b) > \alpha$, a truncated precipitation quantity
$TN$ = daily minimum temperature
$TX$ = daily maximum temperature
$ATX(a,b)$ = average daily maximum temperature from stage a to
    stage b
$ATX(a,b{:}\alpha)$ = $[ATX(a,b) - \alpha]^+$
$TN(a,b{:}\alpha)$ = average daily value of $[TN - \alpha]^+$ from stage a to
    stage b
$TX(a,b{:}\alpha)$ = average daily value of $[TX - \alpha]^+$ from stage a to
    stage b
$JT$ = long-term average daily temperature for January
$FL$ = 0 for continuously cropped wheat; 1 for wheat on fallowed
    ground

In the preceding definitions, $[X]^+ = X$ if $X \ge 0$,
but $[X]^+ = 0$ if $X < 0$. A number of the definitions
involved thresholds, designated by $\alpha$; and the values
of the variable are constant for arguments either
above or below the threshold values.
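
The threshold operators in these definitions can be sketched in a few lines. The function names are hypothetical, and the inputs (evapotranspiration totals, zone contents, precipitation) are assumed to have been produced already by a soil moisture budget; the sketch does not implement the VSMB itself.

```python
def positive_part(x):
    """[X]+ : X if X >= 0, otherwise 0."""
    return x if x >= 0 else 0.0


def sm(ae, pe, alpha):
    """Soil moisture stress term [1 - RE/alpha]+, with RE = AE/PE."""
    return positive_part(1.0 - (ae / pe) / alpha)


def ssm(cnt, alpha):
    """Subsoil moisture stress term [1 - CNT/alpha]+."""
    return positive_part(1.0 - cnt / alpha)


def xpr(pr, alpha):
    """Excess precipitation term [PR - alpha]+."""
    return positive_part(pr - alpha)


def tp(pr, alpha):
    """Truncated precipitation: PR up to the threshold, alpha beyond it."""
    return pr if pr <= alpha else alpha


# Hypothetical inputs in inches.
print(sm(ae=2.0, pe=4.0, alpha=0.8))   # stress present because RE < alpha
print(xpr(pr=6.0, alpha=4.0))          # 2 inches above the threshold
print(tp(pr=6.0, alpha=4.0))           # capped at the threshold
```

Each term is zero or constant on one side of its threshold, which is the property noted in the closing sentence of the definitions.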

The entries in table II combine to express the
year-to-year and location-to-location variations in
yield due to meteorological variation. The signs on
the B-values and their magnitudes appear
agronomically acceptable. The winter wheat model
reflects some known facts; namely, the deleterious
effects of moisture stress, particularly from jointing
to the milk stage; excess precipitation after heading;
and warm temperatures throughout the season. For
spring-planted wheat, warm temperatures are certainly
deleterious. The moisture stress situation is
not so clear-cut. The model indicates that precipitation
is most beneficial if “dryout” periods occur between
adequate rains.

The subscript j on some of the WRV's in table II
indicates that the variables assume different values
depending on cropping practice. The KSU computer
version of the VSMB carries two moisture budgets,
one for continuous wheat and one for wheat planted
on fallowed soil. The beneficial effects of fallowing
are expressed indirectly through the soil moisture
terms and directly, in the winter wheat model,
through the FL term.


TABLE II.— Formulas to Calculate Weather
Components of Yield(a)


[Eq. (3)]


B 

WRV(X) 

Winter  wheal 

JOS 

1 

“10.3 

ISSA/702))1 

~ 16.5 J|l  - 002  ’ 77] 

sM/trm 

-.37  *JT 

ISM/J.P.OM)]1 

ISM/FMOM))1 

ISM/H.M:0.9)\} 

XPft(H.l>.4) 

-15*77 

-6.40 

— I.0J 

-06 

ATXiE.D 

-.17 

ATXlTJl 

-Inin  +o.23*  man] 

mJ.P.SO) 

-.40 

mFM.SO) 

-.60 

TXiH.MU ) 

-.47 

TX(M.t>.  86) 

.37(1  — 0.02S  * 77] 

FLj 

Spring  wheal 

154.4 

1 

3.66 

TPiP./.i) 

TPlPM.S) 

TP/PM9) 

3.11 

-2.45 

-9.16 

REiTJ) 

3.86 

AEIH.M) 

1.89 

CSTiM) 

ATftJ.F) 

-.47 

-.37 

ATXiF.Ht 

—.341!  +0.11  - PRiH.m 

ATXIHM 

-.59 

ATX(M.D) 

-.29 

ATX(D.Rlt) 

(a) Yield is measured in bushels per acre, temperature in degrees Fahrenheit, and
precipitation in inches.


For irrigated winter wheat (j = 3), all SM and
SSM terms are set equal to zero and FL = 0. For irrigated
spring wheat, moisture-related variables are
fixed as follows: $TP_3(P,J{:}3) = 3.0$; $TP_3(P,M{:}5) = 5.0$;
$TP_3(P,M{:}9) = 5.0$; $AE_3(H,M) = 2.5$; and
$CNT_3(M) = 4.0$.

The coefficient of NI in equation (2) is weather
related,  and  W0(S)  - 0.1?  + 8)(0.016  - 

0.007  * JT\  for  winter  wheat  and  simply  0.09  for 
spring  wheat.  For  winter  wheat,  a leaching  effect  due 
to  excess  precipitation  reduces  the  contribution  of 
each pound of nitrogen to yield. Here, NI is measured
in  pounds  per  acre.  That  completes  discussion  of  the 
universal  constants  in  equation  (2). 

To estimate the constants in equation (1), B and A(R,S), KSU chose and matched, approximately one to one, CRD's with first-order U.S. Weather Bureau, Federal Aviation Administration, and military weather stations in the U.S. Great Plains. KSU then computed WAC(R,S) values, given in equation (2), for as many as 22 years (1955-76) for most of the region-station (R,S) combinations. Government-reported (USDA SRS) yields GOV(R) were retrieved for each region (CRD), and b(R) was computed as follows:


$$b(R) = \frac{\sum_y (Z_y - \bar{Z})(X_y - \bar{X})}{\sum_y (X_y - \bar{X})^2} \qquad (4)$$


for each (R,S), where Z_y = GOV_y(R) and X_y = WAC_y(R,S) and the sums were over years y. Then, B was calculated as a weighted average of b(R) as follows.


$$B = \sum_{R=1}^{R_0} q(R)\, b(R) \qquad (5)$$


where q(R) = proportion of U.S. Great Plains harvested acres allocated to region (R) and R_0 = total regions. The q(R) value was calculated from average USDA SRS acreage estimates for 1971-75. Results gave B = 0.75 for winter wheat and B = 0.50 (0.51 before rounding) for spring wheat.




Finally, a value for the regional constant A(R,S) was calculated from historical data by the formula

$$A(R,S) = \overline{GOV}(R) - B \cdot \overline{WAC}(R,S) \qquad (6)$$


where the means (overbars) were calculated over as many years as were covered by data. In application, only the historical means of government-reported yields are used for estimation. All other constants in the model are universal and were derived independently of the region (R) for which an estimate was desired.
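The estimation steps in equations (4) to (6) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the two-region data set are hypothetical, and the sketch assumes one weather station per region so that b(R) can be computed directly from paired yearly values.

```python
from statistics import mean

def slope(gov, wac):
    """Least-squares slope b(R) of reported yield on WAC over years, as in equation (4)."""
    z_bar, x_bar = mean(gov), mean(wac)
    num = sum((z - z_bar) * (x - x_bar) for z, x in zip(gov, wac))
    den = sum((x - x_bar) ** 2 for x in wac)
    return num / den

def pooled_b(slopes, acreage_share):
    """Acreage-weighted average B of the regional slopes, as in equation (5)."""
    return sum(q * b for q, b in zip(acreage_share, slopes))

def regional_constant(gov, wac, big_b):
    """Regional constant A(R,S) from historical means, as in equation (6)."""
    return mean(gov) - big_b * mean(wac)

# Hypothetical yearly values for two regions (not taken from the report).
gov_1, wac_1 = [20.0, 24.0, 28.0], [30.0, 34.0, 38.0]
gov_2, wac_2 = [15.0, 16.0, 17.0], [20.0, 22.0, 24.0]

b1, b2 = slope(gov_1, wac_1), slope(gov_2, wac_2)
B = pooled_b([b1, b2], [0.6, 0.4])       # assumed acreage shares summing to 1
A1 = regional_constant(gov_1, wac_1, B)  # constant for the first region
```

Because A(R,S) is recomputed from historical means, a consistent over- or underestimate in WAC(R,S) shifts A(R,S) rather than the yield estimate itself.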


APPLICATIONS 

Application of the model requires values for the WRV's and the cultural variables. Values for all WRV's can be generated from daily readings of precipitation, minimum and maximum temperatures at a station (S), tabled values of Q (solar radiation at the edge of the atmosphere), and day length. The cultural variables needed for a region (R) are VYA, amount of nitrogen applied, and proportions of wheat.

Historical  values  for  cultural  variables  may  be 
more  difficult  to  determine  than  are  values  of 
WRV's.  However,  the  model  is  relatively  insensitive 
to  modest  errors  in  observation,  and  estimates  from 
“experts”  can  be  used  to  good  advantage.  The  model 
has some self-correcting capability in that consistent overestimates or underestimates of WAC(R,S) values over seasons can be partly offset by the A(R,S) values, which may be recalculated
periodically.  Further,  cultural  variables  change 
slowly  from  year  to  year.  It  is  weather  that  produces 
abrupt  shifts  in  yield  in  any  given  semiarid  region 
(R)  from  one  year  to  the  next. 


United States

Results of KSU application of the model in all the major wheat-growing areas of the United States are summarized in tables III and IV. The model was applied to weather station-region (CRD) combinations
with  a density  of  less  than  one  station  per  CRD,  and 
the  yields  were  aggregated  upward  to  multistate 
results.  Average  acreages  during  1971-75  were  used 
as  weights  in  the  aggregation.  The  USDA  SRS  yields 
were  aggregated  upward  with  the  same  weights. 


Table III.—Comparison of Model (MOD) and SRS Estimates of Yields in the U.S. Great Plains

Table IV.—Comparison of Model (MOD) and USDA SRS Winter Wheat Yield Estimates for the Eastern Great Plains and the Northwest

The results for the U.S. Great Plains (table III) show that, despite a sparse weather network, model-generated yields were within ±1 bushel per acre of SRS estimates in years of lowest yields. On the other hand, the model underestimated SRS yields by 3 to 4 bushels per acre in the years of highest yields.

Table IV, for the Eastern Great Plains, shows an increase in yields from 1965 to 1972 due largely to increased nitrogen applications and the introduction of high-yielding semidwarf varieties like Arthur. The model overestimated yields in 1973 and 1974 because of a septoria epidemic and in 1976 because of late freezes at heading time. The model does not account for losses due to disease epidemics and late freezes. For those years, the model estimates what yields could have been without diseases and late freezes.

Application to the Northwest, shown in table IV, provides a challenging test for the model because no data from that region were used to estimate universal
constants.  In  addition,  historical  data  on  cropping 
practices  were  unavailable  for  Washington  and 
Oregon,  and  “rough"  estimates  were  used  in  the 
calculations.  Except  for  a few  years,  the  model  and 
the  SRS  estimates  agree  well.  The  overestimate  for 
1973  is  partly  accounted  for  by  a large  area  of  win- 
terkill in  Washington.  In  1976,  poor  yields  in  Idaho 
reduced  the  average.  The  underestimate  for  1971 
probably  resulted  from  general  underestimation  of 
high  yields  by  the  model. 


U.S.S.R. 

The  KSU  model  was  applied  to  three  oblasts 
(states)  in  the  U.S.S.R.;  comparisons  between  model 
estimates  and  yields  reported  by  the  U.S.S.R.  are 
shown  in  table  V.  The  poor  winter  wheat  yields  in 
1968  in  the  Khmel'nitskiy  oblast  (part  of  the 
Ukraine) were detected by the model. Although U.S.S.R. winter wheat yields were low in 1972, such was not the case in Khmel'nitskiy; the model substantiated that fact.

For spring wheat, there is a difference of 7 bushels per acre between MOD and GOV for Kurgan in one year; a difference that large can appear with data from only one weather station for such a large region. The
agreement  between  MOD  and  GOV  estimates  is  ex- 
tremely good  for  Tselinograd  in  the  Kazakhstan 
region.  Especially  notable  was  the  model's  detection 
of  a relatively  high  yield  in  1972. 


Table V.—Comparison of Model (MOD) and Government-Reported (GOV) Yields for Three Oblasts in the U.S.S.R.


India

The  model  was  applied  to  five  wheat-growing 
states  in  India  for  the  4 years  from  1972  to  1975; 
comparisons  of  model  estimates  and  yields  reported 
by  the  Government  of  India  are  shown  in  table  VI. 
The  model  was  run  with  a normal  crop  calendar 
because  Robertson's  BMTS  was  not  applicable  at  the 
lower  latitudes.  Irrigated  wheat  was  assumed  to  be 
composed of high-yielding varieties (VYA = 1.30) with 30 pounds per acre of nitrogen applied. The universal constant B in equation (1) was set equal to 0.70 because the analysis was run before the final decision to use 0.75 was made.

With irrigation and high-yield varieties, India achieved rather uniform year-to-year yields in those five states. Table VI shows not only yields but also the proportion of wheat irrigated (F_i) and not irrigated (F_d). The weather components (W_j(S) of eq. (2); y = 1972, 1973, 1974, 1975; j = 1, 3) of WAC values were averaged over the weather stations within a state and indicate the importance of irrigation in some states. The A(R) values, averaged over A(R,S) values (eqs. (1) and (6)), reflect relatively poor soils in Uttar Pradesh.

In  summary,  the  model  shows  how  to  combine 
weather  and  culture  to  explain  yield  variation  from 
state  to  state  in  India. 
Table VI.—Comparison of Winter Wheat Yields in India


(a)  Estimated  and  reported  yields 


Harvest year | Yield, bu/acre, for: Punjab | Rajasthan | Haryana | Uttar Pradesh | Bihar (MOD and GOV columns for each state)

1071 

31 

36 

10 

to 

16 

30 

17 

10 

22 

2* 

1073 

34 

33 

II 

10 

16 

16 

16 

11 

— 

— 

1074 

36 

33 

16 

16 

17 

13 

16 

14 

21 

IS 

1075 

36 

36 

10 

10 

17 

16 

10 

17 

11 

10 

/!<*>• 

-0.7 

-10 

-3.; 

- 

0.0 

-03 

(b) Proportion of wheat area irrigated and nonirrigated

Harvest year | Area, percent, for: Punjab | Rajasthan | Haryana | Uttar Pradesh | Bihar (two columns for each state)

1071 

13 

17 

33 

67 

17 

•3 

33 

67 

47 

S3 

1073 

11 

M 

17 

73 

16 

14 

31 

60 

— 

— 

1074 

11 

St 

34 

66 

14 

16 

30 

70 

41 

$0 

107$ 

11 

IS 

30 

70 

14 

•6 

30 

70 

40 

60 

(c) Yield attributed to indicated weather component

Harvest year | Yield, bu/acre, for: Punjab | Rajasthan | Haryana | Uttar Pradesh | Bihar (two weather-component columns for each state)

1072 

31 

35 

It 

26 

20 

20 

24 

30 

10 

26 

1073 

21 

35 

5 

2$ 

10 

30 

20 

20 

— 

— 

1074 

34 

37 

3 

25 

13 

31 

17 

31 

12 

27 

1075 

32 

37 

1 

2t 

13 

32 

23 

32 

15 

26 

•*-010 

- MaunoMMlt  tr«»Md  <4l,bn4l.  t/  — »fl(*M4 
‘ H|  “ dryland  compoocai.  W t - irt^aud  (OfllfOMni 


Stability of A(R) Over Time

With sufficient historical yield and weather data, it is possible to study the stability of the local constant term over time. Equation (6) gives the basic relation between A(R,S), Government-reported yields, and average WAC(R,S) values. When results over CRD's and states are combined, the (S) for stations is dropped and A(R) is used. Thus,

$$A(R) = \overline{GOV}(R) - B \cdot \overline{WAC}(R) \qquad (7)$$

For U.S. data, A(R) is calculated for 1955-64 and 1967-76, centered approximately on 1960 and 1972, and the differences are considered.

$$A_{72}(R) - A_{60}(R) = [GOV_{72}(R) - GOV_{60}(R)] - B\,[WAC_{72}(R) - WAC_{60}(R)] \qquad (8)$$

If one can assume that 10-year means average out much of the weather variation, then

$$GOV_{72}(R) - GOV_{60}(R) = \text{a measure of the effect of cultural changes on yields} \qquad (9)$$

except when either 10-year period includes severe disease epidemics. If B [WAC_72(R) - WAC_60(R)] = [GOV_72(R) - GOV_60(R)], then A_72(R) - A_60(R) = 0. The latter result indicates not only that A(R) is quite stable but also that the model explains most of the increase in yields from cultural changes. Results of this type of analysis are shown in table VII.

For winter wheat in the U.S. Great Plains, the model explains 5.0 of a 6.9-bushel-per-acre change, leaving a 1.9-bushel-per-acre increase due to nonmodeled causes and/or weather variation not completely averaged out. For the Eastern Great Plains, GOV_72(R) - GOV_60(R) = 6.3 becomes a poor measure of cultural change because of the septoria epidemics in 1973 and 1974 (table IV), which caused yield reductions that were not detected by the model. Cultural changes, as measured by the model, explain all the increase in yield in the Northwest.

For spring wheat, the model explains about 50 percent of the 6.3-bushel-per-acre increase. KSU believes that a major portion of the remaining increase resulted from planting later and using more herbicides to control weeds between 1960 and 1972.


Weather Station Density

Applications discussed previously were based on a rather sparse network of weather stations. The effect of adding weather stations was investigated using Kansas data from 1955 to 1976. Increasing the number of weather stations from 7 to 42 reduced the root mean square error, $\mathrm{RMSE} = \{N^{-1} \sum_R [MOD(R) - GOV(R)]^2\}^{1/2}$, from 3.1 to 2.6, or by 16 percent.


Disease Losses

In conjunction with the station density study, KSU considered how much improvement could be achieved if losses due to stem and leaf rust were known. With data supplied by the USDA Cereal Rust Laboratory in Saint Paul, Minnesota, KSU reduced model yields by the percentages indicated, recalculated its constants to adjust WAC values to a regional level, and further reduced the RMSE from 2.6 to 2.3. Thus, the combined benefit of more weather stations and knowledge of rust losses reduced the RMSE by 26 percent.

After application of a high-density weather network and rust loss information, the remaining “large” differences between the model and the USDA SRS resulted from underestimates of high yields in 1970-73 and an overestimate in 1966 due to freezes at heading.
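The RMSE figures quoted for the station density and rust loss studies can be checked with a short calculation. The sketch below is illustrative only; the function names are hypothetical, and the only values taken from the text are the three RMSE levels (3.1, 2.6, and 2.3 bushels per acre).

```python
import math

def rmse(model, reported):
    """Root mean square error between model and reported yields."""
    n = len(model)
    return math.sqrt(sum((m - g) ** 2 for m, g in zip(model, reported)) / n)

def percent_reduction(before, after):
    """Relative reduction in RMSE, expressed in percent."""
    return 100.0 * (before - after) / before

# Values quoted in the text: 7 stations, 42 stations, 42 stations plus rust losses.
density_gain = percent_reduction(3.1, 2.6)   # about 16 percent
combined_gain = percent_reduction(3.1, 2.3)  # about 26 percent
```

Both percentages are taken relative to the sparse-network RMSE of 3.1, which is why the combined gain is 26 percent rather than the sum of two separate reductions.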


Table VII.—Change in Yield of Winter Wheat Averaged Over Two 10-Year Periods

[Eq. (8)]
POTENTIAL IMPROVEMENTS

Applying the model with diverse climatic data showed several shortcomings and pointed to needed improvement. KSU identified some decision points in model development in which alternative procedures would have been more productive. Alternative procedures include the following.

1. For winter wheat, use a “normal” crop calendar calibrated to each weather station location divided into phases by the following nine stages: planting, beginning of dormancy, end of dormancy, jointing, flag leaf, heading, milk, dough, and ripe. The normal crop calendar would be fixed over years.

2. Screen plot yield data to eliminate yields abnormally low because of disease epidemics, heavy insect losses, and/or unknown causes.

3. Include a soil index in the model with one or more soil variables based on soil types on which varietal trials were conducted.

4.  Build  WRV’s  that  reflect  that  the  contribution 
of  1 inch  of  precipitation  on  yield  depends  on  the 
status  of  soil  moisture  budget  when  the  precipitation 
occurs. 

Some  of  the  shortcomings  of  the  model,  which 
hopefully  will  be  remedied  with  these  procedures, 
are  underestimating  high  yields,  not  detecting  and 
measuring  yield  losses  due  to  winterkill  and  freezes 
at  heading,  and  overestimating  when  disease 
epidemics  occur. 

In  conclusion,  there  is  no  technical  barrier  to  real- 
time application  of  the  model  either  for  selected 
regions  or  on  a global  basis.  The  computer  software, 
WHYMOD  (wheat  yield  model),  has  been  in  opera- 
tion at  the  National  Oceanic  and  Atmospheric  Ad- 
ministration (NOAA)  Center  for  Climatic  and  En- 
vironmental Assessment  (CCEA)  and  can  be  oper- 
ated on  a real-time  basis.  Preharvest  forecasts  are 
programed  into  WHYMOD  with  the  strategy  of 
substituting  mean  values  for  variables  that  are  gener- 
ated after  forecast  time. 

As indicated in the section entitled “Applications,” the model in its current form can produce useful results and provide insights into causes for increases and decreases in yields despite its deficiencies. KSU is continuing work by retracing the steps in model development in an effort to produce an improved model.
The Law of the Minimum and an Application to Wheat Yield Estimation

R. B. Cate,^a D. E. Phinney,^a and M. H. Trenchard^a


INTRODUCTION 

The law-of-the-minimum (LOM) concept dominated agricultural science throughout the 19th century. Its most famous proponent was a German chemist, Justus von Liebig, although several other prominent scientists contributed to the gradual evolution of the principle (ref. 1). In the early 20th century, the LOM concept was expanded by Blackman to include rate or flux variables, specifically in photosynthesis (ref. 2). At about the same time, Shelford developed the idea of ecological maximums and minimums; these constitute the “limits of tolerance” that control the distribution of organisms (refs. 3 and 4). It is likely that the scientists who pioneered in the application of the LOM to biological systems were aware that they were merely extending the law of multiple proportions, which had already become the basic principle of chemistry, crystallography, and other fields dealing with the structure of matter.

Despite its fundamental theoretical importance, the LOM was not applied mathematically. As quantification became even more important in scientific research, the LOM declined in prestige because it was incompatible with conventional analytical methods involving calculus, analysis of variance, and multiple (additive) regression. However, in 1963, Swanson pointed out the relationship between the LOM and linear programing (ref. 5). Perhaps coincidentally, increasing attention has since been given to quantified application of the LOM. Several algorithms now exist for fitting the model (refs. 6 to 11; see also footnotes 1 to 3). None of these algorithms is wholly satisfactory, but sufficient progress has been made to permit fairly rigorous application and testing. The purpose of this paper is to illustrate the LOM concept in a variety of contexts and then to report on how it is being adapted to estimate wheat yields.


^a Lockheed Electronics Company, Houston, Texas.


THEORY 

Algebraic  Formulation 

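Mathematically, the LOM can be expressed as

$$\hat{Y} = \min_i f_i(X_i)$$

For example,

$$\hat{Y} = \min \begin{cases} \hat{Y}_{x1} = a_0 + a_1 X_1 \\ \hat{Y}_{x2} = b_0 + b_1 X_2 \\ \quad\vdots \\ \hat{Y}_{xn} = n_0 + n_1 X_n \end{cases}$$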


^1 M. J. Hanley and H. O. Hartley have developed an unpublished algorithm using maximum likelihood estimation for calculation of the LOM parameters. Although untested at this time, the approach is a significant advance.

^2 R. B. Cate and T. Y. Hsu, “An Algorithm for Defining Linear Programing Activities Using the Law of the Minimum.” North Carolina Agriculture Experiment Station Technical Bulletin, to be published.

^3 P. E. Waggoner, “Liebig's Law of the Minimum and the Relation Between Weather, Pathogen and Disease.” Connecticut Agriculture Experiment Station, to be published.
An example might be that predicted yield Ŷ equals the minimum value predicted by the following equations.

$$\hat{Y}_T = 80 - 0.5\ (\text{mean maximum temperature})$$
$$\hat{Y}_R = 0 + 4.0\ (\text{total rainfall})$$
$$\hat{Y}_M = 40\ (\text{maximum historical yield})$$

The prediction process, using hypothetical data, is shown in table I. The parenthetical values determine Ŷ, the final prediction.


Table I.—Example of the Application of a Law-of-the-Minimum Model

Case number   Temperature   Rainfall   Ŷ_T    Ŷ_R    Ŷ_M    Ŷ
1             90            10         (35)   40     40     35
2             80            5          40     (20)   40     20
3             70            15         45     60     (40)   40
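The prediction process of table I can be reproduced with a short Python sketch. The function and key names are hypothetical; the three response equations and the three cases are those given in the text.

```python
def lom_predict(mean_max_temp, total_rainfall):
    """Return the LOM prediction and the name of the limiting factor."""
    candidates = {
        "temperature": 80.0 - 0.5 * mean_max_temp,   # Y_T
        "rainfall": 0.0 + 4.0 * total_rainfall,      # Y_R
        "max_yield": 40.0,                           # Y_M, maximum historical yield
    }
    limiting = min(candidates, key=candidates.get)   # factor giving the smallest value
    return candidates[limiting], limiting

# The three hypothetical cases of table I: (temperature, rainfall).
cases = [(90, 10), (80, 5), (70, 15)]
results = [lom_predict(t, r) for t, r in cases]
```

Each case is limited by a different factor (temperature, rainfall, and the historical ceiling, in that order), which is what the parenthetical entries in table I mark.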

The equivalent representation for an additive multiple regression model might be

$$\hat{Y} = a_0 + \sum_i a_i X_i$$

For example,

$$\hat{Y} = a_0 + a_1 X_1 + a_2 X_2 + \cdots + a_n X_n$$

whereas a multiplicative multiple regression model can be expressed as

$$\hat{Y} = (f_1)(f_2) \cdots (f_n)$$

For example,

$$\hat{Y} = (a_0 + a_1 X_1)(b_0 + b_1 X_2) \cdots (n_0 + n_1 X_n)$$


Interactions 

The treatment of interactions differs with the three preceding types of functions. In the additive model, an interaction between X_1 and X_2 is usually represented as a new variable consisting of the product of the two, X_1 X_2. This variable has its own coefficient and is included in the overall additive equation. Since the number of possible interactions increases at an exponential rate as more variables are considered, a large model may become cumbersome and unintelligible. The structure of the LOM function avoids this difficulty by making the interactions absolute; i.e., the effect of a limiting amount of one variable is to suppress completely the response to another factor. In other words, the slope of the second factor becomes zero rather than its coefficient. (Statistically, interactions can be defined as the effects of variables on the slopes of other variables.) The multiplicative model involves a more extreme treatment of interactions than does the LOM. For example, the Mitscherlich-Baule-Spillman-Bray model predicts that if three variables are each present in sufficient quantity to produce 50 percent of maximum yield, the yield will be 12.5 percent of maximum since 0.5 x 0.5 x 0.5 = 0.125 (ref. 12).
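The contrast between the LOM and the multiplicative treatment of interactions can be illustrated numerically. In this sketch, each factor is expressed as the fraction of maximum yield it would permit on its own; the function names are hypothetical.

```python
from math import prod

def lom(relative_responses):
    """LOM: yield is set by the single most limiting factor."""
    return min(relative_responses)

def multiplicative(relative_responses):
    """Multiplicative model: every factor scales the yield."""
    return prod(relative_responses)

levels = [0.5, 0.5, 0.5]                 # three factors, each sufficient for 50 percent
lom_yield = lom(levels)                  # 0.5 of maximum
mult_yield = multiplicative(levels)      # 0.125 of maximum

# Raising the nonlimiting factors leaves the LOM prediction unchanged (zero slope).
lom_after_raise = lom([0.5, 0.9, 0.9])
```

The last line shows the absolute interaction described in the text: while one factor is limiting, the response to the other factors is completely suppressed.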


Substitution 

An  inherent  property  of  additive  variables  is  the 
capability  of  substituting  one  variable  for  another,  so 
that  a sufficient  amount  of  one  variable  can  com- 
pletely overcome  even  the  total  absence  of  another. 
The  LOM  does  not  permit  substitution.  However, 
when  substitution  does  exist,  a new  variable  may  be 
created  which  is  the  sum  of  two  additive  variables. 
An  example  of  this  technique  is  given  later  in  this 
paper. 
Graphic Representation


2H2O = 2H2 + O2


The relationship between the LOM and the linear programing model (ref. 13) can be seen by comparing figures 1 and 2, which use the formation of water as an example. In figure 1, only points A and E represent optimum combinations of hydrogen and oxygen. At points B, C, and F, there is insufficient hydrogen to balance the extra oxygen; whereas at points D, G, and H, there is an excess of hydrogen relative to the oxygen supplied. The result is a series of right-angled isoquants that define the production diagonal, or expansion path, along which efficient output of water occurs. Figure 2 shows the yield of water plotted against hydrogen and oxygen individually. Note that the hydrogen plot is identical to that in figure 1, whereas the oxygen plot is a rotated mirror image of the oxygen portion of figure 1. However, in figure 2, it becomes more evident that the points of excess oxygen (B, C, and F) can be used as replicates for definition of the hydrogen response line, 0-EF, whereas the points of redundant hydrogen can be used to define the oxygen response, 0-EH. This property is the basis for some of the algorithms currently used to fit the LOM.
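The water example can be written as a two-factor LOM function. The sketch below assumes quantities in moles and uses only the stoichiometry of the reaction (1 mole of water per mole of hydrogen, 2 moles of water per mole of oxygen); the function name is hypothetical.

```python
def water_yield(h2_moles, o2_moles):
    """Moles of water formed; the scarcer reactant sets the output."""
    return min(h2_moles / 1.0, o2_moles / 0.5)

balanced = water_yield(2.0, 1.0)         # efficient combination on the expansion path
excess_oxygen = water_yield(2.0, 3.0)    # extra oxygen adds nothing
excess_hydrogen = water_yield(6.0, 1.0)  # extra hydrogen adds nothing
```

All three combinations give the same output, so they lie on one right-angled isoquant; the points with excess oxygen still fall on the hydrogen response line, which is why they can serve as replicates when that line is fitted.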



FIGURE  1.— Linear  programing  model. 


The  remainder  of  this  paper  is  devoted  to  a 
detailed  discussion  of  a trial  application  of  the  LOM 
to  yield  modeling. 


Q(H2O) = min f_i(X_i) = min(H2/1.0, O2/0.5)


FIGURE  2. — Law-of-the-minimum  model. 


USE  OF  THE  LOM  IN  YIELD  MODELING 


Interpretation of Individual Experiments

The form of the LOM function means that the coefficients of the individual variables are independent. This independence permits the direct use in yield models of individual experimental results obtained under controlled conditions with replicated factorial treatments. For this example, the results of a typical experiment were used; they are plotted in figure 3 (ref. 14). The left portion indicates that the nitrogen response followed the LOM reasonably well, since treatments 1, 2, 3, and 4 form a relatively horizontal line corresponding to the lowest level of applied water. Similarly, treatments 7 and 8 form a horizontal line corresponding to the next level of applied water. However, the right portion of figure 3 indicates that water has substituted for applied nitrogen since there are curved positive responses to water at each applied nitrogen level. These responses are not as steep (i.e., efficient) as the response to water when nitrogen is not limiting, but the LOM does not appear to be the proper representation.


FIGURE 3.—Sample application of the LOM model to the joint effects of nitrogen and moisture on grain yield in which the original data suggest substitution of water for nitrogen. The N values are applied nitrogen in pounds per acre; RW is relative water.


However, this discrepancy can be resolved, using the logical assumption that the intercept value of 20 bushels per acre on the nitrogen graph represents the contribution of soil nitrogen under optimum moisture conditions. If the nitrogen response line is extrapolated to zero yield, the amount of soil nitrogen can be estimated to be approximately 50 pounds per acre. Thus, about 1 pound of soil nitrogen per acre is being made available by each additional centimeter of applied water. To obtain a true LOM representation of the data, it is necessary to create a new variable (which is called total available nitrogen) that is defined as the sum of applied nitrogen plus soil nitrogen multiplied by relative water availability (RWA). (Relative refers to water level as a percentage of maximum applied.) The results of this transformation are plotted on the left side of figure 4. The LOM model now provides a satisfactory description of the data. All treatment yields are determined by available nitrogen except at the highest applied nitrogen levels (treatments 4, 8, 12, and 16). The nitrogen response equation is

$$\hat{Y} = 3.16 + 0.36N$$

where Ŷ is yield in bushels per acre and N is nitrogen in pounds per acre. This equation is very close to other published nitrogen response coefficients (refs. 7 and 15).
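The transformation to total available nitrogen can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: it reads the definition as applied nitrogen plus (soil nitrogen times RWA), takes soil nitrogen as the approximate 50 pounds per acre quoted in the text, and uses hypothetical function names.

```python
SOIL_N = 50.0  # lb/acre, approximate soil nitrogen estimated in the text

def total_available_nitrogen(applied_n, rwa_percent, soil_n=SOIL_N):
    """Applied nitrogen plus the share of soil nitrogen released at this water level.

    Assumes RWA scales only the soil nitrogen term.
    """
    return applied_n + soil_n * rwa_percent / 100.0

def nitrogen_limited_yield(total_n):
    """Nitrogen response equation from the text, in bushels per acre."""
    return 3.16 + 0.36 * total_n

# Hypothetical treatment: 40 lb/acre applied at 50 percent relative water.
n_total = total_available_nitrogen(40.0, 50.0)
y_hat = nitrogen_limited_yield(n_total)
```

Folding water into the nitrogen variable in this way removes the apparent substitution, so that a single straight-line nitrogen response can describe the treatments.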
FIGURE 4.—Sample application of the LOM model to the joint effects of nitrogen and moisture on grain yield when apparent substitution phenomena are removed by combining soil and applied nitrogen.


Use  of  the  Experimental  Model 

The experimental model was used in the following applications.

Data base.—Once a basic model has been built from experimental data, the problem becomes one of adaptation to available data. This example consisted of a data set for spring wheat compiled by Dr. A. M. Feyerherm (personal communication). The basic variables used were (1) total precipitation by crop stage, (2) mean maximum and mean minimum temperature by crop stage, (3) applied nitrogen by crop reporting district (CRD), (4) percentage of fallow by CRD, (5) the relative yielding potential of the dominant variety planted in each CRD, and (6) yields by CRD. Information on these variables covered the period 1955-76 for most of the spring wheat producing districts in the U.S. Great Plains (USGP).

RWA.—The fundamental assumption of the model is that nitrogen uptake is determined by the amount of nitrogen present and by the availability of water to transport the nitrogen into the plant. Data limitations precluded the use of a soil moisture budget to estimate water availability. The most practical alternative seemed to be an estimate of atmospheric water balance. This estimate was calculated as the difference between precipitation and estimated pan evaporation. The latter was calculated as a function of the estimated atmospheric vapor pressure deficit; i.e.,

$$E_p = b_0 + b_1 E_s - b_2 E$$

where b_0 = 0.2163, b_1 = 0.3473, and b_2 = 0.2644; E_p is pan evaporation in inches; E_s is the vapor pressure function value in millibars of the daily maximum air temperature; and E is the vapor pressure function value in millibars of the daily minimum air temperature. The functions E_s and E were calculated using a form of the vapor pressure function as derived from the Clausius-Clapeyron equation with compatible units (ref. 16). The coefficients b_0, b_1, and b_2 were derived by fitting observed data for pan evaporation and temperatures at several stations in the USGP. The values of the coefficients were found to be remarkably stable over broad areas. The range of the water balance estimates was -14 to 0 inches. This was converted to a percentage scale to represent RWA and then calculated for the three periods of plant growth defined in the paragraph on nitrogen uptake.
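A minimal sketch of the RWA calculation follows, with several assumptions that are not in the text: the relation is taken as E_p = b_0 + b_1 E_s - b_2 E with b_2 = 0.2644 (an interpretation of the printed signs); the vapor pressure function is an integrated Clausius-Clapeyron form with commonly used constants (6.11 mb at 273.15 K and L/R_v of about 5417 K) rather than the specific form of ref. 16; temperatures are supplied in degrees Fahrenheit; and the conversion to a percentage scale is taken to be linear over the quoted range of -14 to 0 inches. The accumulation period to which the coefficients apply is not stated in the text and is left open here.

```python
import math

B0, B1, B2 = 0.2163, 0.3473, 0.2644  # coefficients quoted in the text

def vapor_pressure_mb(temp_f):
    """Saturation vapor pressure in millibars (assumed Clausius-Clapeyron form)."""
    temp_k = (temp_f - 32.0) * 5.0 / 9.0 + 273.15
    return 6.11 * math.exp(5417.0 * (1.0 / 273.15 - 1.0 / temp_k))

def pan_evaporation_in(tmax_f, tmin_f):
    """Estimated pan evaporation in inches from maximum and minimum temperature."""
    return B0 + B1 * vapor_pressure_mb(tmax_f) - B2 * vapor_pressure_mb(tmin_f)

def rwa_percent(water_balance_in, low=-14.0, high=0.0):
    """Map precipitation minus pan evaporation onto 0-100 percent (assumed linear)."""
    clipped = max(low, min(high, water_balance_in))
    return 100.0 * (clipped - low) / (high - low)

# Hypothetical period: 3.0 inches of precipitation, 85 F maximum, 60 F minimum.
balance = 3.0 - pan_evaporation_in(85.0, 60.0)
rwa = rwa_percent(balance)
```

The clipping step simply keeps RWA within 0 to 100 percent when a water balance falls outside the range reported for the fitted data.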

Nitrogen. — Soil  nitrogen  for  each  CRD  was  esti- 
mated by  dividing  the  maximum  historical  yield  in  a 
CRD  by  0.36  (the  slope  of  the  nitrogen  response  in 
the  basic  model)  and  by  the  relative  yielding  ability 
of  the  dominant  varieties  for  each  year  and  by 
subtracting  the  amount  of  nitrogen  applied  in  that 
year.  This  figure  was  then  divided  by  a factor  consist- 
ing of  1.0  + 0.5  (percentage  fallow).  The  purpose  of 
this  division  was  to  consider  the  accumulation  of 
nitrogen  in  the  soil  during  fallow.  The  maximum 
value  obtained  in  the  period  1955-66  was  then  used  as 
an  estimate  of  soil  nitrogen.  The  resulting  figures  for 
soil  nitrogen  combined  with  the  applied  nitrogen  and 
the  nitrogen  due  to  fallowing  to  give  a total  nitrogen 
figure  for  each  year  in  each  CRD. 
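The soil nitrogen estimate described above can be sketched as follows. The function names and the yearly inputs are hypothetical, and the sketch assumes that "percentage fallow" enters the divisor as a fraction between 0 and 1, which the text does not state explicitly.

```python
def soil_n_candidate(max_hist_yield, rel_yield_ability, applied_n, fallow_fraction,
                     slope=0.36):
    """One year's soil nitrogen estimate in lb/acre."""
    implied_total_n = max_hist_yield / slope / rel_yield_ability
    return (implied_total_n - applied_n) / (1.0 + 0.5 * fallow_fraction)

def soil_nitrogen(max_hist_yield, yearly_inputs):
    """Largest yearly estimate over the calibration period (1955-66 in the text)."""
    return max(soil_n_candidate(max_hist_yield, v, n, f) for v, n, f in yearly_inputs)

# Hypothetical CRD: maximum historical yield 36 bu/acre and two years of inputs,
# each given as (relative yielding ability, applied N in lb/acre, fallow fraction).
years = [(1.0, 20.0, 0.5), (1.25, 30.0, 0.0)]
soil_n = soil_nitrogen(36.0, years)
```

Dividing by 1.0 + 0.5 times the fallow share lowers the soil nitrogen figure for districts that fallow heavily, so that nitrogen accumulated during fallow can be credited separately when the total is assembled.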

Nitrogen uptake.—Because nitrogen uptake and plant growth are known to follow approximately the logistic curve, nitrogen uptake was broken down into 20 percent during the period planting to jointing,
60  percent  during  jointing  to  heading,  and  20  percent 
during  heading  to  ripe.  The  RWA  for  each  period 
was  used  to  calculate  the  total  nitrogen  available  dur- 
ing each  period.  These  figures  were  in  turn 
multiplied  by  the  uptake  coefficient  for  the  period; 
the  sum  was  considered  to  be  total  nitrogen  uptake. 
The  basic  model  equation  was  then  solved  for  this 
total,  giving  a yield  prediction  based  solely  on 
nitrogen  uptake.  This  was  done  for  the  period 
1955-66,  and  the  results  were  compared  with  the  ac- 
tual values.  Two  major  types  of  systematic  errors 
were  identified.  The  first  type  was  a consistent  ten- 
dency for  actual  yields  to  be  anomalously  high  in 
years  with  cool  summers,  probably  due  to  a 
decreased  respiration  rate  during  grain  formation.  As 
an interim substitute for a respiration submodel, a critical level of 71.5° F was simply defined for mean temperature during the milk to ripe period and 4 bushels were added to the yield estimates when the mean temperature fell below this critical level.

The second type of systematic error was a consistent bias over
years that varied by CRD, presumably due largely to soil
differences. This bias was incorporated into the model as a
simple additive term consisting of the mean CRD error over the
preceding 8-year period.
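A minimal sketch of the additive CRD term follows. The names are hypothetical, and the sign convention (error taken as actual minus predicted, so the term is added to the raw estimate) is an assumption, since the text does not state it.

```python
def crd_bias_term(past_errors, window=8):
    """Mean error over the most recent `window` years for one CRD.

    Assumption: each error is (actual yield - predicted yield).
    """
    recent = past_errors[-window:]
    return sum(recent) / len(recent)


def corrected_yield(raw_estimate, past_errors):
    """Add the CRD bias term to the nitrogen-based yield estimate."""
    return raw_estimate + crd_bias_term(past_errors)
```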

Figure  5 summarizes  the  form  of  the  model  that 
was  submitted  for  testing.  The  respiration  and  soil 
components  will  be  improved  as  data  and  time  per- 
mit. 


Test Results for Baseline Model

The  model  was  tested  by  calculating  the  mean  air 
temperature  from  milk  to  ripe  and  the  RWA  from 
planting to jointing, jointing to heading, and heading
to  ripe  at  each  synoptic  weather  station  in  the  spring 
wheat  region  for  the  1955-76  period.  An  objective 
analysis  using  variational  analysis  with  low-pass 
filtering  constraints  as  described  by  Wagner  (ref.  17) 
was  used  to  interpolate  the  four  weather-related 
variables  to  a 0.5°  grid  network.  All  grid  points  falling 
within  each  CRD  were  averaged  to  obtain  a mean 
value  for  the  weather  variables  for  each  CRD-year 
combination. 

The  weather  parameters  were  combined  with  the 
appropriate  cultural  and  soil  information.  A 10-year 
bootstrap  test  (1967-76)  was  performed  with  a local 
adjustment  factor  fitted  for  each  CRD.  The  resulting 
CRD  yield  estimates  were  aggregated  to  the  Center 
for  Climatic  and  Environmental  Assessment 
(CCEA)  model  regions  and  ultimately  to  the  entire 
spring  wheat  area  using  the  U.S.  Department  of 
Agriculture  Statistical  Reporting  Service4  (SRS) 
acreages. 

Table  II  presents  a comparison  of  the  Cate-Liebig 
results  with  the  baseline  performance  of  the 
Feyerherm  and  CCEA  Phase  III  yield  models. 
Clearly,  the  Cate-Liebig  model  has  performed  well. 
Figure  6 shows  the  year-by-year  performance  of 
these  models  and  the  SRS  yields  for  the  spring  wheat 
region. 

Relatively poor performance of the Cate-Liebig model in
Minnesota resulted in additional error analyses in that region.
The marked change in performance for a more humid climate caused
speculation that an additional weather and/or soil parameter may
be required. However, to date, no satisfactory parameter has been
developed. It has been determined that radical shifts in acreage
in Minnesota may also be related to this problem.


4Now called Economics, Statistics, and Cooperatives Service,

FIGURE 5.- Cate-Liebig baseline wheat yield model.


Table II.- Results of 10-Year (1967-76) Bootstrap Test on the Cate-Liebig Yield
Model for Spring Wheat With Comparison to Baseline Feyerherm and CCEA
Phase III Yield Models

                       CCEA Phase III      Feyerherm         Cate-Liebig
Zone                   Bias     RMSE*      Bias     RMSE     Bias     RMSE

Montana                -1.6     2.18       -0.1     2.57      0.8     3.44
North Dakota           -1.2     2.94        -.1     2.55       .1     1.37
Red River              -1.4     3.95         .9     2.70      -.8     3.19
Minnesota               -.6     3.81        2.5     5.45     -1.3     5.84
South Dakota            -.8     3.00         .9     4.96       .8     4.14
Total spring wheat     -1.0     2.56         .3     2.08        0     1.29

*Root-mean-square error

FIGURE 6.- Spring wheat model comparison.
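For readers who want to recompute the two summary statistics reported in table II, a minimal sketch is given here. The function names are hypothetical, and the sign convention for bias (predicted minus actual) is an assumption; the paper does not state which way the difference is taken.

```python
import math


def bias(predicted, actual):
    """Mean signed error; assumed here to be predicted minus actual."""
    return sum(p - a for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root-mean-square error over the test years."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

Because the root-mean-square error can never be smaller than the absolute bias, the pair also serves as a consistency check on any row of the table.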

Figure 7 shows the variation in acreage experienced in the
south-central CRD of Minnesota. The source of the variation
appears to be economic. The problem lies in the quality of land
being moved in and out of wheat cultivation. An examination of
the Soil Conservation Service (SCS) Conservation Needs Inventory
gives a percentage of the area of the state in field crops where
the land capability (other than erosion) can limit productivity.
Table III gives these figures together with the errors observed
for the Cate-Liebig model. The limitations considered by the SCS
in compiling the percentages include soil texture, drainage, and
climate (dryness and coolness). Evidently, model performance is
degraded by the failure to consider these differences. Although
the problem is not yet resolved, it is believed that more details
on the soils under cultivation are needed, such as nitrogen
content, water-holding capacity, and drainage class. In all
likelihood, the use of a detailed soil moisture model will be
required for a more complete development of this modeling
approach.

FIGURE 7.- Acreage variation with time in the south-central
CRD of Minnesota.


CONCLUDING  REMARKS 

The  LOM  shows  promise  as  a yield-modeling  tool 
that  may  supplement  the  common  regression-type 
modeling  techniques.  Its  primary  advantages  are  as 
follows. 

1.  Extremes  are  better  predicted  because  LOM 
avoids  the  averaging  or  dampening  of  effects  that 
usually  results  from  multiple  regression  fitting. 

2. Coefficients are stable over wide ranges of conditions
because variable effects are modeled independently.

3.  Coefficients  can  be  derived  from  experimental 
work  performed  under  controlled  conditions. 

4.  Additional  variables  can  be  added  to  a model 
without  affecting  the  coefficients  of  variables  already 
included. 

The  relationships  of  the  LOM  to  other  modeling 
techniques  are  summarized  in  table  IV,  which  is 
based  on  a table  originally  presented  by  Baier  (ref. 
18).  The  intent  of  this  paper  is  to  demonstrate  that 
the LOM concept can be a valuable tool for model
building  when  regression  tools  are  inadequate. 

Additional  data,  which  may  be  of  interest,  on  the 
indication  of  biological  discontinuities  as  in  the 
LOM,  the  confirmation  of  the  LOM  by  a simulation 
model,  and  a review  of  the  application  of  the  LOM  in 
tropical  agricultural  development  are  available  in 
references  21,  22,  and  23,  respectively. 


Table III.- Percentage of Total Area
in Field Crops Where Land Capability Limitations
Are a Predominant Problem

Table IV.- Comparison of Model Features

CCEA Second-Generation Wheat Yield Model for
Hard Red Wheat in North Dakota*

S. K. LeDuc†


INTRODUCTION 

A wheat yield model was developed for hard red spring wheat in
North Dakota using historical yields for crop reporting districts
in conjunction with meteorological predictor variables based on
weekly data. The meteorological data were aggregated according to
observed phenological stages. The overall goal of this approach
was to determine whether yields for years with unusual planting
dates and/or unusual phenological development could be estimated
more accurately than in the original monthly models developed by
the Center for Climatic and Environmental Assessment (CCEA) of
the National Oceanic and Atmospheric Administration (NOAA).


CCEA FIRST-GENERATION MODELS

The first yield models developed by CCEA were regression models
using monthly average temperature and precipitation for
climatological districts as the basic meteorological variables.
The averages for the districts were based on the average from a
dense network of cooperative stations. Models were for states or
for areas the size of states. To obtain variables for model
areas, district data were weighted by relative harvested area for
a specified year.

The use of monthly data in the first models had several
shortcomings. If the crops were planted very early or very late,
the stage of development in a particular month would not be what
the model was expecting; i.e., the normal stage of development.
For example, early planting might allow the wheat to

*Paper presented to the Crop Modeling Workshop, October
W, 1977, Columbia, Missouri.

†NOAA Center for Climatic and Environmental Assessment,

develop more rapidly to the extent that the wheat might be ripe
during the month when heading would normally occur. In this case,
the model would assume the crop was in the heading stage and
interpret dry, hot weather as being detrimental to yield when in
fact it should be advantageous to drying the ripe crop. Also,
phenological development varies over a state, and areas smaller
than states should be able to utilize stage-of-development
information in a model more effectively.

Variables used in the first models were not capable of assessing
the delayed and cumulative effects of a moisture deficit. Soil
moisture should be a good indicator of this. Variables that are
averages also do not reflect the full impact of extreme
conditions.


MODEL  VARIABLES 

A model intended to alleviate some of these problems was
developed for hard red spring wheat yield in the nine crop
reporting districts (CRD's) of North Dakota. Production and
harvested acreage data for hard red spring wheat for the CRD's
were available from the Statistical Reporting Service (SRS) of
the U.S. Department of Agriculture (USDA). The most recent
revision was used.

The quality of the historical yield data for the CRD's is not
the same as that for the state based on objective yield surveys.
The yield estimates for CRD's and counties are based mainly on
responses from SRS mail surveys, which are adjusted using data
from the agricultural census. State yields are revised based on
check data such as state assessors' reports on acreage.1

Actual  observed  phenological  stages  for  each 
CRD  in  North  Dakota  from  1950  to  1975  were 


1Louis T. Steyaert, "Quality of United States Wheat, Corn,
and Soybean Crop Statistics," report to the Charles F. Kettering
Foundation under Grant Number ST 76- JO, Mar. 1977.




smoothed and the median date was determined. The observed stages
for wheat are planting, emergence, jointing, heading,
milk-to-dough, turning, swathing, and combining. The smoothing
was both spatial and temporal when possible. Data were spotty in
some of the early years. Attempting to utilize the phenological
stages necessitates using periods shorter than a month for these
smaller areas. The natural choice was to use periods of a week
for the individual CRD's.

The basic aggregation of daily station data into weekly
meteorological data for each CRD was accomplished by Dr. Amos
Eddy of the Department of Atmospheric Science at the University
of Oklahoma.2 Data for 59 stations (fig. 1) were combined as
follows.

1.  Average  total  precipitation  for  the  week  (PCP) 
in  hundredths  of  inches 

2.  Maximum  number  of  days  in  which  more  than 
1 inch  of  precipitation  fell  (NPH) 


2Amos Eddy, Final Report on Federal Grant USDC (NOAA)
047-158-44000 concerning development of second-generation
yield models, June 1977.


3.  Maximum  number  of  days  in  which  more  than 
0.2  inch  of  precipitation  fell  (NPM) 

4.  Maximum  number  of  days  in  which  more  than 
0.1  inch  of  precipitation  fell  (NPL) 

5. Average weekly maximum temperature (MX) in °F

6. Average weekly minimum temperature (MN) in °F

7. Maximum number of days in the week when the maximum
temperature exceeded 100° F (MXH) or 90° F (MXL)

8. Maximum number of days in the week when the minimum
temperature was less than 32° F (MHN)

9. The sum of the average daily growing degree days (GDL) in
the CRD for the week; e.g., for location i, where the daily
maximum temperature for day j is TX_ij, the daily minimum
temperature is TN_ij, and the number of locations on day j is M

FIGURE 1.- North Dakota meteorological stations used in the
aggregation of daily data.

10. Average weekly surface moisture amount
where the maximum is 1 inch (SS)

11.  Average  weekly  subsurface  moisture  amount 
(SU)  where  the  maximum  is  the  available  water 
capacity  minus  one  (available  water  capacity  was  8 
inches  for  CRD's  4 to  9, 7 inches  for  CRD's  2 and  3, 
and  6 inches  for  CRD  1) 

12.  Average  weekly  runoff  in  inches  (RO) 

The last three variables were determined from a hydrologic
accounting system similar to the one reported by Palmer (ref. 1).
Soil moisture is previous storage plus precipitation minus
evapotranspiration, up to a set maximum. Excess precipitation is
runoff. A surface layer can supply up to 1 inch to
evapotranspiration, but only a fraction of demand beyond that can
be supplied by the underlying layer. Evapotranspiration is that
part of potential evapotranspiration (PET) that is satisfied.
Thornthwaite (ref. 2) gives


$$\mathrm{PET}_t = \frac{1.6}{2.54}\left[\frac{5.5556\,(T_t - 32)}{B}\right]^{A} \times \frac{\mathrm{HOURS}}{12} \times \frac{7}{30}$$

where  T_t    = weekly CRD average temperature in °F
                (PET = 0 where T_t < 32)
       T-bar  = long-term weekly CRD average temperature in °F
       HOURS  = number of daylight hours
       7/30   = transformation from monthly values used by
                Thornthwaite to weekly values used in this study
       B      = heat index computed from long-term record

B = [expression not legible in the source]

where T-bar is set equal to 32 if it is climatologically < 32

$$A = 0.49239 + 0.01792\,B - 0.0000771\,B^{2} + 0.000000675\,B^{3}$$
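The weekly PET calculation and the two-layer moisture accounting described in this section can be sketched as below. Several pieces are assumptions rather than the author's stated method: the factors 1.6 (centimeters per month) and 2.54 (centimeters per inch) are taken from Thornthwaite's standard formulation, the heat index B is supplied by the caller because its expression is not recoverable here, and the rule that the underlying layer supplies demand in proportion to how full it is follows Palmer's general approach but is only one possible reading of "a fraction of demand." All function names are hypothetical.

```python
def thornthwaite_exponent(heat_index):
    """Exponent A as a cubic in the heat index B (coefficients from the text)."""
    b = heat_index
    return 0.49239 + 0.01792 * b - 0.0000771 * b**2 + 0.000000675 * b**3


def weekly_pet_inches(temp_f, daylight_hours, heat_index):
    """Weekly potential evapotranspiration in inches (sketch).

    Assumes the standard Thornthwaite constants 1.6 (cm/month) and
    2.54 (cm/inch); PET is zero below freezing, as stated in the text.
    """
    if temp_f < 32.0:
        return 0.0
    a = thornthwaite_exponent(heat_index)
    temp_term = 5.5556 * (temp_f - 32.0) / heat_index  # 10 x deg C / B
    monthly_cm = 1.6 * temp_term**a
    return monthly_cm / 2.54 * (daylight_hours / 12.0) * (7.0 / 30.0)


def weekly_water_balance(surface, subsurface, precip, pet, awc):
    """One weekly step of a two-layer accounting (all values in inches).

    Returns (surface, subsurface, runoff, evapotranspiration).
    Assumption: the underlying layer meets the unmet demand in
    proportion to its relative fullness.
    """
    surface_cap = 1.0
    sub_cap = awc - 1.0
    net = precip - pet
    if net >= 0.0:
        # Recharge: fill the surface layer, then the underlying layer.
        add_surface = min(net, surface_cap - surface)
        rest = net - add_surface
        add_sub = min(rest, sub_cap - subsurface)
        runoff = rest - add_sub
        return surface + add_surface, subsurface + add_sub, runoff, pet
    # Depletion: the surface layer supplies demand first.
    demand = -net
    from_surface = min(demand, surface)
    remaining = demand - from_surface
    fraction = subsurface / sub_cap if sub_cap > 0.0 else 0.0
    from_sub = min(remaining * fraction, subsurface)
    et = precip + from_surface + from_sub
    return surface - from_surface, subsurface - from_sub, 0.0, et
```

As a check on the cubic, a heat index of 34.813 gives an exponent of about 1.051, which agrees with the PET A and PET I constants listed for the North Dakota model in table VII.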


MODEL  FORM 

The model is a multiple regression model. A constant shift term
was considered for each CRD; however, none of the shift terms
were significant. The model may be expressed as

$$\hat{Y}_j = \alpha + \beta\,YR + \gamma\,T_j + \sum_{i=1}^{n} \delta_i W_{ij} \qquad (1)$$

where j varies for each crop district and n is the number of
weather terms, selected from those listed in the previous
section. The year YR is a variable defined as year minus 1950.
The trend variable T_j is of the following form

T_j = [expression in the coefficients A1, A2, and A3; not
legible in the source]                                      (2)

This functional form was chosen to allow for the exponential
rate of increase in the mid-1950's and to account for the
apparent slowdown in the rate of change in the 1970's.

The coefficients A1, A2, and A3 were determined from a nonlinear
programing algorithm that fitted a linear trend from 1929 to 1949
and T (in the above form) from 1949 to 1976, with a forced
juncture at 1949. A trend was fitted to each individual CRD yield
time series; i.e., for each series T. The estimates of the
parameters are given in table I. The expected state yield without
regard to weather variables is shown in figure 2. This yield is
derived by omitting the weather variables W_ij from the full
model and estimating the parameters α, β, and γ using the years
1953 and 1957 to 1973. These parameter estimates were then used
to estimate the expected yield for each CRD for each of the years
from 1950 to 1976. Using actual harvested acreages, the expected
CRD yields were aggregated to determine an expected state yield.
Also shown in figure 2 is the expected state yield using the same
process except that the trend function (eq. (2)) was fitted using
the 1929-73 period; i.e., no linear trend was used for the early
part. Estimates of the parameters for equation (2) are shown in
table I.

The meteorological variables W_ij are determined as follows. For
a particular stage, the week in which half the crop in a given
CRD passed that stage was used. The weather variable for that
week and the weather variable for both the week before and the
week after are averaged together with weights of 0.50, 0.25, and
0.25, respectively. This averaging was used because, even within
a single CRD, not all the crop is in the same stage during a
given week.
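The stage-centered weighting and the regression form of equation (1) can be illustrated with a short sketch. The function names, argument names, and the symbol used for the weather coefficients are hypothetical, and the numbers in any call are the reader's own; nothing here reproduces the fitted coefficients.

```python
def stage_weather_value(weekly_values, stage_week):
    """3-week weighted average centered on the week half the crop passed a stage.

    weekly_values is a list indexed by week; weights are 0.50 for the
    stage week and 0.25 for the week before and the week after.
    """
    return (
        0.50 * weekly_values[stage_week]
        + 0.25 * weekly_values[stage_week - 1]
        + 0.25 * weekly_values[stage_week + 1]
    )


def predict_yield(alpha, beta, gamma, year, trend, weather_coefs, weather_values):
    """Evaluate the regression: constant + year term + trend term + weather terms."""
    yr = year - 1950  # YR is defined as year minus 1950
    weather = sum(c * w for c, w in zip(weather_coefs, weather_values))
    return alpha + beta * yr + gamma * trend + weather
```

For example, weekly maximum temperatures of 60, 70, and 80 °F around the stage week give a weighted value of 70 °F.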


Table I.- North Dakota Hard Red Spring Wheat
Trend Coefficients for Exponential Distribution

(a) Exponential form (eq. (2)) through entire period

CRD      A1      A2      A3

(entries for CRD's 1 to 6 are largely illegible in the source;
surviving fragments: 1.16, 1.10, 1.10)

7        .78     .93     1.10
8        .69     .90     1.10
9        .66    1.19     1.10

(b) Exponential form (eq. (2)) 1949-76, linear during prior period,
T = B1 + B2 × (year - 1928)

CRD      A1      A2      A3      B1      B2

1       0.89    1.72    1.07     8.0    0.25
2        .88    1.52    1.07     8.0     .10
3       1.13    1.34    1.06    10.5     .25
4        .87    1.56    1.08     4.5     .25
5        .88    1.52    1.07     5.5     .25
6       1.06    1.51    1.03     9.0     .25
7        .89    1.50    1.10     4.3     .25
8        .70    1.50    1.08     8.0     .01
9        .82    1.58    1.11     7.9     .01
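The linear part of the trend in part (b) of table I, T = B1 + B2 × (year - 1928), is simple enough to evaluate directly. The sketch below uses a hypothetical function name and covers only the pre-1949 linear segment; the exponential segment is not reproduced because its expression is not available here.

```python
def linear_trend(year, b1, b2):
    """Linear trend yield for the period before the 1949 juncture."""
    return b1 + b2 * (year - 1928)


# CRD 1 coefficients from table I(b): B1 = 8.0, B2 = 0.25
crd1_trend_at_juncture = linear_trend(1949, 8.0, 0.25)
```

For CRD 1 this gives 8.0 + 0.25 × 21 = 13.25 bushels per acre at the 1949 juncture.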

The final variables were selected using a stepwise regression
procedure with the restriction that the physical interpretation
of the signs of the coefficients was correct with regard to the
known response of wheat yields to climatic factors during a
particular stage of development. Because of missing phenological
data, only the years 1953 and 1957 to 1973 were used in selecting
the variables. Truncated models were determined for the different
phenological stages using only variables for that stage or
previously occurring stages, allowing predictions to be made
early in the growing season as the crop reached each development
stage. In each of these preharvest models, the trend was allowed
to remain an independent variable and its coefficient was derived
for each truncation to maximize the fit to the data.
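The truncation rule can be sketched as a filter on variable names. This assumes the naming convention used in the tables of this paper, where a weather code is followed by a single-digit stage number (planting is 1 through combining is 8); the function name and the example list are hypothetical.

```python
def variables_for_truncation(variable_names, truncation_stage):
    """Keep only variables whose stage is at or before the truncation stage.

    Assumes each name ends in a one-digit stage number, e.g. "MX3".
    """
    return [
        name for name in variable_names
        if int(name[-1]) <= truncation_stage
    ]


candidates = ["MX1", "MN2", "MX3", "MXL4", "MX5"]
jointing_model = variables_for_truncation(candidates, 3)
```

Here the jointing-stage model keeps only the planting, emergence, and jointing variables, so it can be evaluated as soon as the crop reaches jointing.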


FIGURE 2.- North Dakota hard red spring wheat yield and fitted
trend.


RESULTS 

Table II contains the variables that were selected for a
preliminary model. The first alphabetic characters of the weather
variable names refer to the previously described 3-week weighted
average of variables described in the section entitled "Model
Variables," where the codes appear. The numeral that follows
refers to the stage of the wheat in natural order; i.e., planting
is 1, emergence is 2, jointing is 3, heading is 4, milk-to-dough
is 5, turning is 6, swathing is 7, and combining is 8. The
variables, the estimates and the standard errors of their
coefficients, and the F statistic with its significance level are
also included in table II. Maximum temperature MX, minimum
temperature MN, and precipitation PCP have the mean subtracted.
The means are included in table II.

The trend variable has a coefficient varying from 0.81 to 0.91,
indicating that scalar adjustment to the trend estimated in
equation (2) is needed. High maximum temperatures are detrimental
to yield from jointing through turning. Precipitation is
advantageous to yield from emergence to heading except that
runoff can cause a decrease in yield during jointing and heading.
The milk-to-dough stage has three variables that appear to be
important: (1) subsurface soil moisture at this time can be
beneficial to yield; (2) maximum temperature has an effect that
is difficult to interpret in the model; and (3) the
average  maximum  temperature  has  a negative  coeffi- 
Table II.- Coefficients (Standard Deviations) and Statistics Pertaining to Preliminary Models
for North Dakota Hard Red Spring Wheat


1 'anabte 

Truncation  stage0 

Statistics  on  variables P 

Trend 

Planted 

at 

Emergence  Jointing  Heading 

Ot  ( J ) (4) 

Mllk-to-dough 

(S) 

Turning 

(6) 

Mean  Stf  Min  Max 

Constiru 

-0.61  a 51) 

166(161) 

1 12(163) 

7.93(2.31) 

1.18(2.54) 

-566(1.64) 

4.11(2.51) 

Trend 

90  (10) 

.82  ( .10) 

81  (09) 

*6  ( 07) 

.91  (.07) 

?!  (.06) 

85  (.06) 

2004 

5 73 

8.26 

31.64 

YH 

23  (OS) 

20  (.0*1 

22  ( 08) 

.25  (06) 

.26  ( 06) 

.20  ( 05) 

.18  (.05) 

13.00 

7.80 

0 

26.00 

MX  1 

.23  (.05) 

.20  (.05) 

.75  (.15) 

.51  ( 14) 

60.24 

6.82 

45.13 

76.20 

am 

-.15  (.05) 

-.08  (.03) 

6221 

3308 

6.45 

153.38 

SPH2 

2.01  (.97) 

.21 

,30 

0 

1.75 

MW  2 

.56  ( 29) 

1.27 

.15 

0 

3.25 

PC  PI 

.02  (.01) 

.02  (.01) 

45.21 

34.34 

.05 

192.10 

UDU 

-.06  (.01) 

161.26 

2519 

93.93 

229.20 

SPLS 

52  ( 24) 

54  (.23) 

337 

96 

.50 

600 

sm 

- .49  ( 06) 

-.35  (.06) 

-.25  (.061 

75.72 

4.57 

62  48 

86.63 

ROY 

-3.62(2,14) 

.02 

.09 

0 

.74 

MW3 

.82  (.37) 

186 

69 

.25 

3.50 

NPM* 

.88  (.28) 

88  (.29) 

1.81 

.75 

.50 

4.00 

MX* 

-.33  (.06) 

78.72 

4.40 

66.40 

89.44 

HO* 

-5.31  (2  87) 

.01 

.07 

0 

.71 

MX  H* 

-5.71(1.75) 

-6,41(1.67) 

.27 

80 

0 

500 

SO* 

36  (.15) 

3.99 

1.47 

66 

7.00 

MXS 

-.64  (.07) 

-.59  ( 07) 

8249 

362 

7223 

91.51 

MXHS 

3.58(1.05) 

3.32(1.00) 

.39 

99 

0 

6.00 

SOS 

35  (.16) 

3.53 

1.40 

.54 

7.00 

MXb 

-.17  ( .07) 

8366 

4.00 

73.02 

92.85 

a2 

j 2 

0.6! 

065 

065 

0.78 

082 

0.87 

0.88 

IS. 2 

16.4 

16.1 

10  2 

8.6 

6,5 

5.8 

aVariables selected and coefficients estimated using only data for the years 1953 and 1957-73.
bStatistics on variables use all years 1950-76.
cStandard deviation.


cient  whereas  the  number  of  days  that  the  tem- 
perature is  above  100°  F has  a positive  coefficient. 
These variables are highly correlated. If temperatures get above
100° F, the yield does not continue to decrease at the same rate;
the rate of loss is lessened. This is similar to a quadratic
effect in that yield losses are not simple linear functions of
temperature over the entire range.

Coefficients for the preliminary models defined in table II were
recalculated using additional years. The reestimated models were
used to obtain predictions for each CRD for each year from 1950
to 1976. These CRD predictions were then aggregated to the state
level using the actual acreages. The mean squared errors, or the
average squared differences between the aggregated predictions
and the actual yield, are shown in table III. The mean squared
errors appear to be stable and are of course smaller when more
years are used in estimating the coefficients.
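The aggregation and error calculation described above amount to an acreage-weighted mean followed by a mean squared error. A minimal sketch follows; the function names and the numbers in the example are hypothetical and are not taken from the paper's data.

```python
def aggregate_to_state(crd_yields, crd_acres):
    """Acreage-weighted state yield from CRD yields (dicts keyed by CRD)."""
    total_acres = sum(crd_acres[crd] for crd in crd_yields)
    weighted = sum(crd_yields[crd] * crd_acres[crd] for crd in crd_yields)
    return weighted / total_acres


def mean_squared_error(predicted, actual):
    """Average squared difference between state predictions and actual yields."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared) / len(squared)


# Illustrative values only: two districts with 100 and 300 harvested acres.
state_yield = aggregate_to_state({1: 20.0, 2: 30.0}, {1: 100.0, 2: 300.0})
```

In the example the larger district dominates, giving a state yield of 27.5 rather than the unweighted mean of 25.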


The quality of the phenological data for the years from 1950 to
1956 is poor. Some of these years had few observations on the
phenological stages. The missing phenological data were estimated
in an attempt to use the data that were reported in addition to
the development stage as indicated in the "Weekly Weather and
Crop Bulletin." This estimation is, however, a source of error in
evaluating the models, but these questionable years were not
considered when selecting variables for the models. The
phenological data for 1976 were obtained from the actual planting
date and, for later phenological stages, from Robertson's
biometeorological time scale (ref. 3).

Yield  estimates  for  the  truncated  models  in  table 
IV,  with  variables  as  listed  in  table  II,  were  generated 
for  independent  data  for  the  years  not  considered  in 
selecting  the  variables;  i.e.,  1950  to  1952,  1954  to 
1956,  and  1974  to  1976.  For  the  early  years  from  1950 
Table III.- Mean Squared Error (1950-76)
Preliminary Truncated Models for North Dakota
Hard Red Spring Wheat Yield

Years used to                              Truncation
estimate            Trend  Plant-  Emer-   Joint-  Head-  Milk-to-  Turn-
coefficients               ing     gence   ing     ing    dough     ing

1953, 1957-73       13.6   11.4    11.2    6.4     7.0    5.1       6.1
1951-76             11.9   10.5    10.1    5.3     4.6    3.5       2.9
1950, 1952-76       11.9   10.5    10.0    5.1     4.4    3.5       3.0
1950-51, 1953-76    11.8   10.4     9.9    5.0     4.3    3.4       2.8
1950-53, 1955-76    11.9   10.6    10.0    5.2     4.4    3.4       2.8
1950-54, 1956-76    11.8   10.4     9.9    5.0     4.5    3.3       2.8
1950-55, 1957-76    11.8   10.4     9.9    4.9     4.3    3.4       2.8
1950-73, 1975-76    12.0   10.6    10.1    5.4     4.6    3.4       2.9
1950-74, 1976       11.8   10.4     9.9    5.1     4.4    3.4       2.8
1950-75             11.9   10.5     9.9    5.1     4.4    3.3       2.8
1950-73             12.7   11.5    11.0    5.7     4.6    3.5       3.4
1950-74             11.9   10.5    10.0    5.1     4.4    3.3       2.7

to 1955, the yield estimates are not very encouraging. In fact,
at the time of turning, the model underpredicted twice by at
least 2 bushels per acre and once by 5.5 bushels per acre. For
1952 and 1954, the model overpredicted by 2.6 and 3.6 bushels per
acre, respectively. The model, however, did show a decrease in
the yield estimate for 1952 that was lower than the actual yield
for 1951. The estimates for 1956 were good from the milk-to-dough
stage through turning, the last estimate. Omitting individual
years, the estimates for 1974 and 1976 were good. The estimate
for 1975, however, is closer in the early phenological stages. If
all years after an individual year are left out, the model
prediction for 1974 is underestimated by 4.2 bushels per acre.
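The over- and underpredictions quoted above can be checked against the turning-stage and actual columns of table IV. The short sketch below does the subtraction; the variable names are hypothetical, and the values are the table IV entries for the early independent years.

```python
# Turning-stage predictions and actual yields (bushels per acre) from table IV.
turning = {1950: 11.3, 1951: 11.6, 1952: 12.6, 1954: 13.6, 1955: 10.0, 1956: 17.1}
actual = {1950: 14.0, 1951: 14.0, 1952: 10.0, 1954: 10.0, 1955: 15.5, 1956: 17.5}

# Positive values are overpredictions, negative values underpredictions.
errors = {year: round(turning[year] - actual[year], 1) for year in turning}

underpredicted_by_2_or_more = [y for y, e in errors.items() if e <= -2.0]
```

The differences reproduce the 2.6 and 3.6 bushel overpredictions for 1952 and 1954 and the 5.5 bushel underprediction for 1955.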

An alternate model using the number of days above 90° F for the
later stages is presented in table V. The results of the
independent yield predictions are included in table VI. The fit
of the milk-to-dough truncation of this model is shown in figure
3. As with the preliminary model, the independent years 1950 to
1952, 1954 to 1956, and 1974 to 1976 were not used in model
development. The dashed line represents yield estimates from the
model when coefficients were estimated using the years 1953 and
1957 to 1973. Yield estimates for the independent years were also
derived using all other years except that particular year to
estimate the coefficients; these estimates are indicated by the
label "independent test." In the "bootstrap test," all previous
years were used to estimate the coefficients for the model.
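The difference between the two test designs is only in which years are used to estimate the coefficients. A minimal sketch, with hypothetical function names, is given here; it selects the estimation years and does not fit any model.

```python
def independent_test_years(all_years, test_year):
    """'Independent test': fit on every year except the one being predicted."""
    return [year for year in all_years if year != test_year]


def bootstrap_test_years(all_years, test_year):
    """'Bootstrap test': fit only on years before the one being predicted."""
    return [year for year in all_years if year < test_year]


years = list(range(1950, 1977))
fit_years_independent = independent_test_years(years, 1974)
fit_years_bootstrap = bootstrap_test_years(years, 1974)
```

For 1974 the independent test still uses 1975 and 1976 in the fit, whereas the bootstrap test stops at 1973, which mimics what would have been available operationally.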

The variables in the model with days above 90° F, in addition to
having a meaningful physical interpretation, are statistically
significant. The model for the jointing stage shows high maximum
temperatures to be helpful at planting but harmful to yield at
jointing. Precipitation occurring around jointing increases
yield, but yield decreases if there are too many days of
precipitation around the planting date. If the minimum
temperatures are too high at emergence, the model indicates there
will be a
Table IV.- Independent Yield Predictions for North Dakota Hard Red Spring Wheat
Using Preliminary Truncated Models

Year(a)   Trend   Planting   Emergence   Jointing   Heading   Milk-to-   Turning   Actual
                                                              dough

1950      10.2    11.3       11.4         9.2        9.5      10.2       11.3      14.0
1951      11.2    12.3       11.7        13.3       13.0      10.8       11.6      14.0
1952      12.8    15.0       14.5        13.5       13.9      12.7       12.6      10.0
1954      14.6    13.5       13.3        12.7       11.5      13.4       13.6      10.0
1955      15.2    15.8       15.7        13.9       11.7      11.0       10.0      15.5
1956      16.1    15.1       14.9        13.4       13.4      17.0       17.1      17.5
1974      26.9    27.4       27.3        24.9       22.8      20.2       21.4      20.5
1975      26.1    27.6       25.6        26.7       25.2      24.9       24.2      25.5
1976      26.2    25.9       25.4        24.4       25.4      25.2       24.3      24.7
(b)1974   27.8    28.5       28.4        25.6       23.4      18.0       16.3      20.5
(b)1975   26.5    26.3       25.8        24.5       25.5      25.4       24.5      25.5

(a)Year is left out of calculations for coefficients; also, the years included here were not
considered when independent variables were selected.

(b)Year and following years are left out of calculations for coefficient estimates.
Table V.- Coefficients (Standard Deviations) and Statistics Pertaining to the
Alternate Models for North Dakota Hard Red Spring Wheat

                      Truncation stage(a)                    Statistics on variables(b)
Variable    Jointing       Heading        Milk-to-dough     Mean    SD     Min     Max
            (3)            (4)            (5)

Constant     1.37 (1.72)   -1.61 (1.46)   -0.46 (1.23)
Trend         .85 (.07)      .85 (.06)      .82 (.06)       20.0    5.7     8.3    31.6
YR            .24 (.06)      .29 (.06)      .30 (.05)       13.0    7.8     0      26.0
MX1           .19 (.05)      .12 (.04)      .07 (.03)       60.2    6.8    45.1    76.2
NPL1         -.62 (.31)      -              -                2.6    1.0     0       5.0
MN2          -.18 (.07)      -              -               41.3    4.2    30.4    55.0
MN3           -             -.31 (.07)     -.23 (.06)       50.3    3.6    39.0    59.3
MX3          -.23 (.08)      -              -               75.7    4.6    62.5    86.6
MXL3        -1.97 (.47)    -1.09 (.33)    -1.14 (.29)         .6     .8     0       4.0
NPM3          .83 (.40)     1.42 (.33)     1.13 (.29)        1.9     .7      .3     3.5
MXL4          -            -1.94 (.25)    -1.42 (.24)        1.1    1.1     0       5.0
MX5           -              -             -.53 (.07)       82.5    3.6    72.2    91.5
MN5           -              -              .23 (.09)       55.4    2.9    48.1    63.7

R2           0.79           0.84           0.88
s2           9.9            7.7            5.8

(a)Variables selected and coefficients estimated using only data for the years 1953 and 1957-73.
(b)Statistics on variables use all years 1950-76.

decrease in yield. The model at the heading stage has the same
variables as before, except that precipitation around planting
and minimum temperatures at emergence are not included and the
minimum rather than the maximum temperature at the jointing stage


Table VI.- Independent Yield Predictions for
North Dakota Hard Red Spring Wheat
Using Alternate Models With MXL Variable
(Number of Days > 90° F)

Year(a)    Trend   Jointing   Heading   Milk-to-   Actual
                                        dough

(b)1976    26.4    25.7       24.6      25.2       24.7
(c)1975    26.8    28.9       28.6      27.4       25.5
1974       28.3    24.0       21.0      20.7       20.5
1950        8.9     8.7        7.6       8.1       14.0
1951       10.0    13.2       10.8      10.3       14.0
1952       11.0    11.6        9.6      10.7       10.0
1954       13.0    11.9        8.1       9.4       10.0
1955       14.1    12.4        8.3       8.7       15.5
1956       15.3    13.2       12.5      15.0       17.5

(a)Year is not used in estimation of coefficients, nor were these years considered in
the selection of variables.

(b)Data from 1974 and 1975 added to basic data set for calculation of coefficients.
(c)Data from 1974 added to basic data set for calculation of coefficients.


FIGURE 3.- Yields for North Dakota hard red spring wheat,
actual and estimated from alternate model.

is used. High maximum temperatures at heading are detrimental to
yield. The model for the milk-to-dough stage includes the maximum
temperature with a negative effect and the minimum temperature
with a positive effect. These two variables are highly
correlated, the coefficient on the maximum temperature is of
larger magnitude, and both are measured as deviations from
normal. If both maximum and minimum temperatures are 2° F above
normal, the impact on yield would be estimated as
-0.53(2) + 0.23(2) = -0.6 bushel per acre. If the temperatures
deviate in opposite directions from the normal, the contributions
would be in the same direction. For example, the maximum being
higher than normal and the minimum lower than normal

TABLE VII.— North Dakota Spring Wheat Model

(a) Weighting factorsᵃ

Crop district        Weight
10 Northwest         0.2509
20 North Central      .1558
40 West Central       .1178
50 Central            .1616
70 Southwest          .0948
80 South Central      .0834
90 Southeast          .1357

ᵃWeights based on 1973 spring wheat harvested acreage.


would  indicate  not  only  higher  day  temperatures  but 
also  cooler  nights;  i.e.,  more  diurnal  variability.  The 
model  would  estimate  these  conditions  to  be  more 
detrimental  to  yield  than  if  both  day  and  night  tem- 
peratures were  higher  than  normal. 
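
The arithmetic in this passage can be checked with a short sketch. It assumes the two milk-to-dough coefficients are -0.53 and 0.23 bushel per acre per degree Fahrenheit of departure from normal, as read from the scanned coefficient table; the function and variable names are hypothetical.

```python
# Assumed milk-to-dough coefficients (bushels per acre per deg F of
# departure from normal), as read from the scanned coefficient table.
COEF_MAX = -0.53   # maximum-temperature term
COEF_MIN = 0.23    # minimum-temperature term


def yield_impact(dev_max_f, dev_min_f):
    """Estimated yield change for given temperature departures (deg F)."""
    return COEF_MAX * dev_max_f + COEF_MIN * dev_min_f


# Both temperatures 2 deg F above normal: the two terms partly cancel.
both_warm = yield_impact(2.0, 2.0)

# Maximum 2 deg F above and minimum 2 deg F below normal (more diurnal
# variability): the two terms act in the same direction.
more_diurnal = yield_impact(2.0, -2.0)
```

Under these assumptions the second case gives a loss roughly two and a half times the first, which is the contrast the text describes.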

The  CCEA  first-generation  Phase  III  model, 
which  includes  seven  of  the  nine  CRD’s  in  North 

TABLE VII.— North Dakota Spring Wheat Model (continued)

(b) Definition of constants

Parameter                Definition
PET                      Potential evapotranspiration estimated from
                         Thornthwaite's method
PET A                    1.051
PET I                    34.813
April day length         1.1297
May day length           1.2573
Latitude                 48° N
June deg-days >90° F     1 if deg-days >2; otherwise 0
July deg-days >90° F     1 if deg-days >15; otherwise 0
Deg-days stations        Bismarck, Dickinson, Fargo, Grand Forks,
                         Jamestown, Minot, and Williston

(c) Analysis

                                                                    Truncationᵃ
Variable                             Normalᵇ     Trend     March     April       May      June      July
Overall constant                        1.00   4.25220   5.07375   5.00981   5.12948   6.66911   7.83411
Linear trend 1932-55                   24.00   0.17973   0.12701   0.14318   0.13230   0.10454   0.08950
Linear trend 1955-65                   11.00   0.58914   0.68875   0.65567   0.65543   0.68523   0.69733
Linear trend 1965-72                    8.00   0.26745   0.10780   0.22519   0.23068   0.27639   0.24716
Aug. to Mar. precipitation,ᶜ mm       176.67             0.02966   0.02716   0.02660   0.02589   0.02357
Apr. precipitation - PET,ᶜ mm          10.47                      -0.00009  -0.00658  -0.00297   0.00181
Apr. precipitation - PET,ᵈ mm          10.47                      -0.00042  -0.00035  -0.00046   0.00041
May precipitation/PET,ᶜ mm              0.77                                 1.24698   1.44860   0.70176
June precipitation,ᶜ mm                89.26                                           0.04159   0.03738
June precipitation,ᵈ mm                89.26                                          -0.00044  -0.00045
June deg-days >90° F                                                                  -1.29241  -0.88576
July deg-days >90° F                                                                            -1.55418
R²                                             0.69401   0.74860   0.76076   0.76959   0.87180   0.88325
Standard error, q/ha                           2.83090   2.59865   2.60266   2.58938   2.01738   1.95503
Standard variance, q/ha                        8.01400   6.75300   6.77386   6.70489   4.06982   3.82215

Standard deviation of yields = 4.93589 q/ha

ᵃYields based on 1932-75, measured in quintals per hectare.
ᵇMeteorological normals based on 1931-15.
ᶜDeparture from normal.
ᵈSquared departure from normal.
Dakota, is detailed in table VII. It is difficult to
compare coefficients of the first- and second-generation
models because of inherent differences. Later stages
show the negative effects of hot weather and the
need for moisture in both models. Results of the test
of the first-generation model are shown in figure 4.
Basic differences must be considered when comparing
figures 3 and 4. The yield shown for the second-generation
model is for hard red spring wheat in all


FIGURE 4.— Phase III first-generation model yields for North
Dakota spring wheat in CRD's 1, 2, 4, 5, 7, 8, and 9.

of  North  Dakota,  whereas  the  first-generation  model 
is  for  the  yield  of  all  spring  wheat  in  seven  of  the 
CRD’s.  The  higher  yielding  Red  River  Valley  is  not 
included  in  the  area  included  in  the  first-generation 
model.  Both  models  seem  to  miss  in  the  same  direc- 
tion. The  second-generation  model  appears  to  give  an 
improved  estimate  in  1974  and  1976.  The  latter  year 
is  an  independent  test  year  for  both  models,  but  1974 
is  an  independent  test  only  for  the  second-generation 
model. 


CONCLUSION

The second-generation model appears to provide
estimates that are improved in two of the three
independent test years. The earlier years analyzed using
the bootstrap test do not show considerable improvement.
The second-generation models are much more
difficult to use operationally because of the
variability of truncations caused by the rate of
development of the wheat crop. More data must be
collected, quality controlled, and used to calculate the
derived variables. Thus, more resources are demanded
for assessments using the second-generation
model. The cost/benefit comparison of the two
models is based on meteorological data that are
routinely observed. The timeliness of providing the
estimates also needs to be considered in the comparison.
The occurrence of the phenological stages
for estimates from the second-generation model may
not be convenient for release and dissemination of
yield estimates because the dates are variable. If the
models are to provide estimates at fixed calendar
dates for an operational system, the second-generation
model needs to be evaluated for its ability to
accomplish this. The comparative value of the two
types of models will be determined by operational
results of each on independent data.

INTRODUCTION

The objectives of this paper are to describe the
supporting research in crop development modeling
(crop calendars) and, more specifically, to discuss
the relative merits and shortcomings of various
models for the development of wheat (Triticum
aestivum) which emerged during the 3 years of
LACIE. The models described herein represent a
joint research effort of NASA and contractor scientists.
The incorporation of these models into LACIE
operations is discussed elsewhere in this volume
(McCrary and Rogers, “Operation of the Yield
Estimation Subsystem”).

Crop  phenology,  the  study  of  the  expression  of 
genotypic  and  environmental  interactions,  has  been 
a key  concept  in  the  evolution  of  quantified  crop 
development  scales  for  many  crops.  Wheat,  corn, 
peas,  sorghum,  and  soybeans  are  a few  of  the  crops 
for  which  development,  from  emergence  through 
maturation,  has  been  described  using  a pheno- 
logically  based  numeric  scale.  The  history  of 
agriculture  shows  that  man  has  always  used 
phenological  characteristics  to  identify  stages  of 
development  for  particular  crops.  In  fact,  improve- 
ments in  crop  husbandry  are  still  being  made  as  a 
better  understanding  of  crop  phenology  is  gained. 

Phenotypic  characteristics  of  a crop  may  be 
divided  into  those  which  manifest  growth  and  those 
which  manifest  development.  Crop  growth  and 
development  are  frequently  confused,  although  they 
are  distinctly  different  concepts.  Growth  tradi- 
tionally refers  to  an  increase  in  plant  size  (roots, 
shoots,  stems,  and  leaves)  and  represents  one  com- 
ponent of  plant  development.  The  concept  of 
development  includes  the  sequence  of  life  cycle 
events which lead to changes in tissue structure
and/or  function.  Because  there  is  some  disagreement 
over  this  concept  among  agricultural  scientists,  three 
variant  categories  of  crop  development  require  dis- 
tinction. The  potential  rate  of  crop  development  is 
determined  genetically  and  can  only  be  observed  in 
the  laboratory  under  controlled  optimum  conditions. 
The  actual  rate  of  development  is  the  result  of  a 
system  of  genotypic-climatic-nutritional  interactions 
which  occur  at  the  biochemical  level  in  natural  en- 
vironments. Lastly,  the  observed  rate  of  develop- 
ment depends  on  the  degree  to  which  a crop  ex- 
presses changes  in  tissue  structure  or  function  and 
the  frequency  and  accuracy  of  the  observations  of 
such  changes. 

The  quantification  of  a crop  phenological  scale  is 
based  on  the  observed  rate  of  development.  Too  fre- 
quently, vigorous  growth  obnubilates  a clear  picture 
of  the  ontogenetic  status  of  a crop,  which  is 
mistakenly  identified  as  an  advanced  stage  of 
development.  The  observed  rate  of  development,  as 
the  dependent  variable  in  the  model,  may  contain 
significant  error  caused  by  inadequate  expression  of 
ontogenetic  changes  by  the  crop  or  observation  error 
by  the  scientist.  When  this  is  the  case,  it  is  futile  to 
attempt a crop development model of the form
$Y = f(x_1, x_2, \ldots, x_n)$, because it is likely that the
dependent variable ($Y$) will have a greater magnitude of error
than the predictor variables ($x_i$), which are typically
measured  climatic  or  nutritional  characteristics.  In 
order  to  limit  such  errors  in  phenological  data, 
careful  dissection  of  plant  parts  and  frequent  field 
observations  have  been  used  in  some  crop  develop- 
ment studies,  such  as  those  by  Robertson,  Hanway, 
Vanderlip,  Williams,  Major  et  al.,  and  Seeley  (refs.  1 
to  6,  respectively). 

Traditionally, the quantification of a biological
time scale for a crop has been accomplished by
numbering, in ascending order, the phenotypic
characteristics as they appear. The terms “developmental
stage” or “biostage” refer to a particular point on the
biological time scale. Terms such as “subperiod” or
“biophase” are used to refer to the time from one
stage to another. The rate of development is
generally expressed in numeric stage units per unit
time; occasionally, however, a heat unit, such as
growing degree days, is a surrogate for time in the
expression of crop development.

A key  motivation  to  derive  crop  development 
(crop  calendar)  models  is  the  idea  that,  with  such  a 
model,  environmental  variables  measured  through- 
out some  critical  subperiods  in  the  crop  life  cycle 
would  provide  better  predictors  of  grain  yield.  In  ad- 
dition, crop  calendar  models  were  needed  to  provide 
a tool for the analyst-interpreter (AI) to use in
identifying spectral signatures of wheat fields throughout
the  growing  season. 

The  initial  step  in  crop  calendar  modeling  must  be 
a review  of  the  physiology  of  development.  This 
review  points  out  the  required  assumptions  that 
must  be  made  in  order  to  model  crop  development 
over  a large  geographic  region. 

PHYSIOLOGY OF DEVELOPMENT

A search  of  the  literature  on  the  physiology  of 
development  reveals  that  there  are  many  theoretical 
considerations  to  be  taken  into  account  in  building  a 
model  of  crop  development.  In  the  general  develop- 
mental process,  photosynthesis,  respiration, 
translocation,  and  differentiation  are  key  mecha- 
nisms which  are  controlled  by  a highly  complex  and 
interactive  system  of  climatic  and  nutritional  factors. 
The  complexity  of  this  system  is  magnified  by  the 
fact  that  experimental  results  show  that,  for  any  of 
these  mechanisms  and  the  overall  developmental 
process,  the  limiting  values  of  temperature, 
moisture,  and  soil  nutrients  frequently  change  over 
subperiods  in  the  life  cycle.  For  example,  those 
climatic  or  nutritional  conditions  which  do  not 
sharply  limit  early  vegetative  development  may 
drastically  inhibit  floral  development. 

Because  of  this  complexity,  specific  developmen- 
tal mechanisms  are  difficult  to  model  without 
laboratory  controls  and  frequent  measurements  of 
important  climatic  and  soil  characteristics.  Some 
simulation  models  exist  for  these  mechanisms,  but 
their  required  inputs  are  inappropriate  for  use  in  a 
large  regional  modeling  effort  such  as  LACIE.  Many 
of  these  simulation  models  require  frequent  and 
detailed  measurements  of  the  crop,  soil,  and  climate. 

Crop  development  models  applicable  to  large 
regions,  such  as  the  U.S.  Great  Plains,  should  not  be 


overparameterized  or  made  to  require  types  of  input 
variables not frequently available from the network
of  climatic  stations.  Because  the  vast  majority  of 
climatic  stations  only  record  daily  maximum  and 
minimum  temperatures  and  precipitation,  com- 
prehensive climatic  variable  input  for  a crop  develop- 
ment model  is  impractical.  Solar  radiation,  humidity, 
wind,  soil  temperature,  and  soil  moisture— all  impor- 
tant factors  to  crop  development  and  the  mecha- 
nisms which  govern  it— must  be  excluded  from  con- 
sideration in  building  crop  models,  except  to  the  ex- 
tent that  they  can  be  submodeled.  Despite  these 
severe  limitations  of  input  data,  the  problem  of  quan- 
tifying and  modeling  wheat  development  over  a large 
geographical  region  was  addressed  by  LACIE  scien- 
tists. Some  of  the  pertinent  assumptions  required  for 
this research were (1) the phenotypic characteristics
of wheat express the developmental process well and
are observable; (2) wheat is relatively stable
ecotypically, and genetic variance does not confound
the quantitative development scale; (3) the development
of wheat can be modeled with a minimum of
climatic data; and (4) the within-year spatial
variability in the occurrence of specific stages is
relatively uniform.

THE  WHEAT  PHENOLOGICAL  SCALE  USED 
IN  LACIE 

Stages  of  development  for  cereals  have  been 
defined by Feekes, Large, Jensen and Lund,
Robertson, Williams, Haun, and, most recently,
Waldren (refs. 7, 8, 9, 1, 4, 10, and 11, respectively).
Some of these scales of development have not been
accepted  by  farmers  and  agricultural  scientists 
because  they  are  based  on  small  morphological 
changes  which  are  not  readily  apparent,  especially  at 
the  later  stages  of  crop  development.  Still  others  re- 
quire careful  field  observations  to  identify  stages. 
The  developmental  scale  used  by  Robertson  (ref.  1) 
for  spring  wheat  was  chosen  by  LACIE  because  it 
uses  the  predominantly  visible  phenotypic  charac- 
teristics to  identify  a limited  number  of  stages  in 
wheat  and  because  it  evolved  from  years  of  research 
in  several  climatically  diverse  locations. 

The  different  stages  (biological  times)  and  their 
corresponding  numbers  on  the  Robertson  quantified 
development  scale  appear  in  table  I.  Pictures  of  the 
successive  stages  of  development  are  presented  in 
figure 1.

Primary  candidate  crop  calendar  models  for  initial 
testing  in  LACIE  were  the  Nuttonson  (ref.  12) 
TABLE I.— The Robertson Phenological
Scale for Wheat

Phenological characteristic    Stage
Planting                        1.0
Emergence                       2.0
Jointing                        3.0
Heading                         4.0
Soft dough                      5.0
Ripe                            6.0

photothermal  linear  model  and  the  Robertson  (ref. 
1)  triquadratic  spring  wheat  model,  which  included 
nonlinear and interaction expressions of maximum
and  minimum  temperatures  and  day  length.  Stuff 
(ref.  13)  analyzed  the  Nuttonson  data  and  concluded 
that  there  were  significant  nonlinear  effects  in  the 
temperature  and  day-length  variables.  Robertson's 
treatment  of  these  variables  conformed  with  the 
findings  of  Stuff. 
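
For orientation, the triquadratic form is usually written in the general literature on Robertson's biometeorological time scale roughly as shown below. This is a sketch from that literature and is not reproduced from this paper; the symbols $L$ (day length), $T_1$ (daily maximum temperature), and $T_2$ (daily minimum temperature) are assumptions, while the coefficient names $a_0$ to $c_2$ and the factors $V_1$ to $V_3$ follow the labels used in the coefficient tables.

```latex
\frac{dM}{dt} = V_1 \,(V_2 + V_3), \qquad
\begin{aligned}
V_1 &= a_1 (L - a_0) + a_2 (L - a_0)^2 \\
V_2 &= b_1 (T_1 - b_0) + b_2 (T_1 - b_0)^2 \\
V_3 &= c_1 (T_2 - b_0) + c_2 (T_2 - b_0)^2
\end{aligned}
```

Read this way, a table entry of $V_1 = 1.0$ for the planting-to-emergence stage would mean that day length is taken to have no influence before emergence.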

Another study which motivated LACIE to begin
with the Robertson spring wheat model was
Feyerherm's (ref. 14) finding that it accurately
predicted heading and maturity of winter wheat in
Kansas. Subsequent application to winter wheat
development in other regions has shown some large errors.
Refinements are being made in the application of the
Robertson model to winter wheat. A function to
account for vernalization in this crop is needed to
improve this model.

Since  both  planting  dates  and  varieties  of  wheat 
vary  across  the  U.S.  Great  Plains  and  other  large  pro- 
ducing regions,  there  is  a problem  of  uniformity  in 
FIGURE 1.— Illustration of Robertson's biological time scale
with interval spacing for Marquis spring wheat in Canada.


the  occurrence  of  specific  stages.  Thus,  for  the  pur- 
pose of  defining  the  status  of  wheat  in  any  particular 
crop  reporting  district  (CRD),  the  dates  of  suc- 
cessive stages  of  development  refer  to  the  times 
when 50 percent of the crop in that district has
reached  the  specified  stage.  The  types  of  field  obser- 
vations which  the  U.S.  Department  of  Agriculture 
Economics,  Statistics,  and  Cooperatives  Service 
(USDA  ESCS)  keeps  are  compatible  with  this  defini- 
tion. Their  estimates  of  crop  stage  in  each  CRD  are 
always  given  as  the  percentage  of  total  crop  acreage 
which  has  reached  a particular  stage.  The  USDA 
ESCS  data  were  valuable  to  LACIE  in  testing  the  ac- 
curacy of  various  crop  calendar  models  built  on  the 
Robertson phenological scale.
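
The 50-percent definition can be illustrated with a small sketch that interpolates between periodic progress reports. The report format, the use of linear interpolation, and the function name are assumptions made for illustration; the source does not say how the 50-percent dates were derived from the percentage reports.

```python
from datetime import date, timedelta


def date_of_half_progress(reports, threshold=50.0):
    """Interpolate the date on which reported progress crosses `threshold`.

    reports: list of (report_date, percent_of_acreage_at_or_past_stage),
             sorted by date and assumed non-decreasing in percent.
    Returns None if the threshold is never reached.
    """
    # Threshold already met at the first report.
    if reports and reports[0][1] >= threshold:
        return reports[0][0]
    # Otherwise find the pair of reports that brackets the threshold.
    for (d0, p0), (d1, p1) in zip(reports, reports[1:]):
        if p0 < threshold <= p1:
            frac = (threshold - p0) / (p1 - p0)
            return d0 + timedelta(days=round(frac * (d1 - d0).days))
    return None


# Hypothetical weekly reports for one CRD (not data from the report).
example_reports = [
    (date(1975, 6, 1), 20.0),
    (date(1975, 6, 8), 60.0),
    (date(1975, 6, 15), 95.0),
]
heading_date = date_of_half_progress(example_reports)
```

With these made-up reports the crossing falls three-quarters of the way through the first week, so the interpolated date is June 6.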


LACIE APPROACH TO CROP CALENDAR
MODELS FOR WHEAT

The  types  of  wheat  development  models  and  their 
extent  of  sophistication  have  been  limited  primarily 
by  the  available  data.  The  operational  aspect  of 
LACIE  requires  these  models  to  use  a minimum  of 
meteorological  variables  as  daily  inputs.  The  most 
common  and  readily  available  of  these  are  daily  max- 
imum and  minimum  temperatures  and  precipitation 
totals. A description of the model types which have
been  developed  or  applied  during  LACIE  follows. 


Multivariate Least-Squares Techniques¹

Phenological data as reported at the CRD level
and environmental data from the National Climatic
Center were used in developing and testing an
adjustable crop calendar model for winter wheat.
Generalized least-squares techniques were applied
for parameter estimation in functions to predict the
winter wheat phenological stage, with environmental
values as independent variables. The independent
variables investigated included daily maximum
temperature ($T_x$), daily minimum temperature ($T_m$),
daily day length ($D_l$), and daily precipitation ($P_r$).

The outstanding feature of the generalized
multivariate least-squares procedure used for
parameter estimation is the fact that the sums of
squares of residuals for all independent variables are
simultaneously minimized. In using this approach, a
condition equation set is linearized by Taylor's
series. These linearized equations are used, with
Lagrangian multipliers, to augment the least-squares
condition, resulting in a general constrained
minimum to be enforced. Solution of the resulting
total normal equation set by matrix partitioning
results in parameter estimates and their associated
variance-covariance matrix. The method may be
used in an iterative manner for nonlinear functional
forms.

After parameter estimation, tests were conducted
on independent data. From these tests, it may
generally be concluded that exponential functions
have little advantage over polynomials. Precipitation
was not found to significantly affect the fits. The
Robertson's triquadratic form, in general use for
spring wheat, was found to show promise for winter
wheat; however, special techniques and care are
required for its use. In most instances, equations with
nonlinear effects were found to yield erratic results
when used with daily environmental values as
independent variables. Thus, as of this writing, the linear
function of the form

$R = H_1 + H_2 T_x + H_3 T_m + H_4 D_l$

is recommended, where $R$ represents the daily rate
of development. Specific coefficients, designated $H_i$,
recommended for inclusion and testing in the
LACIE project are given in table II.

Specific recommendations for further work
include preparation and inclusion of additional data in
the least-squares programs; preparation of more
extensive testing programs and data, to include
investigation of the effects of using averaged
environmental data for predictions; further work on the
Robertson's triquadratic model; and variance
propagation studies.


Table II.— Coefficients Recommended for Inclusion
and Testing in LACIE

Stageᵃ    H1             H2            H3           H4
P-E      -0.014919      0.0038970     0            0
E-J       -.00039918     .00043509    0            0
J-H       -.216419      0             0             .019021
H-S        .314583      0             0            -.018610
S-R        .244711       .0046211      .0015439    -.022684

ᵃP-E = planting to emergence, E-J = emergence to jointing, J-H = jointing to
heading, H-S = heading to soft dough, S-R = soft dough to ripe.
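
The linear daily-rate function and the table II coefficients can be exercised with the sketch below. Several things here are assumptions rather than statements from the source: that a stage is complete once the accumulated daily rate reaches 1.0 stage unit, that negative daily rates are clipped to zero, the units of the temperature inputs (the text does not state them), and the restored leading decimal points in the coefficients. All function names are hypothetical.

```python
# Table II coefficients (H1, H2, H3, H4) by stage. Leading decimal points
# were restored from the scanned table, so treat the values as illustrative.
COEFFS = {
    "P-E": (-0.014919, 0.0038970, 0.0, 0.0),
    "E-J": (-0.00039918, 0.00043509, 0.0, 0.0),
    "J-H": (-0.216419, 0.0, 0.0, 0.019021),
    "H-S": (0.314583, 0.0, 0.0, -0.018610),
    "S-R": (0.244711, 0.0046211, 0.0015439, -0.022684),
}


def daily_rate(stage, t_max, t_min, day_length):
    """R = H1 + H2*Tmax + H3*Tmin + H4*Dl for one day."""
    h1, h2, h3, h4 = COEFFS[stage]
    return h1 + h2 * t_max + h3 * t_min + h4 * day_length


def days_to_complete(stage, weather):
    """Count days until the accumulated rate reaches 1.0 stage unit.

    weather: iterable of (t_max, t_min, day_length) tuples, one per day.
    Negative daily rates are clipped to zero (an assumption).
    Returns None if the stage is not completed within the series.
    """
    total = 0.0
    for day, (t_max, t_min, day_length) in enumerate(weather, start=1):
        total += max(0.0, daily_rate(stage, t_max, t_min, day_length))
        if total >= 1.0:
            return day
    return None


# Hypothetical constant weather: jointing to heading depends only on
# day length in this coefficient set.
constant_weather = [(25.0, 10.0, 14.0)] * 60
jh_rate = daily_rate("J-H", 25.0, 10.0, 14.0)
jh_days = days_to_complete("J-H", constant_weather)
```

With a constant 14-hour day the jointing-to-heading rate is about 0.05 stage unit per day, so the stage takes about three weeks under these assumptions.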


Iterative Regression²

The  effort  to  rederive  the  Robertson  spring  wheat 
model by fitting it to phenological data for winter
wheat  began  with  assembling  the  50-percent-stage 
dates  for  23  CRD's  in  7 states.  These  data  varied  in 
definition  as  well  as  completeness.  For  example, 
some  states  reported  booting  instead  of  jointing  and 
turning  instead  of  soft  dough.  A summary  of  the 
stage  data  available  appears  in  table  III. 

A single  representative  station  was  selected  for 
each  CRD,  and  a meteorological  data  base  was  built 
which  included  the  growing  season  for  each  location 
year.  These  data  were  daily  observations  of  max- 
imum and  minimum  temperatures.  Day  length  was 
computed  from  the  Julian  date  and  the  latitude  of  the 
station. 
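
The source does not say which day-length formula was used. As a minimal sketch, one common approximation combines Cooper's expression for solar declination with the sunset hour angle; it ignores atmospheric refraction and twilight, so it will differ slightly from published photoperiod tables. The function name is hypothetical.

```python
import math


def day_length_hours(day_of_year, latitude_deg):
    """Approximate day length in hours from day of year and latitude."""
    # Solar declination in degrees (Cooper's approximation).
    decl = 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))
    # Cosine of the sunset hour angle, clamped to handle polar day and night.
    cos_omega = -math.tan(math.radians(latitude_deg)) * math.tan(math.radians(decl))
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega_deg = math.degrees(math.acos(cos_omega))
    # The sun moves 15 degrees of hour angle per hour; sunrise to sunset
    # spans twice the sunset hour angle.
    return 2.0 * omega_deg / 15.0


# Example at 48 deg N, the latitude quoted for the North Dakota model.
midsummer = day_length_hours(172, 48.0)
midwinter = day_length_hours(355, 48.0)
equator = day_length_hours(100, 0.0)
```

At 48° N this gives nearly 16 hours in late June and a little over 8 hours in late December.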

The  iterative  regression  fitting  technique 
developed  by  Robertson  is  described  elsewhere  in 
this  volume  (Whitehead  et  al.,  "Growth  Stage 
Estimation").  The  original  coefficients  for  the  spring 
wheat  model  (table  IV)  were  used  as  seeds  in  fitting 
each  stage.  An  improved  fit  was  found  in  the 
emergence-to-jointing  and  the  soft-dough-to-ripe 
stages.  The  rederived  model  for  emergence  to  joint- 
ing was  particularly  successful  in  reducing  the  bias  at 
jointing  (table  V).  However,  the  accumulating  posi- 
tive bias  was  disastrous  if  the  model  was  run  from 
the  observed  planting  date  (table  VI).  The  original 
Robertson  coefficients  for  planting  to  emergence, 
jointing  to  heading,  and  heading  to  soft  dough  were 
retained.  The  complete  set  of  coefficients  for  the 
rederived  form  appears  in  table  VII. 

The  latest  modeling  attempt  has  been  an  effort  to 
identify  and  incorporate  a moisture  variable  in  a tri- 
quadratic model.  Instead  of  precipitation  amounts, 
precipitation  occurrence  (rain  days)  was  selected  as  a 
single-station variable which could represent an entire
CRD. This variable was transformed from a (0,1)
form  to  a decimal  value  by  means  of  a low-pass-filter 
function  designed  to  simulate  a 30-day  moving 
average.  This  function  computed  a daily  value  which 
explains  roughly  95  percent  of  the  variance  of  occur- 
rence in  the  preceding  30  days.  This  filtered  rain-day 
TABLE III.— Phenological Data Available for Winter Wheat
by Stages and State

                            No.          Development stage of crop
State          Years       CRD's     P-E    E-J    J-H    H-S    S-R
Colorado       1972-75       3        20     11     10      0      0
Idaho          1973-75       2         5      0      0      0      0
Kansas         1967-71       5         0      0     30     31     34
Missouri       1970-73       1         0      0      0      4      4
Montana        1971-75       2         0      0      0     10     10
North Dakota   1971-75       5         0      0     10     19      0
Oklahoma       1967-75       5        21     21     21     19      0
Total                       23        46     32     71     83     48

TABLE IV.— Characteristic Coefficients Developed by Robertson for
the Spring Wheat Crop Calendar

                              Development stage of crop
Coefficient   P-E           E-J             J-H          H-S           S-R
a0            V1 = 1.0      8.413           10.93        10.94         24.31
a1            V1 = 1.0      1.005           .9256        1.389         -1.140
a2            V1 = 1.0      0               -.06025      -.08191       0
b0            44.37         23.64           42.65        42.18         37.67
b1            .01086        -.003512        .0002958     .0002458      .00006733
b2            -.0002230     .00005026       0            0             0
c1            .009732       .0003666        .0005943     .00003109     .0003442
c2            -.0002267     -.000004212     0            0             0

variable was applied to the same data set as the
rederivation, except that it was substituted for the day-length
variable in the Robertson form. The results showed
improvement over the original form in all stages
except planting to emergence (table V). When all
models were run sequentially from planting, the
results were quite reasonable (table VI). Further
testing of these models is underway with expansion of
an independent test set. Further development is still
possible in the planting-to-emergence stage.


However, it may now be concluded that a moisture
variable may be successfully substituted for day
length in the Robertson triquadratic form. The stages
for which these coefficients were calculated appear
in table VIII. The three stages in which the moisture
quadratic appears have coefficients which describe a
concave-down quadratic which expresses a decrease
in growth as precipitation occurrence increases. This
is consistent with experience with maturity rates of
moisture-stressed wheat.
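
The source does not give the low-pass filter used to turn the 0/1 rain-day series into a decimal value. One simple stand-in is a first-order exponential filter, sketched below. The design rule used here, choosing the smoothing constant so that the most recent 30 days carry 95 percent of the filter weight, is an assumption that only loosely echoes the "roughly 95 percent" figure in the text; the function name and parameters are hypothetical.

```python
def filtered_rain_days(rain_flags, window=30, weight_in_window=0.95):
    """Exponentially smooth a 0/1 rain-day series.

    alpha is chosen so that the latest `window` days carry
    `weight_in_window` of the total filter weight (assumed design rule).
    Returns one smoothed value per input day, each between 0 and 1.
    """
    alpha = 1.0 - (1.0 - weight_in_window) ** (1.0 / window)
    smoothed = []
    value = 0.0  # start from a dry history (assumption)
    for flag in rain_flags:
        value += alpha * (flag - value)
        smoothed.append(value)
    return smoothed


# Thirty consecutive rain days starting from a dry history.
wet_spell = filtered_rain_days([1] * 30)
# A rain day every third day settles near a frequency of one-third.
every_third = filtered_rain_days([1, 0, 0] * 100)
```

A recursive filter of this kind needs only the previous day's value, which suits a model driven by daily station reports.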
TABLE V.— Errors in Days Given the Observed Stage
and Starting Date

                  P-E      E-J      J-H      H-S      S-R
Error             1-2      2-3      3-4      4-5      5-6
Original
  Bias            6.78    16.97    -8.46    -1.77     6.71
  RMSEᵃ           9.32    20.45     9.66     6.03     7.72
Rederived
  Bias           12.07     6.25     0.51     0.61     0.19
  RMSE           13.38    11.58     4.[?]    4.31     3.84
New (RDCCᵇ)
  Bias                    -1.47     0.62    -0.08    -0.25
  RMSE                    10.06     5.04     4.23     3.54
Nᶜ                  46       32       71       83       48


TABLE VI.— Error in Days When Run From the
Observed Planting Date

                  P-E      E-J      J-H      H-S      S-R
Error             1-2      2-3      3-4      4-5      5-6
Original
  Bias            6.78    20.00     0.57    -1.19     7.15
  RMSE            9.32    24.73     8.20     7.00    11.40
Rederived
  Bias                    28.57    24.50    24.40    22.53
  RMSE                    35.92    33.60    28.67    27.71
New (RDCC)
  Bias                     1.66     2.26     2.18     0.49
  RMSE                    17.57    12.83    11.18    11.74
N                   46       94      102      103       55

ᵃRMSE = root-mean-square error.
ᵇRDCC = rain-day crop calendar.
ᶜN = number of observations.
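
The two error statistics reported in tables V and VI can be computed as in the sketch below. The sign convention (predicted minus observed, so that a positive bias means the model runs late) is an assumption; the source does not state it. The day-of-year values in the example are made up.

```python
import math


def bias_and_rmse(predicted_days, observed_days):
    """Return (bias, RMSE) in days for paired stage dates.

    Errors are taken as predicted minus observed (assumed convention).
    """
    errors = [p - o for p, o in zip(predicted_days, observed_days)]
    n = len(errors)
    bias = sum(errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return bias, rmse


# Hypothetical heading dates (day of year) for three CRD-years.
predicted = [150, 160, 170]
observed = [148, 161, 166]
bias, rmse = bias_and_rmse(predicted, observed)
```

A model can show a small bias with a large RMSE when early and late errors cancel, which is why the tables report both.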


TABLE VII.— Characteristic Coefficients for the Winter Wheat Crop Calendar Based on CRD Data

                                  Development stage of crop
Coefficient   P-E             E-J             J-H             H-S             S-R
a0            V1 = 1.0        0.8413x10^1     0.1093x10^2     0.1094x10^2     0.1262x10^2
a1            V1 = 1.0        .1005x10^1      .9256           .1389x10^1      .1224x10^?
a2            V1 = 1.0        -.8311x10^?     -.6025x10^-1    -.8191x10^-1    0
b0            0.4437x10^2     .1971x10^2      .4265x10^2      .4218x10^2      .4779x10^2
b1            .1086x10^-1     .2202x10^?      .2958x10^-3     .2458x10^-3     .6146x10^?
b2            -.2230x10^-3    -.3376x10^?     0               0               .3178x10^?
c1            .9732x10^-2     .2707x10^-4     .5943x10^-3     .3109x10^-4     .1511x10^?
c2            -.2267x10^-3    .1215x10^?      0               0               -.5998x10^?

(Exponents shown as ? are illegible in the source.)
TABLE VIII.— Characteristic Coefficients of New (RDCC) Winter Wheat Crop Calendar


Coefficient 

Development  stage  of  crop 

P-E 

EJ 

J-H 

H-S 

SR 

% 

V,  - 1.0 

-1.272X10° 

-8.482x10'* 

1 

© 

-5.015X10" 

V,  - 1.0 

1.409  xttf 

' 1..WIX  li'i' 

l',  - 1.0 

6.373  x 10" 

r, , - t o 

-6.099  x fO" 

-1.607x10' 

yt  - i.o 

— 2.478 x 10" 

\ 

4.437X  to' 

2.207X10' 

6.090X  10' 

5.042X10' 

3.632X10' 

A 

1.086*  10“’ 

4.331  x 10 

2.277XI0" 

2.930  x 10' ‘ 

-3.461X10° 

•>, 

—2.230*  10" 

-6.646X10'* 

- 3.377  X10" 

-4.425X10" 

9.340X  10'* 

<■« 

9.732  X 10' 1 

1.035X10" 

1.966X10" 

9.090X  10'* 

1.183X10' 

ri 

-2.267X10"* 

/ 

2.953X  10" 

2.230X  10'* 

-6.813X  10" 

-2.712X10' 

Working Day Concept³

In order to utilize phenological development
models, knowledge about the planting date is
required. In general, sufficient information on the
actual planting date is not readily available in a timely
manner. Hence, the starter model should be
considered an integral first stage of a complete
phenological development model. Complementary
studies by Feyerherm, Stuff and Phinney, and Lytle
et al. (refs. 15 to 17, respectively) have used
meteorological and simple agronomical information
to predict planting dates for spring and winter wheat.

Feyerherm's study considered the effects of
temperature and precipitation on accumulated warming/
planting (WP) days. The general form of the model
was as follows.

1. $WP = 0$ if $TA \le 32$

2. $WP = a(TA - 32)(PRE)$ if $32 < TA \le 32 + 1/a$

3. $WP = 1$ if $TA > 32 + 1/a$

where $TA$ = the average daily air temperature (°F),
$a$ = the threshold value, and $PRE$ = a value
between 0 and 1 as a function of the previous 3 days of
precipitation. His study found that for spring wheat,
$a = 0.1$. No statistically significant precipitation
effect was found, and $PRE = 1$ was ultimately used.

³D. E. Phinney.

The date for 50-percent planting of spring wheat
was estimated from a degree-day-type summation,
beginning on January 19. When the accumulated
warming/planting days reached 35.5, it was assumed
that 50 percent of the crop had been planted.
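
The warming/planting summation can be sketched as below. The sketch assumes the three-case form of the model with $a = 0.1$ and $PRE = 1$ as stated for spring wheat, and reads the accumulation target from the scanned text as 35.5; the function names and the constant-temperature example are hypothetical.

```python
def wp_day(avg_temp_f, a=0.1, pre=1.0):
    """Warming/planting contribution of one day (between 0 and 1)."""
    if avg_temp_f <= 32.0:
        return 0.0
    if avg_temp_f <= 32.0 + 1.0 / a:
        return a * (avg_temp_f - 32.0) * pre
    return 1.0


def fifty_percent_planting_day(daily_avg_temps_f, target=35.5, a=0.1):
    """Return the 1-based day on which accumulated WP first reaches `target`.

    daily_avg_temps_f: average daily air temperatures (deg F), starting on
    the first day of the summation. Returns None if the target is not
    reached within the series.
    """
    total = 0.0
    for day_index, temp in enumerate(daily_avg_temps_f, start=1):
        total += wp_day(temp, a=a)
        if total >= target:
            return day_index
    return None


# With a = 0.1 a day counts fully once the average reaches 42 deg F.
half_credit = wp_day(37.0)
planting_day = fifty_percent_planting_day([50.0] * 60)
```

Days at or below freezing contribute nothing, so in a cold spring the target is reached later, which is the behavior the starter model is meant to capture.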

Stuff and Phinney developed an equation for the
daily rate of spring wheat planting based on
temperature, precipitation, and the normal planting date.

$R = -0.77 + 0.045(T) - 0.032(P) + 0.053(N)$

where $R$ = the daily rate of planting, $T$ = the
average daily temperature, $P$ = the total daily
precipitation, and $N$ = the normal planting date (actual
date).

Lytle et al. (ref. 17) derived area-specific equations for
each CRD in South Dakota as a function of temperature,
precipitation, trend, and the difference between
precipitation and the Thornthwaite (ref. 18) potential
evapotranspiration.

To date, a comparative test of spring wheat starter models
over the same test set has not been carried out. A
preliminary analysis indicates that both the general models
have a standard error of estimate near 6.5 days in North
Dakota. The corresponding figure for the area-specific Lytle
model is approximately 4.5 days. An analysis of the errors
imparted by incorrect start dates is required to evaluate
whether or not the more universal formulations are
satisfactory for general application.

To  date,  no  starter  model  for  winter  wheat  has 
been  developed  which  shows  improvement  over  the 
use  of  the  normal  planting  date. 


SUMMARY  AND  CONCLUSIONS 

The  success  of  the  LACIE  crop  calendar  models 
was  remarkable,  considering  the  number  of  factors 
which  limited  their  potential  accuracy.  Over  several 
years,  the  use  of  USDA  ESCS  normal  planting  dates 
as  initiators  of  the  biometeorological  time  scale 
(BMTS)  may  not  impart  significant  error  to  a crop 
calendar  model.  However,  for  any  specific  year,  there 
may  be  a large  error  component  associated  with  the 
use  of  normal  planting  dates  as  starters  for  the 
BMTS.  For  this  reason,  further  work  in  starter 
models,  such  as  the  Feyerherm  model,  is  recom- 
mended. 

Because  of  the  geographic  scale  to  which  these 
models  were  applied,  much  of  their  inaccuracy  can 
be  attributed  to  spatial  errors  in  the  variables.  The 
degree  to  which  point-source  meteorological  data 
from  selected  stations  adequately  describe  the  condi- 
tions in  a CRD  requires  further  evaluation.  An  objec- 
tive analysis  procedure  is  currently  being  developed 
at  the  NASA  Johnson  Space  Center  to  study  the  sen- 
sitivity of  crop  calendar  models  to  station  density. 
Additional model error which is spatial in origin may be
attributed to the estimates of crop stages at the CRD level
made by the USDA ESCS.

Other spatial sources of error are the differences in soils,
management practices, and varieties over large regions.
These factors may have unaccounted-for effects on crop
development. Unfortunately, they are difficult to quantify
and incorporate into a crop calendar model. An effort to
solve this problem has been undertaken by LACIE.

Lastly, the model form itself may contribute a large
proportion of error. The original Robertson approach was the
triquadratic multiplicative model using temperature and day
length to predict the stage of crop development. Further
development of additive models in which one variable may
override or substitute for another is recommended. Nix
(ref. 19) has developed a model in which temperature and day
length are completely substitutive. On the other hand, the
law-of-the-minimum approach (Cate et al., discussed
elsewhere in this volume) to crop development may be
valuable as an alternative model since it incorporates the
use of critical levels for the input variables.

In physiological research, there is no conclusive evidence
to show that the integration of environmental effects
manifested by crop development follows an interactive,
additive, or law-of-the-minimum mode exclusively. For this
reason, all model forms should be comparatively evaluated.
New Developments In Sampling and Aggregation
for Remotely Sensed Surveys

A. H. Feivesonᵃ


BASIC SAMPLING AND AGGREGATION
IN LACIE

In  a large-scale  crop  inventory  project  like 
LACIE,  where  procedural  and  resource  constraints 
severely  restrict  the  total  sample  size,  it  is  of  prime 
importance  to  optimize  the  placement  of  samples 
and  to  achieve  the  greatest  possible  accuracy  when 
constructing  a large-area  crop  acreage  estimate  from 
individual  sample  observations.  If  the  crop  distribu- 
tion in  a country  is  fairly  stable  from  year  to  year  and 
a comprehensive  set  of  historical  data  exists,  then 
that  data  can  be  used  effectively  for  distributing  sam- 
ples and  for  ratio-estimating  acreage  for  regions 
where  sampling  is  sparse  or  nonexistent. 

In  the  United  States,  the  distribution  of  wheat 
does  not  vary  greatly  from  year  to  year  and  the  5- 
year  census  of  county-level  crop  statistics  is  readily 
obtainable  and  of  good  quality.  For  these  reasons, 
county-level  historical  data  have  been  used 
throughout  LACIE  to  allocate  samples  in  the  U.S. 
Great  Plains. 

Counties  were  divided  into  three  categories  of 
sampling  density  (Group  I,  Group  II,  and  Group  III) 
on  the  basis  of  data  from  the  last  available 
agricultural  census  (either  1969  or  1974),  so  as  to 
minimize  the  best  a priori  estimate  of  the  variance  of 
the  total  U.S.  Great  Plains  wheat  production  esti- 
mate. Details  of  the  guiding  philosophy  and  actual 
mechanics  of  the  categorization  and  sampling  are 
given  in  the  paper  by  Hallum  et  al.  entitled  “Sam- 
pling, Aggregation,  and  Variance  Estimation  for 
Area,  Yield,  and  Production  in  LACIE”  and  in  the 
paper  by  Feiveson  et  al.  entitled  “LACIE  Sampling 
Design.” 

In  addition  to  placing  the  samples,  the  census  data 
were  also  used  to  obtain  ratioed  acreage  estimates  for 
counties  or  groups  of  counties  not  sampled  (either  by 


ᵃNASA Johnson Space Center, Houston, Texas.


design or because of loss of data). Essentially, this was
done by assuming that the ratio of the wheat acreage of a
nonsampled county to that of nearby sampled counties was the
same for the year of LACIE estimation as it was for the
historical census year. For details, the reader is again
referred to the papers by Hallum et al. and Feiveson et al.
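The ratio assumption described above amounts to a one-line calculation. The following Python sketch illustrates it; the function and argument names are hypothetical, and the operational LACIE procedure (grouping of counties, handling of lost segments) is described in the cited papers, not here.

```python
def ratio_estimate(
    hist_unsampled: float, hist_sampled: float, current_sampled: float
) -> float:
    """Estimate current wheat acreage of a nonsampled county.

    hist_unsampled  -- census-year wheat acreage of the nonsampled county
    hist_sampled    -- census-year wheat acreage of the nearby sampled counties
    current_sampled -- current-year estimate for those same sampled counties

    Assumes the ratio hist_unsampled / hist_sampled still holds this year.
    """
    if hist_sampled <= 0:
        raise ValueError("historical acreage of sampled counties must be positive")
    return current_sampled * hist_unsampled / hist_sampled
```

For example, a county that held one quarter as much wheat as its sampled neighbors in the census year is assigned one quarter of their current estimate.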


PROBLEM  AREAS  IN  SAMPLING 

In  some  foreign  areas  (Brazil,  China,  Argentina, 
India,  and  the  U.S.S.R.),  historical  data  are  not 
available  at  the  same  level  of  detail  as  in  the  United 
States;  furthermore,  the  accuracy  of  the  data  is 
unknown.  For  example,  in  the  U.S.S.R.,  the  smallest 
political  region  for  which  published  historical  wheat 
acreage  and  production  data  exist  is  the  oblast,  which 
is  considerably  larger  than  a U.S.  county.  By  relying 
solely  on  the  historical  data  for  the  U.S.S.R.,  one 
could  do  no  better  than  throw  samples  randomly  or 
perhaps  systematically  within  an  oblast.  In  Phases  II 
and  III  of  LACIE,  it  was  observed  that,  because 
some  oblasts  were  very  large,  their  effectiveness  as 
strata  was  diminished;  i.e.,  large  within-stratum  sam- 
pling variation  was  found.  Even  in  the  United  States, 
where  county-level  information  was  available,  large 
sampling  errors  were  suspected  in  certain  areas.  Con- 
sequently, an  effort  was  made  to  develop  and  test  a 
stratification  procedure  based  on  natural  topographic 
or  climatic  variables  rather  than  on  political  bound- 
aries. 

This  experimental  effort,  called  the  Natural  Sam- 
pling Strategy  Test  (NSST),  was  run  in  parallel  with 
the  standard  LACIE  procedure  during  Phase  III  to 
determine  whether  areas  of  uniformity  with  respect 
to  climate,  soil  type,  and  distribution  of  agriculture 
would  make  better  strata  than  oblasts  or  their 
equivalent  in  countries  without  detailed  historical 
data. 

To determine the validity of the hypothesis, areas having
the above uniformity characteristics were delineated in
North Dakota and Kansas and in three oblasts in the U.S.S.R.
The areas were typically larger than U.S. counties but
smaller than states. These homogeneous areas or
“agrophysical units” (APU's) were used as strata for
subsequent sampling and aggregation.

In  order  for  the  U.S.  experiment  to  simulate  the 
situation  in  a foreign  country  without  detailed  data, 
only  state-level  historical  data  were  used  to  allocate 
samples  and  construct  ratio  estimates  where  neces- 
sary. Because  the  APU  boundaries  did  not  generally 
coincide  with  state  boundaries,  it  became  necessary 
to  apportion  the  historical  wheat  acreage  of  a state  to 
those  parts  of  the  APU's  that  were  included  in  the 
state.  This  was  done  by  assuming  that  a particular 
APU's  share  was  proportional  to  its  agricultural  area, 
which  was  known  approximately. 
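The apportionment step can be illustrated with a short Python sketch. It assumes only what the text states, namely that each APU part receives a share of the state's historical wheat acreage proportional to its agricultural area; the function name and the dictionary layout are hypothetical.

```python
from typing import Dict


def apportion_state_wheat(
    state_wheat: float, ag_area_by_part: Dict[str, float]
) -> Dict[str, float]:
    """Split a state's historical wheat acreage among the APU parts lying
    inside the state, in proportion to each part's agricultural area."""
    total_ag = sum(ag_area_by_part.values())
    if total_ag <= 0:
        raise ValueError("total agricultural area must be positive")
    return {
        part: state_wheat * ag_area / total_ag
        for part, ag_area in ag_area_by_part.items()
    }
```

Because the shares depend only on agricultural area, any difference in wheat density between the APU parts is ignored, which is one way the apportionment can introduce error into the resulting ratios.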

One  problem  with  having  strata  cross  political 
boundaries  is  that,  in  some  countries  (such  as  the 
U.S.S.R.),  the  amount  of  wheat  planted  might  be 
determined  by  political  considerations.  If  this  is  the 
case,  the  wheat  distribution  might  change  drastically 
when  a political  boundary  within  a country  is 
crossed.  To  protect  against  this  possibility,  it  was 
decided  in  the  U.S.  simulation  to  construct  substrata 
by  intersecting  APU's  with  states.  These  substrata, 
called  refined  strata,  were  then  treated  as  strata  if 
they  were  large  enough  to  be  allocated  at  least  one 
sample  unit;  otherwise,  their  wheat  acreage  was  esti- 
mated by  ratioing.  Since  only  state-level  data  were 
used  to  construct  the  ratios,  this  procedure  could 
have  caused  considerable  bias  in  a particular  refined 
stratum;  however,  the  amount  of  wheat  involved 
was  only  a small  fraction  of  the  total. 

Results  of  the  NSST  were  inconclusive.  In  the 
United  States,  the  NSST  estimate  of  wheat  acreage  in 
Kansas  was  not  significantly  different  from  the 
LACIE  estimate  in  terms  of  standard  errors.  (The 
NSST  estimate  was  actually  closer  to  the  U.S.  Depart- 
ment of  Agriculture  (USDA)  estimate  than  to  the 
LACIE  estimate.)  In  the  U.S.S.R.,  no  reliable  third 
estimate  of  wheat  acreage  was  available.  In  two  of  the 
three  oblasts  estimated,  the  official  LACIE  and 
NSST  estimates  of  acreage  were  less  than  2 standard 
errors  apart;  however,  in  Kurgan,  the  NSST  estimate 
was  about  3 times  the  LACIE  estimate,  a difference 
of  about  20  standard  errors!  It  was  suspected  that 
large  errors  were  caused  by  the  apportionment  pro- 
cedure for  computing  ratios.  For  details,  see  the 
paper  by  Hallum  and  Basu  entitled  “Natural  Sam- 
pling Strategy,”  which  describes  the  NSST  in  depth. 

In  summary,  it  appeared  that  the  NSST  might  be 


feasible  for  stratifying  countries  without  detailed 
historical  data  but  that  great  problems  still  remained 
in  determining  sample  allocations  and  in  ratio 
estimation  for  unsampled  areas. 

Although  not  reported  in  these  proceedings,  R.W. 
Thomas  and  C.M.  Hay  of  the  University  of  Califor- 
nia at  Berkeley  tried  to  improve  the  approach  of  the 
NSST  by  using  a two-phase  sampling  scheme.  This 
approach  obtained  crude  estimates  of  wheat  acreage 
for  areas  covered  by  a full  Landsat  frame  (about  90 
by  90  nautical  miles)  and  then  used  the  crude  esti- 
mates to  allocate  samples  for  intensive  study  within 
the  area.  This  two-phase  sampling  scheme  was  pre- 
sented at  the  1977  Symposium  on  Machine  Process- 
ing of  Remotely  Sensed  Data. 


PROBLEM  AREAS  IN  AGGREGATION 

Wheat  proportion  estimates  were  probably  much 
more  reliable  in  some  LACIE  segments  than  in 
others.  For  example,  not  all  segments  have  the  same 
acquisition  history;  some  have  all  the  data  necessary 
to  make  a good  estimate,  whereas  others  may  have 
data  from  only  one  Landsat  pass.  This  large  variance 
in  reliability  has  been  ignored  until  now;  i.e.,  in  the 
aggregation  process,  wheat  proportions  estimated 
from  a minimal  amount  of  data  were  treated  the 
same  as  those  estimated  from  complete  sets.  As  a 
result,  some  very  poorly  estimated  segments  had  too 
large  a part  in  determining  the  final  large-area  wheat 
production  estimate.  To  alleviate  this  problem,  sup- 
porting research  has  been  conducted  to  develop  a 
weighted  aggregation  scheme  in  which  each  stratum 
estimate  would  be  a weighted  average  between 
“direct” (based only on current sample segments) and
“historical” (based on Group III ratio) estimates. This
scheme is designed to give more weight to the historical
estimate when segment estimates are thought to be
unreliable, and vice versa. For details of this procedure,
see the paper by Feiveson entitled “Weighted Aggregation.”
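The idea of a weighted average between the direct and historical estimates can be sketched as follows. The weights used here are inverse-variance weights, which is an assumption made purely for illustration; the weighting actually derived in the Feiveson paper is not given in this text and may differ. All names are hypothetical.

```python
def weighted_stratum_estimate(
    direct: float, var_direct: float, historical: float, var_historical: float
) -> float:
    """Combine a direct and a historical stratum estimate.

    Assumption: each estimate is weighted by the inverse of its variance,
    so the less reliable estimate receives the smaller weight.
    """
    if var_direct < 0 or var_historical < 0 or var_direct + var_historical == 0:
        raise ValueError("variances must be nonnegative and not both zero")
    w_direct = var_historical / (var_direct + var_historical)
    return w_direct * direct + (1.0 - w_direct) * historical
```

When the direct estimate rests on a single Landsat pass and its variance is large, the result moves toward the historical estimate; when the segment data are complete, the direct estimate dominates.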

Another  way  in  which  a LACIE-type  survey 
could  be  improved  is  to  use  segment  proportion  esti- 
mates from  previous  years  in  obtaining  the  current- 
year  estimate.  At  Texas  A & M University,  support- 
ing research  has  been  in  operation  to  develop  a pro- 
cedure for  both  sampling  and  aggregation  that  uses 
data  from  previous  years  to  the  best  advantage.  This 
procedure  is  described  in  the  paper  by  Hartley  en- 
titled “Multiyear  Estimates  for  the  LACIE  Sampling 
Plans.” 
CONCLUSIONS

In summary, there have been four sampling and aggregation
research tasks carried out in parallel with LACIE
operations: the natural sampling strategy, two-phase
sampling, weighted aggregation, and multiyear estimation.
Although little or no operational testing of theoretically
derived procedures has been done to date, it appears that
future LACIE-type surveys should consider a thorough testing
of some or all of these methods for possible implementation.
Natural Sampling Strategy

C. R. Hallum and J. P. Basu


INTRODUCTION 


Background 

The  LACIE  sampling  strategy  is  designed  to  esti- 
mate cost-effectively  and  with  predesignated  preci- 
sion the  wheat  area  and  production  in  countries  of 
interest.  The  level  of  precision  depends  on  the  sam- 
ple size  and  is  adversely  affected  by  the  variability  or 
heterogeneity  of  wheat  density  and  yield.  Stratifica- 
tion is  a means  of  effectively  reducing  this 
heterogeneity  and  improving  the  efficiency  of  the 
LACIE  sample  design. 

During  Phase  II  of  LACIE,  the  methodology  was 
developed  for  using  Landsat  imagery  and  agrophysi- 
cal data  to  permit  an  improved  stratification  in 
foreign  areas  by  ignoring  political  boundaries  and 
restratifying  along  boundaries  that  are  more  homo- 
geneous with  respect  to  the  distribution  of 
agricultural  density,  soil  characteristics,  and  average 
climatic  conditions.  These  considerations  formed  the 
basis  for  the  decision  in  Phase  II  to  redesign  the  sam- 
pling strategy  for  the  purpose  of  having  an  improved 
foreign  sampling  strategy  based  on  natural  stratifica- 
tion and  using  Landsat  imagery  with  less  dependence 
on  historical  data.  The  primary  motivation,  then,  for 
a redesign  of  the  initial  sampling  strategy  was  to  use  a 
stratification  in  foreign  areas  that  is  more  homo- 
geneous than  political  subdivisions.  The  resulting 
strategy  would  provide  a common  approach  for  all 
countries  and  should  permit  the  same  precision  (as 
achieved  with  the  initial  design)  but  with  fewer  seg- 
ments. Its  use  domestically  was  planned  to  permit 
better  applicability  of  the  “yardstick”  region  as  a 
quantifier of foreign results. The former LACIE sampling
strategy is referred to as the “initial” or
“first-generation” sampling strategy, the latter as the
“natural” or “second-generation” sampling strategy. Because
of the manner in which the natural strata were created, they
are commonly referred to as “agrophysical units” (APU's).

The  remainder  of  this  paper  is  a description  of  the 
natural  stratum-based  sampling  scheme  and  the  ag- 
gregation procedures  for  estimating  wheat  area, 
yield,  and  production  and  their  associated  prediction 
error  estimates.  A summary  of  test  results  will  be 
given  including  a discussion  of  the  various  problems 
encountered. 


Phase III Scope of the
Natural Sampling Strategy

The  natural  sampling  strategy  was  implemented 
for  Phase  III  in  an  off-line  mode  for  two  states  (Kan- 
sas and  North  Dakota)  in  the  U.S.  Great  Plains  and 
three  oblasts  (Kurgan,  Kustanay,  and  Tselinograd) 
in  the  U.S.S.R.  spring  wheat  indicator  region.  The 
initial  sampling  strategy  was  retained  in  an  opera- 
tional mode  over  these  areas  for  the  purpose  of  com- 
paring the  estimates  from  the  two  strategies.  The 
natural  sampling  strategy  design  for  the  U.S. 
yardstick  area  was  developed  using  procedures  and 
data  input  requirements  so  that  the  performance 
parameters  estimated  from  the  U.S.  evaluation 
would  be  as  applicable  as  possible  to  the  U.S.S.R. 
region.  Moreover,  the  Phase  III  evaluation  of  the 
natural  sampling  strategy  was  conducted  in  parallel 
with  and  over  the  same  regions  as  the  Feyerherm 
yield  estimation  model  (see  the  paper  by  Feyerherm 
and  Paulsen  entitled  “A  Universal  Model  for 
Estimating  Wheat  Yields”).  In  summary,  the 
motivation  for  the  Phase  III  scope  of  the  natural 
sampling  strategy  was  to  describe  and  test  (using 
Kansas  and  North  Dakota  as  quantifiers  from  the 
U.S.  yardstick  region)  the  sampling  scheme  as  well  as 
the  procedures  for  aggregating  estimates  of  wheat 
area,  yield,  and  production  and  their  associated  pre- 
diction error  estimates  in  LACIE  foreign  areas. 
Phase III Tests

Several tests were performed during Phase III to evaluate
the effectiveness and efficiency of the natural sampling
strategy. These included evaluations of the extent of
increased homogeneity of yield as well as that of the
agricultural and wheat density achieved by the
restratification; also, comparisons were made of various
aggregations as summarized in the following paragraphs.

Estimates  of  wheat  area  and  production  and  their 
associated  variances  were  obtained  at  different  times 
during  the  growing  season  in  the  two  states  in  the 
United  States  and  the  three  oblasts  in  the  U.S.S.R. 
The  natural  sampling  strategy  area  and  production 
estimates/statistics  were  then  compared  with  those 
of  the  initial  sampling  strategy  and  of  the  Statistical 
Reporting Service (SRS) (in the United States). Estimates
were made with the natural sampling strategy using, first of
all, a completely new set of segments (i.e., no attempt was
made to use any previous LACIE segments). These segments
were those resulting from a direct application of the
standard stratified random sampling scheme to the
second-generation strata. Estimates were also made using a
statistically feasible mixture of first- and
second-generation strategy segments¹ (i.e., determinations
were made in regard to which of the first-generation sample
segments together with a subset of the second-generation
sample segments result in a random sample within each
stratum of the natural sampling strategy; this procedure is
detailed in appendix A of this paper). The primary
motivation for attempting to use as many of the
first-generation strategy segments as possible was to permit
the analysts to use the collected history available on such
segments.

The previously described inputs were made in combination
with (1) the use of the Feyerherm yield model applied at the
natural stratum level (over Kansas) and (2) the use of the
Center for Climatic and Environmental Assessment (CCEA)
yield model applied at the political subdivision level.
(Details of the CCEA model are included in the paper by
Strommen et al. entitled “Development of LACIE CCEA-I


¹The sample segments randomly distributed within the
second-generation strata are referred to as
second-generation segments; those randomly distributed
within the political subdivision (county, oblast, etc.)
strata of the first-generation strategy are referred to as
first-generation segments.


Weather/Wheat Yield Models.”) Comparisons of the
production  estimates  from  these  two  inputs  provided 
a further  evaluation  of  the  advanced  (Feyerherm) 
yield  model  in  conjunction  with  the  area  estimator. 


SAMPLING, ESTIMATION, AND
AGGREGATION FOR AREA

With the exception of the use of a different stratification
and differences in the associated sets of aggregation logic,
the initial and natural sampling strategies are similar.
Consequently, parts of the following discussion will be
redundant relative to some of the material in the Experiment
Design section.


Sample Selection Procedure

The  second-generation  sampling  strategy  uses 

1.  A stratified  random  sampling  without  replace- 
ment scheme 

2. “Natural” strata developed according to
specifications  oriented  toward  achieving  homo- 
geneity in  regard  to  the  distribution  of  agricultural 
density,  soil  characteristics,  and  average  yearly 
climatological  conditions 

3.  The  5-  by  6-nautical-mile  segment  as  the  sam- 
pling unit 
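The first element of the scheme, stratified random sampling without replacement, can be sketched in Python as follows. This is a minimal illustration, assuming the sampling frame is already available as a list of segment identifiers per stratum; the function name, the dictionary layout, and the seed argument are hypothetical.

```python
import random
from typing import Dict, Hashable, List, Optional


def draw_stratified_sample(
    frame: Dict[str, List[Hashable]],
    sizes: Dict[str, int],
    seed: Optional[int] = None,
) -> Dict[str, List[Hashable]]:
    """Draw sizes[stratum] segments without replacement from each stratum.

    Strata with no entry in sizes receive no sample segments.
    """
    rng = random.Random(seed)
    sample = {}
    for stratum, segments in frame.items():
        k = sizes.get(stratum, 0)
        if k > len(segments):
            raise ValueError(f"stratum {stratum!r} has fewer than {k} segments")
        # random.Random.sample selects k distinct elements.
        sample[stratum] = rng.sample(segments, k)
    return sample
```

Drawing independently within each stratum means that the sample sizes, discussed under Sample Allocation, fully control how the effort is distributed among strata.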

The  total  sample  size  allocated  to  the  area  of  in- 
terest is  such  that  enough  segments  will  be  available 
for  Classification  and  Mensuration  Subsystem 
(CAMS)  processing  to  achieve  a preassigned  coeffi- 
cient of  variation  for  the  at-harvest  estimate  of  pro- 
duction allowing  for  errors  due  to  (1)  sampling,  (2) 
classification,  (3)  yield  prediction,  and  (4)  loss  of 
data.² The sampling frame is generated using the
same  procedure  as  described  in  the  paper  by  Hallum 
et  al.  entitled  “Sampling,  Aggregation,  and  Variance 
Estimation  for  Area,  Yield,  and  Production  in 
LACIE.” 


²The choice of the preassigned value for the production
coefficient of variation is dependent on the desired
probabilistic accuracy of the production estimate for the
area of interest. For example, if the 90/90 criterion is to
be satisfied at harvest at the country level, then the goal
is to obtain a country production estimate, at harvest,
which is within 10 percent of the actual production with a
probability of 0.9.
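The link between an accuracy criterion such as 90/90 and a coefficient of variation can be illustrated numerically. The sketch below assumes that the production estimate is unbiased and approximately normally distributed; that assumption, and the function name, are ours and are not stated in the text.

```python
from statistics import NormalDist


def required_cv(tolerance: float = 0.10, probability: float = 0.90) -> float:
    """Largest coefficient of variation for which an unbiased, normally
    distributed estimate falls within +/- tolerance (relative) of the true
    value with at least the given probability."""
    # Two-sided interval: put (1 - probability) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + probability / 2.0)
    return tolerance / z
```

Under these assumptions the 90/90 criterion corresponds to a coefficient of variation of roughly 6 percent (0.10 divided by the normal quantile of about 1.645).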
Stratification

The strata are developed based on soil types, climatic
conditions, and agricultural density. The suitability of
different soils for growing wheat is ranked and is rated on
the basis of several soil characteristics such as texture,
depth, water-holding capacity, drainage, salinity, and
slope. The stratification procedure is oriented toward
achieving the same soil suitability rating and similar
agricultural density within each stratum. Also, the annual
precipitation at any two areas in a given stratum is not to
differ by more than 50 millimeters and the average growing
season temperature is not to differ by more than 1° C. The
resulting strata are referred to as the “natural strata” or
the “agrophysical units.” Further details of the
stratification effort are included in appendix B of this
paper.

To obtain estimates at levels such as state or oblast, the
intersections of the political subdivision with the natural
strata are used in the aggregation. The strata that result
from these intersections are referred to as the “refined
strata.” In a country such as the U.S.S.R., considerable
differences in agricultural practices frequently exist, for
political or other reasons, between two contiguous oblasts.
Consequently, resorting to the refined strata as the
base-level strata is the step taken to include “political
influence” as a stratification variable.


Sample  Allocation 

Sample  allocation  refers  to  the  determination  of 
the  total  number  of  segments  to  be  distributed 
among  the  strata.  These  determinations  are  com- 
pleted, first  of  all,  for  the  natural  strata.  The  sample 
sizes  determined  for  the  natural  strata  are  then  ap- 
portioned to  the  refined  strata  based  on  a propor- 
tional allocation  using  the  proportion  (relative  to  the 
natural  strata)  of  historical  wheat,  from  the  epoch 
year,  present  in  the  refined  strata. 

Determination of total sample size: The total sample size
allocated to the area of interest is such that the LACIE
precision goal for the production estimate is expected to be
met, allowing for errors due to (1) sampling, (2)
classification, (3) yield prediction, and (4) loss of data.
The best available a priori knowledge of the magnitude of
these errors was used together with the following
assumptions.

1.  Segment-level  wheat  area  estimates  are 
mutually  independent  and  unbiased. 


2.  Yield  estimates  are  unbiased,  are  mutually  in- 
dependent (at  the  yield  stratum  level),  and  are  inde- 
pendent of  the  acreage  estimates. 

On this basis, it is straightforward to approximate the
mean-squared prediction error for production as a function
of the total sample size n. In accordance with the “Neyman”
or “optimal” allocation (ref. 1), minimizing this expression
as a function of the stratum sample sizes $n_j$ subject to
the constraint

$$\sum_{j=1}^{J} n_j = n$$

i.e., such that the summed stratum sample size is the same
as the total sample size, results in the following choice
for n.

[Equation (1) is not legible in this copy.]

where

n = total number of segments allocated to the area of
interest

$N_j$ = total number of agricultural (ag) segments in the
jth natural stratum

$S_j^2$ = estimate of segment-to-segment variation of wheat
area within the jth natural stratum

$Y_j$ = average yield potential determined from soil
characteristics in the jth natural stratum

$T_j$ = standard deviation of yield potential in the jth
natural stratum

J = total number of strata in the area of interest

CV(P) = preassigned value of the coefficient of variation of
the production estimate

$A_j$ = estimate of wheat area in the jth natural stratum
based on historical information

$\epsilon$ = a conservative lower-bound estimate of the
expected sample acquisition rate by the end of a crop season
(determined from previous experience with loss of segments
due to cloud cover or other reasons)

In case n is not an integer, it is rounded upward to the
next larger integer.
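Because the closed-form expression for n is not reproduced here, the following Python sketch shows one plausible calculation under the stated independence and unbiasedness assumptions; it should not be read as the paper's equation (1). It assumes that the variance of a stratum production estimate is $(N_j^2 S_j^2 / n_j)(Y_j^2 + T_j^2) + A_j^2 T_j^2$, that Neyman allocation is used, that finite-population corrections are ignored, and that only a fraction of the allocated segments (the acquisition rate) is actually acquired. The function name and the dictionary keys are hypothetical.

```python
import math
from typing import Dict, List


def total_sample_size(
    strata: List[Dict[str, float]], cv_target: float, acquisition_rate: float
) -> int:
    """Total number of segments needed to reach a target production CV.

    Each stratum dict holds N (ag segments), S2 (wheat area variance),
    Y (mean yield potential), T (std. dev. of yield potential), and
    A (historical wheat area). This is an illustrative derivation only.
    """
    # Sampling part of the variance under Neyman allocation is neyman_sum**2 / n.
    neyman_sum = sum(
        s["N"] * math.sqrt(s["S2"] * (s["Y"] ** 2 + s["T"] ** 2)) for s in strata
    )
    production = sum(s["A"] * s["Y"] for s in strata)
    # Yield-error part of the variance does not shrink with more segments.
    yield_term = sum((s["A"] * s["T"]) ** 2 for s in strata)
    budget = (cv_target * production) ** 2 - yield_term
    if budget <= 0:
        raise ValueError("target CV cannot be met: yield error alone exceeds it")
    # Inflate for expected data loss, then round upward to an integer.
    return math.ceil(neyman_sum ** 2 / (acquisition_rate * budget))
```

The sketch makes one design point visible: if the yield prediction error alone exceeds the precision goal, no number of segments can meet it, which is why the yield terms enter the sample-size calculation at all.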

To compute the wheat area variance $S_j^2$, the following
procedure is performed. If the jth natural stratum contains
at least $M_j$ LACIE-processed segments from the previous
year ($M_j$ defaults to the value of 3 in the LACIE software
if no overriding value is specified), $S_j^2$ will be the
computed segment-to-segment variation of the previously
estimated LACIE wheat area. In case the number of LACIE
segments is less than $M_j$, the estimate of the
within-stratum wheat area variance will be obtained from the
following regression equation.


[Equation (2) is garbled in this copy; it expresses $S_j^2$
in terms of the segment size, $a_j$, $r_j$, $S_{a_j}^2$, and
the constant c defined below.]


where

$a_j$ = ag area in the collection of ag segments of the jth
natural stratum computed from a complete enumeration of ag
proportion in each 5- by 6-nautical-mile segment in the
stratum as determined from a 5-by-6 grid overlay onto
Landsat color-infrared imagery

$r_j$ = $g_j / a_j$, where $a_j$ is as defined previously
and $g_j$ is the historical wheat area in the jth natural
stratum

$S_{a_j}^2$ = the segment-to-segment within-stratum ag area
variance in the collection of ag segments of the jth natural
stratum computed from the enumeration of the 5-by-6 ag
proportions

and where c is a constant obtained by performing a
least-squares fit to other strata (each of which contains at
least $M_j$ LACIE segments) in the country containing
LACIE-processed segments from the previous year. (If
insufficient LACIE segments are available in the country of
interest, the least-squares fit is performed on strata from
analogous areas from other countries having LACIE-processed
segments from the previous year.) The form of equation (2)
is identical to that discussed in the paper by Feiveson
et al. entitled “LACIE Sampling Design”; details of its
derivation are included therein.


Finally, in countries having no historical data available,
since $S_j^2$ cannot be computed from equation (2) in this
case, $S_j^2$ is replaced by $S_{a_j}^2$ in equation (1).

Distribution of sample sizes among strata: After determining
the total number n of segments to be allocated to the area
of interest, the sample size $n_j$ to allocate to the jth
natural stratum is computed as follows. (Again, the
following is a result of the Neyman allocation procedure
(ref. 1).) Let $t_j$ be the weight associated with the jth
natural stratum, where

$$t_j = \frac{N_j \sqrt{S_j^2 \left(Y_j^2 + T_j^2\right)}}{\sum_{k=1}^{J} N_k \sqrt{S_k^2 \left(Y_k^2 + T_k^2\right)}}\; n \qquad (3)$$

and where the quantities to be input into the right side of
equation (3) are as defined for equation (1). Let $n'_j$ be
the provisional allotment to the jth natural stratum, where
$n'_j$ equals the integer part of $t_j$. Also let

$$\delta_j = t_j - n'_j$$

and

$$d = n - \sum_{j=1}^{J} n'_j$$

Assign one additional sample segment to each of d of the
natural strata with probabilities proportional to
$\delta_j$; i.e., let

$$n_j = n'_j + b_j \qquad (4)$$

where $b_j = 1$ if the jth natural stratum receives an extra
sample segment; $b_j = 0$ otherwise. The Hartley-Rao
procedure (ref. 1) is used to perform this allotment.

Performing the previously described procedure
results in the determination of the natural stratum
sample sizes. However, wheat area and production
estimates are made at the refined stratum level and
require determinations of sample sizes for each
refined stratum. To determine the sample sizes for
each refined stratum within a given natural stratum,
a proportional allocation is performed as follows.
The sample size for each refined stratum is
determined by multiplying the natural stratum sample
size by the proportion (relative to the natural
stratum) of historical wheat, from the epoch year,
present in each given refined stratum. The fractional
parts of these weights account for one or more
segments; consequently, these segments are assigned to
the refined strata using the Hartley-Rao proportional
procedure with selection probabilities proportional to
the fractional parts of the weights, in the same manner
as indicated in the previous paragraph.
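
The integer-plus-fractional-part allotment described above can be sketched in Python as follows. This is a minimal illustration and not the CAS implementation: the function name `allot_fractional` is hypothetical, and the extra segments are drawn by randomized systematic sampling with probability proportional to the fractional parts, which is assumed here to stand in for the Hartley-Rao procedure cited in the text.

```python
import math
import random


def allot_fractional(weights, rng=None):
    """Turn real-valued stratum weights into integer sample sizes.

    Each stratum first receives the integer part of its weight. The
    remaining segments (the sum of the fractional parts, assumed to be a
    whole number) are assigned one per stratum, with selection
    probability equal to that stratum's fractional part.
    """
    rng = rng or random.Random()
    base = [math.floor(w) for w in weights]
    frac = [w - b for w, b in zip(weights, base)]
    extra = round(sum(frac))  # number of strata that receive one more
    allot = list(base)
    if extra == 0:
        return allot

    # Randomized systematic draw: shuffle the strata, lay the fractional
    # parts end to end, and select the strata hit by the points
    # start, start + 1, start + 2, ...
    order = list(range(len(weights)))
    rng.shuffle(order)
    point = rng.random()
    cumulative = 0.0
    chosen = set()
    for idx in order:
        cumulative += frac[idx]
        if point < cumulative and len(chosen) < extra:
            chosen.add(idx)
            point += 1.0

    # Guard against floating-point round-off leaving a segment unassigned.
    for idx in order:
        if len(chosen) >= extra:
            break
        if idx not in chosen and frac[idx] > 0:
            chosen.add(idx)

    for idx in chosen:
        allot[idx] += 1
    return allot
```

Because every fractional part is smaller than 1 and the selection points are spaced one unit apart, no stratum can receive more than one extra segment, and the allotments always sum to the total of the weights.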


Area Estimation

Area estimation is performed as follows.

Designation of Group A and B refined strata.— The
jth refined stratum is designated as “Group A” or
“Group B” depending on whether it is allocated at
least $M_1$ or less than $M_1$ segments, respectively ($M_1$
defaults to the value of 3 if no overriding value is
specified). The qualitative definitions are as follows:
Group A refined strata represent marginal to high
wheat-producing areas historically, whereas Group B
refined strata represent areas having very little or no
wheat historically. There is one primary exception to
this general qualification: since LACIE does not get
coverage on every segment for every pass, the jth
refined stratum is placed in the Group B category if it
has less than $M_1$ segments after such losses. If $M_1$ or
more segments subsequently become available for
aggregation, the jth refined stratum is reinstated as
Group A.

Guidelines for locating segments.— The location of
segments within a refined stratum is performed by
simple random sampling without replacement within
the previously designated ag area of the stratum.
All the segments with 3 percent or more agricultural
area within the refined stratum are candidates for
selection. After the selection of the sample, the
selected segments are located on a mosaic and the
latitude and longitude of the center of each segment
are obtained. Except for application to a different set
of strata, the guidelines for locating segments are
identical to those applied in the initial sampling
strategy; consequently, the reader should refer to the
paper by Liszcz entitled “LACIE Area Sampling
Frame and Sample Selection” for further details.


Apportionment of political subdivision data to
strata.— As noted in the section entitled “Sample
Allocation,” it may be necessary to know the natural
stratum historical wheat area $g_j$ to compute $r_j$ for
input into equation (2). This quantity will also be
needed for each refined stratum in the aggregation.
In countries having historical data available only on
one level smaller than the country (e.g., the oblast
level in the U.S.S.R.), an apportionment of the
political subdivision wheat proportional to the
natural/refined stratum is performed. It is
particularly important to note that historical data do not
exist at the refined stratum level in any country;
consequently, until a historical data base can be built
(i.e., with the passage of time), the apportioning
procedure is the route taken to estimate the historical
wheat area to associate with each refined stratum.
The underlying assumption made in this approach is
that the wheat in a political subdivision is uniformly
distributed over the agricultural area in that political
subdivision. Of course, this is not always the case,
and it is fully recognized that apportioning has its
difficulties and represents an initial attempt at
resolving the missing-data problem. The apportioning is
accomplished as follows.


$$g_j = \sum_i \frac{a_{ij}}{a_i}\, w_i \qquad (5)$$

where $g_j$ = the apportioned estimate of historical
wheat area in the jth natural/refined
stratum

$a_{ij}$ = the ag area in the collection of ag
segments common to the jth natural/
refined stratum and the ith political
subdivision

$w_i$ = the wheat area in the ith political
subdivision based on historical data

$a_i$ = the total ag area in the collection of ag
segments of the ith political subdivision

Apportionment of area estimates to yield strata.—
The natural sampling strategy requires that yield and
area (APU) strata be coincident. The situation may
arise, however, where the strata do not coincide. In
such cases, an apportionment of the acreage
estimates to the yield strata is needed to permit the
estimate of production at political subdivision (or other)
levels. The following discussion specifies the
approach taken in this case.


Another set of area strata is generated consisting
of the collection of all the area “substrata” that result
from the intersection of the refined strata with the
yield strata. This procedure results in each yield
stratum containing one or more area substrata. The
LACIE wheat area estimate for each area substratum
is obtained by apportioning the LACIE wheat area
estimate of the refined stratum according to the
proportion of ag area in each substratum relative to the
refined stratum from which it came.

With reference to figure 1, if the indicated yield
strata covered the area of interest with
$\{A_1, A_2, A_3, A_4, A_5\}$ being the collection of refined
strata over this same area, then, in view of the lack of
coincidence of area and yield strata, a new collection
of area substrata is generated consisting of

$$\{A_{11}, A_{12}, A_{21}, A_{22}, A_{31}, A_{32}, A_{41}, A_{51}, A_{52}, A_{53}, A_{54}\}.$$

If $\hat{A}_{ji}$ is used to denote the LACIE-estimated
wheat area apportioned to $A_{ji}$, then

$$\hat{A}_{ji} = \begin{cases} \dfrac{a_{ji}}{a_j}\,\hat{A}_{jA}, & \text{if the ith substratum is contained within the jth Group A refined stratum} \\[1.5ex] \dfrac{a_{ji}}{a_j}\,\hat{A}_{jB}, & \text{if the ith substratum is contained within the jth Group B refined stratum} \end{cases} \qquad (6)$$

where $a_{ji}$ = the total ag area of $A_{ji}$ determined from
the product of the ag proportion of $A_{ji}$ (determined
from the complete enumeration of ag in the gridded
segments) with the planimetered area of $A_{ji}$,
$a_j$ = the total ag area of $A_j$ (determined from the
complete enumeration of ag in the gridded
segments), and where $\hat{A}_{jA}$ and $\hat{A}_{jB}$ are defined in
equations (7) and (8), respectively.

The previously described procedure identifies the
area “substrata” and the associated wheat area
estimates to be aggregated within each yield stratum to
compute production. Again, it should be emphasized
that this procedure is used in estimating production
and production prediction error only when the yield
strata are not coincident with the natural strata;
however, the wheat area estimate for a given area of
interest will be the same regardless of whether the
area aggregation is performed from the refined strata
or from the substrata.
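
The apportioning of a refined stratum estimate to its substrata can be illustrated with a short Python sketch. The function name `apportion_to_substrata` and the numbers in the usage note are hypothetical; the only rule taken from the text is that each substratum receives the share of the refined stratum wheat area estimate given by its ag area relative to the ag area of the refined stratum.

```python
def apportion_to_substrata(stratum_estimate, stratum_ag_area, substratum_ag_areas):
    """Split a refined stratum wheat area estimate among its substrata.

    stratum_estimate    -- wheat area estimate of the refined stratum
    stratum_ag_area     -- total ag area of the refined stratum
    substratum_ag_areas -- dict mapping substratum name to its ag area
    """
    if stratum_ag_area <= 0:
        raise ValueError("refined stratum ag area must be positive")
    return {
        name: stratum_estimate * ag_area / stratum_ag_area
        for name, ag_area in substratum_ag_areas.items()
    }


# Illustrative values only: a refined stratum split into four substrata.
shares = apportion_to_substrata(
    1200.0, 400.0, {"A51": 100.0, "A52": 120.0, "A53": 80.0, "A54": 100.0}
)
```

When the substratum ag areas add up to the ag area of the refined stratum, the apportioned estimates add up to the refined stratum estimate, which is why the area estimate for an area of interest does not depend on whether it is aggregated from refined strata or from substrata.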

Group A refined stratum area estimate.— The wheat
area estimate for the jth Group A refined stratum is
calculated as follows.

$$\hat{A}_{jA} = \frac{N_{jA}}{M_{jA}} \sum_{k=1}^{M_{jA}} \hat{A}_{jkA} \qquad (7)$$

where $\hat{A}_{jA}$ = the LACIE wheat area estimate for
the jth Group A refined stratum

$N_{jA}$ = the total number of ag segments in the
jth Group A refined stratum

$M_{jA}$ = the total number of sample segments
for which wheat acreage estimates are
made in the jth Group A refined
stratum

$\hat{A}_{jkA}$ = the LACIE wheat area estimate for
the kth sample segment in the jth
Group A refined stratum
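
A minimal Python sketch of the Group A computation follows. It assumes the expansion form of equation (7) (number of ag segments times the mean segment estimate) and the variance described later under “Group A Refined Stratum Variance” (squared number of ag segments times the sample variance of the segment estimates, divided by the number of sample segments, with the finite population correction omitted). The function name `group_a_estimate` is hypothetical.

```python
from statistics import mean, variance


def group_a_estimate(segment_estimates, n_ag_segments):
    """Return (wheat area estimate, variance estimate) for a Group A stratum.

    segment_estimates -- wheat area estimates for the sample segments
    n_ag_segments     -- total number of ag segments in the stratum
    """
    m = len(segment_estimates)
    if m < 2:
        raise ValueError("at least two sample segments are needed for a variance")
    seg_mean = mean(segment_estimates)
    seg_var = variance(segment_estimates)  # sample variance, divisor m - 1
    area = n_ag_segments * seg_mean
    area_var = n_ag_segments ** 2 * seg_var / m
    return area, area_var
```

For example, three segment estimates of 10, 12, and 14 in a stratum of 50 ag segments give an area estimate of 600 and a variance estimate of 2500 × 4 / 3.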

FIGURE 1.— A refinement of acreage strata to the yield strata.

Group B refined stratum area estimate.— The wheat
area estimate of the jth Group B acreage stratum is
calculated as follows.

$$\hat{A}_{jB} = \left(1 - \frac{n_{jB}}{M_1}\right) W_{jB}\, \frac{\sum_{k=1}^{m} c_{kj}\,\hat{A}_{kA}}{\sum_{k=1}^{m} c_{kj}\,W_{kA}} + \frac{n_{jB}}{M_1}\,\tilde{A}_{jB} \qquad (8)$$

where $W_{jB}$ = the most recent epoch year (or average
over previous 2 or 3 years) harvested wheat area
in the jth Group B stratum

$c_{kj}$ = 1, if the kth Group A stratum is used in the
estimate of the jth Group B stratum; 0, otherwise

$\sum_{k=1}^{m} c_{kj}\,\hat{A}_{kA}$ = the sum of the LACIE
estimates of wheat area in all Group A strata
that are used to estimate the given jth Group B
stratum

$\sum_{k=1}^{m} c_{kj}\,W_{kA}$ = the sum of the most recent
epoch year (or average over previous 2 to 3 years)
harvested wheat area in the Group A strata that
are used to estimate the jth Group B stratum

$m$ = the total number of Group A strata in the area
of interest plus any additional Group A strata
that are not contained within the area of interest
but that are used in the ratio estimation of the
Group B strata contained in the area of interest

$n_{jB}$ = the total number of sample segments for
which acreage estimates are made in the jth
Group B stratum

and where $\tilde{A}_{jB} = 0$ if $n_{jB} = 0$; otherwise

$$\tilde{A}_{jB} = \frac{N_{jB}}{n_{jB}} \sum_{k=1}^{n_{jB}} \hat{A}_{jkB} \qquad (9)$$

In the Crop Assessment Subsystem (CAS)
software, the capability exists for $M_1$ to default to the
value of 3 if no overriding value is specified.

It should be reemphasized at this point that the
most refined level for which historical wheat acreage
data are available in many LACIE foreign areas is
one level below the country level (crop region or
economic region). Consequently, until a historical data
base is generated after passage of time, the historical
wheat value for each stratum used in the Group B
estimator is apportioned to the stratum as indicated
in the section entitled “Apportionment of Political
Subdivision Data to Strata.”

In the preceding discussion, the Group A strata to
be used as a base for the ratio-estimated part of the
wheat area estimate of a given Group B stratum are
selected according to the following guidelines.

1. First of all, the capability exists in the CAS
software to permit interactive selection of the
appropriate Group A strata as a base in the
ratio-estimated part of a given Group B stratum. This
capability is particularly advantageous in allowing
the crop analysts to incorporate real-time
information and expertise that may be available at the time
of aggregation to assist in selecting Group A strata to
use as a base in the ratio estimation. For example,
knowledge of agricultural practices and/or
information concerning the status (such as the presence of an
episodic event) of crops in different localities can be
very beneficial in deciding which Group A strata to
use in ratio estimation of a given Group B stratum.

2. When option 1 is not used, the CAS software
defaults to the use of all Group A strata in the zone
containing the given Group B stratum.

In equation (9), $N_{jB}$ = the total number of ag
segments in the jth Group B refined stratum, and
$\hat{A}_{jkB}$ = the LACIE wheat area estimate for the kth
sample segment in the jth Group B stratum.

Area aggregation to the zone, region, and country
levels.— The wheat area estimate $\hat{A}$ of the area of
interest (whether it be zone, region, or country) is
given by

$$\hat{A} = \sum_{j=1}^{m} \alpha_j\,\hat{A}_{jA} + \sum_{i=1}^{b} \hat{A}_{iB} \qquad (10)$$

where $\alpha_j$ = 1, if the jth Group A refined stratum is
contained within the area of interest; 0, otherwise

and m is the total number of Group A strata
contained in the area of interest plus any additional
Group A strata not contained within the area of
interest but used in the ratio estimation of the Group B
strata contained in the area of interest. The quantity
b denotes the total number of Group B strata in the
area of interest.


tion  (g)  and  is  given  by 

m 

f.  = E CUVkA  * ,14) 


AREA  VARIANCE  ESTIMATION 


Group  A Refined  Stratum  Variance 

From  equation  (7),  it  is  straightforward  to  see  that 
the  variance  of  the  estimate  of  wheat  area  for  thejih 
Group  A reftned  stratum  is  given  by 


where  VJB  is  the  estimate  of  segment-to-segment 
variance  of  wheat  area  computed  for  the  refined 
stratum  in  the  manner  indicated  in  the  section  en- 
titled “Determination  of  Total  Sample  Size”  (i.e.,  by 
making  use  of  the  allocation  data  at  the  refined 
stratum  level)  if  njB  < l;  otherwise  (i.e.,  if  2 « 
nJB  < Mj)s  Misestimated  directly  (i.e.,  in  the  same 
manner  as  the  estimate  of  M <«*<”»• 


lA 


N 2 
= _Zd_ 


M, 


M 


*IA 


....  Variance  Aggregation  to  the  Zone, 

■ ' Region,  and  Country  Lovola 


After  substituting  the  expression  in  equation  (S) 
where  VjA  — the  variance  of  the  estimate  of  wheat  into  equation  (10)  and  simplifying,  it  is  straightfor- 

area  in  the  fih  Group  A refined  ward  to  see  that  A in  equation  (10)  is  expressible  as 

stratum 


V -['/("w  - ')]  £ (Vu-A.)’-  <l2> 

m b 

A = Z)  V/*  + SKa'W**/) 

/= t i=t 

and 

where 

b 

JW 

= "/ + £ cn 
1=1 

A/A  = (llMjA)  /C  AjkA  ^ 

k=J 

Consequently,  the  variance  VA  for  the  estimate  of 

wheat  area  for  the  area  of  interest  is  given  by 


The  finite  population  correction  factor,  1 — 
(Mj4/NjA),  is  omitted  from  equation  (11)  since  it  is 
almost  always  insignificantly  different  from  1. 
(When  this  is  not  so.  equation  (1 1)  is  a conservative 
estimate — i.e.,  an  estimate  on  the  upper  side). 


Group  B Refined  Stratum  Variance 

The  variance  of  the  estimate  of  wheat  area  for  the 
yth  Group  B stratum  is  directly  obtainable  from  equa- 


m b 

K,  ■ £ Vvia  * £ <«) 

/=1  1=1 

SPRING  AND  WINTER  WHEAT 

Area  and  Variance  Estimation 

In a mixed wheat area, separate area estimates are
made for the winter wheat and the spring wheat
using the aggregation procedures described in the
sections “Group A Refined Stratum Area Estimate”
through “Variance Aggregation to the Zone, Region,
and Country Levels” with inputs (LACIE estimates
as well as historical) of winter wheat for a winter
wheat aggregation and those of spring wheat for a
spring wheat aggregation. This method provides the
spring wheat and winter wheat area estimates and
their respective variance estimates at the stratum and
zone levels. The aggregation procedures are used to
obtain separate winter wheat and spring wheat area
estimates and the corresponding variance estimates
at the zone, regional, and country levels.


Total Wheat: Area and
Variance Estimation

The total wheat area estimate in a mixed wheat
area is computed by aggregating the winter wheat
and spring wheat area estimates for the area; that is,
if $\hat{A}_w$ and $\hat{A}_s$ denote the winter and spring wheat area
estimates, respectively, the total wheat area estimate
$\hat{A}_t$ is given by

$$\hat{A}_t = \hat{A}_w + \hat{A}_s \qquad (17)$$

This computation is done at the zone and higher
levels.

The variance estimates for the total wheat at the
zone and higher levels are those obtained from the
total wheat aggregation made with inputs of total
wheat by CAS for the segments and historical data.
The procedure is the same as described in the section
entitled “Variance Aggregation to the Zone, Region,
and Country Levels.”


PRODUCTION ESTIMATION


Production and Variance Estimation
at the Stratum Level

The estimate of wheat production for the jth area
refined stratum is given by

$$\hat{P}_j = \begin{cases} \hat{A}_{jA}\,Y_{jA}, & \text{if the jth refined stratum is a Group A stratum} \\[1ex] \hat{A}_{jB}\,Y_{jB}, & \text{if the jth refined stratum is a Group B stratum} \end{cases} \qquad (18)$$

In equation (18), $Y_{jA}$ ($Y_{jB}$) is the predicted yield
for the jth Group A (Group B) refined stratum as
given by the yield estimation model. Two basic
assumptions are made to obtain the LACIE
production variance estimator.

1. Segment-level wheat area estimates are
mutually independent and unbiased.

2. Yield estimates are unbiased and are mutually
independent of the acreage estimates.

Under these assumptions, the variance of the
production estimator of the jth Group A refined stratum
is given by $\sigma_{P_{jA}}^2$, where

$$\sigma_{P_{jA}}^2 = \mu_{A_{jA}}^2\,\sigma_{Y_{jA}}^2 + \mu_{Y_{jA}}^2\,\sigma_{A_{jA}}^2 + \sigma_{A_{jA}}^2\,\sigma_{Y_{jA}}^2 \qquad (19)$$

and where $\sigma_{A_{jA}}^2$, $\mu_{A_{jA}}$ and $\sigma_{Y_{jA}}^2$, $\mu_{Y_{jA}}$ are the
respective variance and mean for the acreage and
yield estimators for the jth Group A refined stratum.
The production variance estimator is the one
resulting from the replacement of the parameters in the
right side of equation (19) by their estimates; the
sign of the last term in equation (19) is changed from
positive to negative to obtain an unbiased estimator.
Consequently, the resulting estimator is given by
$Z_{jA}^2$, where

$$Z_{jA}^2 = \hat{A}_{jA}^2\,T_{jA}^2 + Y_{jA}^2\,V_{jA} - V_{jA}\,T_{jA}^2 \qquad (20)$$

where $\hat{A}_{jA}$ and $Y_{jA}$ are the estimates of wheat area
and yield, respectively, of the jth Group A refined
stratum. $V_{jA}$ is an estimate of the wheat area variance
for the jth Group A refined stratum, and $T_{jA}^2$ is the
estimated squared prediction error of yield in the jth
Group A refined stratum. The variance of the jth
Group B refined stratum production estimate is
similarly obtained.
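
The stratum-level production estimate and its variance estimator can be sketched in Python as follows. The function name `stratum_production` and the numerical values in the usage note are hypothetical; the formula is the product of the area and yield estimates, with the variance built from the three terms described above and the cross term entered with a negative sign.

```python
def stratum_production(area, area_var, yield_est, yield_sq_error):
    """Return (production estimate, estimated production variance).

    area           -- wheat area estimate for the refined stratum
    area_var       -- estimated variance of the area estimate
    yield_est      -- predicted yield for the stratum
    yield_sq_error -- estimated squared prediction error of the yield
    """
    production = area * yield_est
    prod_var = (
        area ** 2 * yield_sq_error
        + yield_est ** 2 * area_var
        - area_var * yield_sq_error  # negative cross term removes the bias
    )
    return production, prod_var
```

With an area estimate of 600 (variance 400) and a yield of 2.0 (squared prediction error 0.04), the production estimate is 1200 and the variance estimate is 14400 + 1600 - 16 = 15984. The cross term is subtracted because the squared area and squared yield estimates each overstate the corresponding squared mean by one variance, so the two leading terms together contain the product of the variances twice.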

In  case  the  yield  strata  are  not  coincident  with  the 
natural  strata,  the  estimate  of  production  and  its  as- 
sociated prediction  error  estimate  at  the  substratum 
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level  are  obtained  as  follows.  In  particular,  if  P L 
denotes  the  production  estimate  of  the  /tn 
substratum  (apportioned  from  the  .Ah  yield  stratum 
which  has  yield  Kr) , then 

% a KYr  <2D 

where  & is  defined  in  equation  (6).  Moreover, 
denoting  ute  production  variance  estimate  of  Pyr  by 


or 

//  r m b 

= £ E yr)T)ltliAAtA  * £ QiB^iB  K 

/*t 

(25) 

That  is, 

$ " £ V, 

r*l 


w 


A , 

+ TrW0l  - 


w 


(22)  wherc 


4 


* 


where  and  Yr  are  the  estimates  of  wheat  area  and 
yield,  respectively,  of  the  Ah  substratum  (contained 
in  the  Ah  refined  stratum),  and 


r it 


if  the  Ah  substratum  is  contained 
VjA  .within  the  Ah  Group  A refined 
stratum 

(23) 

if  the  Ah  substratum  is  contained 
ViB.  within  the  Ah  Group  B refined 
stratum 


or  Ar  is  given  by  this  same  expression  after  replacing 
y,f)j  by  y^/tjA  and  /3„  by  p In  the  situation 
where  the  natural  strata  and  yield  strata  coincide, 
equation  (24)  applies,  where  Yr  is  the  LAClE-pre- 
dicted  yield  in  the  rth  yield  stratum,  H is  the  total 
number  of  yield  strata  in  the  area  of  interest 


The  quantities  a ^ and  a;  are  as  defined  in  the  section 
entitled  “Apportionment  of  Area  Estimates  to  Yield 
Strata." 


1,  if  the  Ah  Group  A stratum  lies  within  (or 
coincides  with)  the  rth  yield  stratum 
0,  otherwise 


Production  Estimate  at  a Zone, 
Region,  or  Country  Level 


1,  if  the  Ah  Group  B stratum  lies  within  (or 
coincides  with)  the  rth  yield  stratum 
0.  otherwise 


The  wheat  production  estimate  PA  for  the  area  of 
interest  (whether  it  be  a zone,  a region,  or  a country) 
is  given  by  equation  (24)  or  equation  (25),  respec- 
tively, depending  on  whether  or  not  the  yield  strata 
coincide  with  the  natural  strata. 


and  Mi  is  a variable  that  defaults  to  the  value  of  3 if 
no  overriding  value  is  specified. 

In  the  situation  where  the  natural  strata  and  yield 
strata  do  not  coincide,  equation  (25)  applies  with  Yr 
and  H as  defined  previously;  however, 


* 


t 


H 

m 

1 

b 

+ £ MW 

*'  1241  y„  - 1 

T-  1 

1=1 

J 

' 0 

rih  yield  stratum 
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Also 


1 , if  the  Ah  Group  B stratum  intersects  the  rth 
yield  stratum 
0,  otherwise 


where 


a/r  = m + E ^fC/< 

i«l 


*IA 


(27)  ^r2  ™ *he  estimated  squared  prediction  error  of 

yield  in  the  rth  yield  stratum 


and 


and 


(28) 


rA 


t v,  * b 

'■>  L '■> 


In  equations  (27)  and  (28),  the  small  a’s  are  as 
given  in  the  section  entitled  “Apportionment  of 
Area  Estimates  to  Yield  Strata”;  ntj  is  the  total  num- 
ber of  substrata  that  are  apportioned  out  of  the  yth 
refined  Group  A stratum  and  that  lie  within  the  nh 
yield  stratum.  Similarly,  b , is  the  total  number  of 
substrata  that  are  apportioned  out  of  the  Ah  refined 
Group  B stratum  and  that  lie  within  the  rth  yield 
substratum. 


* t (Wi)*  Kb  <“> 

1=1 


The  preceding  equations  apply,  of  course,  to  the 
situation  in  which  the  natural  strata  and  yield  strata 
- are  coincident,  if  this  should  not  be  the  case,  the 
same  equations  would  still  apply  provided  y„ij7  and 
/3„  are  replaced  with  yrjnfijA  and  respectively, 
throughout. 


Production  Prediction  Error  for  a 
Zone,  Region,  or  Country  Level 

The  estimate  of  the  squared  prediction  error  Sf?  of 
the  production  estimate  for  the  area  of  interest 
(whether  it  be  zone,  region,  or  country  level)  is  given 
by 

v * b WV'A  * a?t?  ■■  v'atA 

r = l L J 

+ 2 £ £ Y,Y,'  COVUrAA  (29) 
r = 2 r- 1 


and 


SPRING  AND  WINTER  WHEAT  PRODUCTION 


Production Estimation in Mixed Wheat Areas

In a mixed wheat area, separate production and
predicted production error estimates are made for
the winter wheat and spring wheat using the
procedures described in the sections entitled “Production
and Variance Estimation at the Stratum Level,”
“Production Estimate at a Zone, Region, or Country
Level,” and “Production Prediction Error for a Zone,
Region, or Country Level.” The total wheat
production estimate in a mixed wheat area is computed by
aggregating the winter production and the spring
production; that is, if $\hat{P}_w$ and $\hat{P}_s$ denote the winter and
spring wheat production estimates, respectively, the
total wheat production estimate, $\hat{P}_t$, is given by
$\hat{P}_t = \hat{P}_w + \hat{P}_s$.


coy(A,.A,j  ~Ewvja 
/ 


(31)

Production Error Estimates
in Mixed Wheat Areas

The estimate of the production prediction error at
the level of interest (zone, region, or country) in a
mixed wheat area is given by equations (29) and (30)
with the following modifications: if the jth Group A
refined stratum contains mixed wheat and is
supplied with both a spring $Y_r^s$ and winter $Y_r^w$ yield
estimate, then $Y_r$ and $T_r^2$ in equations (29) and (30) are
replaced by


$$\frac{A_{wr}\,Y_r^w + A_{sr}\,Y_r^s}{A_{wr} + A_{sr}} \qquad (32)$$

and

$$\left(\frac{A_{wr}\,T_r^w + A_{sr}\,T_r^s}{A_{wr} + A_{sr}}\right)^2 \qquad (33)$$

respectively, where

$A_{wr}$ = the epoch year harvested winter wheat area
in the rth yield stratum

$A_{sr}$ = the epoch year harvested spring wheat area
in the rth yield stratum

$T_r^w$ = the root mean square of the prediction error
of the winter wheat yield estimate for the rth
yield stratum

$T_r^s$ = the root mean square of the prediction error
of the spring wheat yield estimate for the rth
yield stratum


AREA OF INTEREST YIELD AND
PREDICTION ERROR ESTIMATION

When there is a single yield model covering the
area of interest, the squared prediction error of the
yield model provides the variance needed for
evaluation; i.e., for computation of standard statistics. In
case of more than one yield model in the area of
interest (whether it be zone, region, or country), the
average yield estimate and the associated prediction
error estimate are computed using the following
formulas.

The yield estimate for the area C of interest is
obtained by the average yield $\bar{Y}_C = \hat{P}_C / \hat{A}_C$, where $\hat{P}_C$ is
the production estimate and $\hat{A}_C$ is the area estimate
for the area of interest. An estimate of the squared
prediction error of this average yield estimate is
obtained by resorting to the assumptions stated in the
section entitled “Production and Variance
Estimation at the Stratum Level” and applying the standard
formula for the variance of the ratio of two random
variables. This estimate is given by

$$\bar{Y}_C^2 \left[ \frac{S_C^2}{\hat{P}_C^2} + \frac{V_C}{\hat{A}_C^2} - \frac{2 \sum_{l=1}^{H} Y_l V_l}{\hat{P}_C\,\hat{A}_C} \right] \qquad (34)$$

where $S_C^2$ = estimated squared prediction error of
$\hat{P}_C$, the production estimate

$V_C$ = estimated prediction error of $\hat{A}_C$, the
area estimate

$Y_l$ = yield estimate for the lth yield
stratum

$V_l$ = estimated prediction error of the area
estimate in the lth yield stratum
(computed from use of eq. (16),
where the area of interest referred to
by that estimate is the lth yield
stratum in this case)
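
The average yield and its squared prediction error can be sketched in Python as follows. This is a hedged illustration that assumes the usual first-order formula for the variance of a ratio, with the covariance between the production and area estimates taken as the sum over yield strata of yield times area variance; the function name `average_yield` and the example values are hypothetical.

```python
def average_yield(production, production_sq_error, area, area_var,
                  stratum_yields, stratum_area_vars):
    """Return (average yield, estimated squared prediction error).

    production          -- production estimate for the area of interest
    production_sq_error -- estimated squared prediction error of production
    area                -- area estimate for the area of interest
    area_var            -- estimated variance of the area estimate
    stratum_yields      -- yield estimate for each yield stratum
    stratum_area_vars   -- area variance estimate for each yield stratum
    """
    ybar = production / area
    # Covariance of production and area under independent yield strata.
    cov = sum(y * v for y, v in zip(stratum_yields, stratum_area_vars))
    relative = (
        production_sq_error / production ** 2
        + area_var / area ** 2
        - 2.0 * cov / (production * area)
    )
    return ybar, ybar ** 2 * relative
```

In the single-stratum case with a production of 1200 (squared error 15984), an area of 600 (variance 400), and a yield of 2.0, the average yield is 2.0 and the squared prediction error comes out close to the yield model's own value of 0.04, as expected when only one yield model covers the area.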


SUMMARY  AND  CONCLUSIONS 

The results from the testing conducted in Phase
III would have to be labeled as “encouraging,”
particularly in regard to their supporting the following
objectives for the use of a natural sampling strategy.

1. Increase sampling efficiency: improve
stratification by making use of information contained in
Landsat and agromet data.

2. Reduce bias caused by high incidence of cloud
cover over large regions.

3. Permit better estimates of precision (more
sample segments per stratum).

4. Provide a common approach for all countries.

5. Permit better applicability of the yardstick
region as a quantifier of the foreign sampling
strategy.

In particular, some of the more important results
that should be highlighted include the following.

1. Aggregation results over the test areas
indicated that similar precision results are obtainable
(relative to the initial strategy) with 20 to 30 percent
fewer segments.

2. In comparison with crop-reporting-district
competing strata, the gain in efficiency from the use
of the refined strata over six states in the U.S. Great
Plains was

a. Uniformly better in regard to wheat density

b. Better for five of six states in regard to
agricultural density

c. Better for only three of six states in regard to
yield

See reference 2 for further details.

The predominant key issues resulting from the
use of the natural sampling strategy were as follows.

1. Some strata are not yet sufficiently
homogeneous to be considered beneficial; the
stratification procedures need further fine tuning.

2. Probably the biggest difficulty was that of
attempting to estimate those areas where little to no
satellite coverage (i.e., the nonresponse areas) was
available; adequate historical data to support making
such estimates were simply unavailable. (The
apportioning procedure described earlier in this paper was
the approach taken to make historical data available
at the appropriate levels needed; however, it was
realized from the outset that considerable
improvement would be needed.)

Appendix A

Determination of First- and Second-Generation
Segment Mixture


MOTIVATION 

Because of cost and time constraints, it may not be
possible to order or, even if ordered, to process all
second-generation sample segments. To obtain an
estimate of wheat production with a precision
comparable to the precision specified in the sampling
plan, it is necessary to process a certain minimum
number of sample segments. One way to fulfill the
sample size requirement is to supplement the list of
available second-generation segments with the
available first-generation segments. Unless the
randomness of the distribution of sample segments in
each refined stratum (second-generation strategy
strata) is preserved, any statistical statement
concerning the sample estimates made according to the
second-generation strategy will be invalid. A scheme
has been devised for the selection of supplementary
first-generation segments preserving the randomness
of distribution in each refined stratum.


METHOD FOR SELECTING
SUPPLEMENTARY FIRST-GENERATION
SEGMENTS

For a detailed mathematical discussion, see the
section in this appendix entitled “Method for Using
First-Generation Sample Segments in the
Second-Generation Sampling Scheme.”

Segments  chosen  under  the  first-generation 
strategy  and  segments  chosen  under  the  second- 
generation  strategy  (second-generation  segments) 
are  available,  labeled  with  both  county  name  and 
stratum  number.  The  following  procedure  is  used  for 
selecting  supplementary  first-generation  segments. 

1.  Count  the  number  of  second-generation  seg- 
ments in  each  unit  (county  intersection  refined 
strata). 

2.  Count  the  number  of  first-generation  segments 
in  each  unit. 
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3.  For  each  unit,  perform  the  following  operation. 

a.  If  the  number  of  first-generation  segments  is 
greater  than  or  equal  to  the  number  of  second* 
generation  segments,  randomly  replace  each  second- 
generation  segment  with  a first-generation  segment. 

b.  Otherwise,  randomly  replace  second-genera- 
tion segments  by  first-generation  segments  until  all 
first-generatioR  segments  in  the  unit  have  been  used. 
Some  second-generation  segments  will  remain. 

4.  Note  that  step  3b  differs  from  the  theoretical 
proposal.  Theoretically,  all  first-generation  segments 
should  be  used,  and  the  remaining  number  necessary 
for  the  unit  should  be  selected  at  random  from  all 
segments  possible  within  the  NASA  Goddard  Space 
Flight  Center  (GSFC)  constraints.  However,  the 
second-generation  segments  have  already  been 
chosen  and  must  be  used. 

5.  Check  the  spacing  between  segments.  When 
the  first-generation  and  second-generation  segments 
were  chosen,  the  segment  density  in  each  case  was 
constrained by GSFC. The same constraint should be
preserved in the composite allocation.
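Steps 1 to 3 of the procedure above can be sketched for a single unit as follows. This is a minimal illustration, not the program used in LACIE: the function name `substitute_segments` and the list-based representation of segments are assumptions, and the spacing check of step 5 is not modeled.

```python
import random


def substitute_segments(first_gen, second_gen, rng=None):
    """Build the composite sample for one unit (county x refined stratum).

    first_gen, second_gen: lists of segment identifiers in the unit.
    The result always has as many segments as second_gen.
    """
    rng = rng or random.Random()
    # Steps 3a/3b: the number of swaps is limited by whichever list is shorter.
    n_swap = min(len(first_gen), len(second_gen))
    # First-generation segments brought in (all of them when they are scarce).
    chosen_first = rng.sample(list(first_gen), n_swap)
    # Second-generation segments that are replaced, chosen at random.
    replaced = set(rng.sample(list(second_gen), n_swap))
    kept_second = [s for s in second_gen if s not in replaced]
    return chosen_first + kept_second
```

When the unit holds at least as many first-generation as second-generation segments, the result consists only of first-generation segments (step 3a); otherwise every first-generation segment is used and some second-generation segments remain (step 3b).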


RESULTS 

The  method  was  followed  for  the  segments  cur- 
rently available  (first-generation  segments  and  sec- 
ond-generation segments).  Segments  chosen  are 
listed  in  the  section  in  this  appendix  entitled  “Sam- 
ple Segments."  When  these  segments  were  checked 
for  closeness,  it  was  found  that  the  plan  cannot  be 
executed  if  the  GSFC  constraints  are  to  be  preserved. 


METHOD  FOR  USING  FIRST-GENERATION 
SAMPLE  SEGMENTS  IN  THE  SECOND- 
GENERATION  SAMPLING  SCHEME 

The method for using first-generation segments in
the second-generation scheme was developed by
A. H. Feiveson of the NASA Johnson Space Center.
The following definitions apply to this procedure.

$S$ = new stratum

$\{\theta_k\}_{k=1}^{K}$ = collection of first-generation strata
that intersect $S$

$N_k$ = total number of segments in $\theta_k$

$n_k$ = number of selected segments in $\theta_k$
under the first-generation strategy

$M_k$ = total number of segments in $\theta_k \cap S$

$m$ = number of segments to be selected in
$S$ under the second-generation strategy

$M$ = total number of segments in $S$

$m_k$ = number of segments to be selected in
$\theta_k \cap S$

The  steps  in  the  procedure  are  as  follows. 

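1. Generate $m_k$, $k = 1, \ldots, K$.

a. Let $T_0 = 0$, $T_1 = M_1$, $T_2 = M_1 + M_2$, ...,
$T_K = M_1 + M_2 + \cdots + M_K$.

b. Define $J_k = \{T_{k-1} + 1, T_{k-1} + 2, \ldots, T_{k-1} + M_k\}$.

c. Choose a random subset of $m$ from the integers
$1, 2, \ldots, M$. Let $I$ be that random subset.

d. Let $m_k$ = cardinality of $I \cap J_k$. Note that $m_k$
has a hypergeometric distribution. Hence,

$P(m_k = j) = \binom{M_k}{j} \binom{M - M_k}{m - j} \Big/ \binom{M}{m}$    (A1)

and

$E(m_k) = \frac{m M_k}{M}$    (A2)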

2. Let $l_k$ = number of first-generation selected
segments in $\theta_k \cap S$.

a. If $l_k \ge m_k$, choose $m_k$ segments at random
among the $l_k$ originally selected ones.

b. If $l_k < m_k$, choose all of the $l_k$ originally
selected segments plus $m_k - l_k$ additional ones
randomly selected from the remaining $M_k - l_k$ in
$\theta_k \cap S$.
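Steps 1 and 2 can be sketched as follows. This is an illustrative sketch only: the function name `second_generation_sample`, the labeling of the segments of $\theta_k \cap S$ as 1 to $M_k$, and the input layout are assumptions made for the example, and the first-generation selections passed in are assumed to be random samples themselves.

```python
import random


def second_generation_sample(stratum_sizes, first_gen, m, rng=None):
    """Select m segments from S, reusing first-generation selections.

    stratum_sizes[k] is M_k, the number of segments in theta_k ∩ S.
    first_gen[k] lists the segments (labeled 1..M_k) of theta_k ∩ S that
    were already selected under the first-generation strategy.
    Returns a list of (k, segment) pairs.
    """
    rng = rng or random.Random()
    total = sum(stratum_sizes)  # M
    # Steps 1a-1c: draw a random subset I of size m from 1..M.
    drawn = rng.sample(range(1, total + 1), m)
    sample, offset = [], 0
    for k, size in enumerate(stratum_sizes):
        # Step 1d: m_k is the number of drawn integers that fall in J_k.
        m_k = sum(1 for i in drawn if offset < i <= offset + size)
        old = list(first_gen[k])
        if len(old) >= m_k:
            # Step 2a: subsample the first-generation segments.
            picked = rng.sample(old, m_k)
        else:
            # Step 2b: keep all of them and top up from the remainder.
            rest = [s for s in range(1, size + 1) if s not in old]
            picked = old + rng.sample(rest, m_k - len(old))
        sample.extend((k, s) for s in picked)
        offset += size
    return sample
```

Only the counts $m_k$ come from the random subset $I$; the segments themselves are taken from the first-generation selections wherever possible.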

3. Prove that this procedure selects $m$ segments
out of $M$ with equal probability.

a. Let $p_{kl} = \Pr[l_k = l]$, $l = 0, 1, \ldots, M_k$, and let

$\mathcal{S}$ = second-generation sample
$\mathcal{F}$ = first-generation sample
$s_k$ = any potential segment in $\theta_k \cap S$

b. Then, with $h_j = P(m_k = j)$,

$P(s_k \in \mathcal{S}) = \sum_j h_j P\{s_k \in \mathcal{S} \mid m_k = j\}$    (A3)

$P(s_k \in \mathcal{S} \mid m_k = j) = \sum_l p_{kl} P\{s_k \in \mathcal{S} \mid m_k = j, l_k = l\}$    (A4)

For $l \ge j$, the segment must be in the first-generation
sample and then be retained, so that

$P(s_k \in \mathcal{S} \mid m_k = j, l_k = l) = \frac{l}{M_k} \cdot \frac{j}{l} = \frac{j}{M_k}$

For $l < j$, the segment is either in the first-generation
sample or among the $j - l$ additional segments, so that

$P(s_k \in \mathcal{S} \mid m_k = j, l_k = l) = \frac{l}{M_k} + \frac{M_k - l}{M_k} \cdot \frac{j - l}{M_k - l} = \frac{j}{M_k}$

c. Thus, $P(s_k \in \mathcal{S} \mid l_k = l, m_k = j)$ only
depends on $j$, not $l$. Hence,

$P(s_k \in \mathcal{S} \mid m_k = j) = \sum_{l=0}^{M_k} p_{kl} \frac{j}{M_k} = \frac{j}{M_k}$

and, by equations (A2) and (A3),

$P(s_k \in \mathcal{S}) = \sum_j h_j \frac{j}{M_k} = \frac{E(m_k)}{M_k} = \frac{m}{M}$


SAMPLE SEGMENTS

The sample segments in Kansas selected prior to
the GSFC constraint are the following.


Segment number   County          Stratum

(illegible)      Cowley          2C
1170             Harper          2C
1172             Kiowa           2C
800              Kingman         2C
1174             Pratt           2C
1892             Reno            2C
1175             Sedgwick        2C
1892             Stafford        1C
1176             Sumner          2C
1022             Clark           4B
1168             Barber          4B
812              Kiowa           4B
1025             Ford            5A
1292             Hodgeman        5A
822              Pawnee          5A
825              Pawnee          5A
822              Trego           5A
821              Finney          5B
1857             Grant           5B
1025             Greeley         5B
818              Greeley         5B
819              Greeley         5B
1859             Hamilton        5B
1861             Kearny          5B
1284             Lane            5B
1884             Stanton         5B
833              Bourbon         6
829              Bourbon         6
1180             Cherokee        6
1345             Franklin        6
839              Franklin        6
837              Linn            6
830              Miami           6
834              Miami           6
1353             Montgomery      6
828              Montgomery      6
836              Osage           6
840              Osage           6
1184             Wilson          6
832              Wilson          6
842              Lincoln         7A
1134             Mitchell        7A
1349             Butler          7B
847              Chase           7B
857              Chase           7B
846              Clay            7B
1131             Clay            7B
1879             Dickinson       7B
1297             Dickinson       7B
1881             Ellsworth       7B
853              Marshall        7B
1884             McPherson       7B
1347             Morris          7B
1876             Ottawa          7B
852              Pottawatomie    7B
843              Republic        7B
1888             Saline          7B
1348             Wabaunsee       7B
1158             Washington      7B
1016             Cheyenne        8
1017             Decatur         8
1880             Ellis           8
1024             Gove            8
1153             Jewell          8
1027             Logan           8
1019             Norton          8
1155             Phillips        8
1020             Rawlins         8
1281             Rawlins         8
1877             Rooks           8
864              Rooks           8
1887             Russell         8
1022             Sheridan        8
863              Sheridan        8
1021             Sherman         8
1282             Sherman         8
1157             Smith           8
1296             Smith           8
1023             Thomas          6
1031             Wallace         9


Appendix  B 
Stratification 


REQUIREMENTS 

The LACIE Partitioning Group developed universal
strata for use by the Sampling Strategy Team for
the Phase III implementation of the natural sampling
strategy. These strata were created with the following
guidelines.

1.  The  basic  partitions  are  to  be  delineated  on  the 
basis  of  agricultural  density  as  determined  from  full- 
frame  color-infrared  (CIR)  imagery. 

2. These partitions are to be refined with data on
soils  rated  on  yield  potential  and  with  climatic  data. 

3.  Climatic  data  considered  are  to  include  pre- 
cipitation and  temperature. 

4.  If  the  variation  in  climate  over  an  agricultural 
density  stratum  exceeds  the  allowable  threshold,  this 
stratum  is  to  be  divided  in  order  to  lower  the  varia- 
tion over  each  portion. 


5.  Whereas  the  climatic  data  are  to  be  used  to 
determine  whether  an  agricultural  density  stratum 
must  be  divided,  the  soils  data  are  to  be  used  to  deter- 
mine where  the  division  occurs. 

6.  The  resulting  partitions  are  to  be  rechecked 
against  full-frame  imagery  to  adjust  and  smooth 
stratum  boundaries. 

7.  Political  boundaries  that  are  artificial  in  an 
agricultural  sense  are  to  be  ignored  in  delineating  the 
strata  within  a large  region.  The  final  partitioning 
product  for  such  a region  is  to  consist  of  a group  of 
contiguous strata that includes the region.

STEPS FOR IMPLEMENTATION

The Stratification Team performed the partitioning
for sampling strategy based on guidelines 1 to 7.
Figure B-1 is a flow chart showing the process.

FIGURE B-1. Stratification for sampling strategy.


Agricultural/Nonagricultural Delineation:
Full-Frame CIR Transparencies

The 9- by 9-inch full-frame CIR transparencies
used to produce the agricultural/nonagricultural
(ag/non-ag)  overlay  are  to  be  acquired  during  the  op- 
timum growth  phase  of  the  majority  of  crops  in  the 
area.  Ordinarily,  this  period  will  fall  during  biological 
window  2 or  biological  window  3.  A record  is  to  be 
kept  of  those  transparencies  containing  significant 
defects  such  as  cloud  cover  and  a list  of  those  that 
should  be  reacquired  because  of  unacceptable 
quality. 


Ag/Non-Ag Overlay

The  following  are  guidelines  for  the  delineation  of 
agricultural  and  nonagricultural  areas. 

1.  All  ag/non-ag  delineations  are  to  be  based 
solely  on  the  use  of  9-  by  9-inch  full-frame  CIR 
transparencies. 

2.  No  ancillary  data,  such  as  agricultural  statistics, 
climatic  information,  or  soil  maps,  are  to  be  used. 

3.  An  area  will  be  designated  as  agricultural  if  it 
contains  recognizable  field  patterns. 

4.  Conversely,  areas  without  recognizable  field 
patterns  will  be  designated  as  nonagricultural. 

5. A record will be kept of the acquisition dates
for all 9- by 9-inch CIR transparencies that are used.

6. Each state or oblast is to be worked by only one
person.

7. A log of the criteria used to complete the overlay
for each state will be kept.

8. The person working on the overlay for a particular
state will produce a short narrative summary including
the dates of the 9- by 9-inch CIR transparencies
that were used and the percentage of the state
incomplete because of clouds.

9. There should be an adequate number of
reference points (latitude and longitude) on each
overlay so that it can be easily registered to any base
map.

The steps to be taken in delineating ag/non-ag
areas are as follows.

a. An overlay of 1:1 000 000 scale, covering the
state to be worked, will be registered to the corresponding
Operational Navigation Chart (ONC).

b. All 9- by 9-inch full-frame CIR transparencies
covering any part of the state will be assembled
and examined for evidence of field patterns.

c. All contiguous nonagricultural areas greater
than or equal to 4 square miles in size or with a very
sparse sprinkling of agriculture will be delineated on
the overlay.

d.  Records  for  transparencies  used  should  be 
kept  for  use  in  stratum  description  and  evaluation. 
These  records  would  indicate  specific  problems  in 
delineation  of  the  transparency  such  as  apparent 
differences  in  intensity  of  agriculture  patterns  and 
confusion  sources. 

Designated  Sampling  Frame  Overlay 

The  designated  sampling  frame  (DSF)  is  the 
region  within  a country  used  for  the  process  of  parti- 
tioning for  sampling  strategy.  The  following  steps  de- 
scribe this  process. 

1.  The  contiguous  nonagricultural  areas  deline- 
ated on  the  ag/non-ag  overlay  are  measured  for  area. 

2. All nonagricultural areas having size greater
than or equal to 30 square miles will be eliminated
from further consideration. The area on the overlay
less the area estimated as nonagricultural comprises
the DSF. The DSF, therefore, contains nonagricultural
areas ranging in size between 4 and 30
square miles in a segment.

3.  On  the  overlay  showing  the  DSF,  areas  having 
approximate  agricultural  densities  falling  in  the 
following  ranges  will  be  delineated  and  labeled  as  to 
density  category. 


Density range             Label

0 to 5 percent ag         Low
5 to 40 percent ag        Moderate
40 to 80 percent ag       High
80 to 100 percent ag      Very high

4. Each density category is further refined by outlining
high-variance areas (large clumps of non-ag in
a predominantly ag area or large clumps of ag in a
predominantly non-ag area) and low-variance areas
(small clumps of non-ag in a predominantly ag area
or small clumps of ag in a predominantly non-ag
area). The DSF overlay is thus divided into low- and
high-variance areas of each density category. These
agriculturally homogeneous areas form the basis for
further partitioning.

5.  A record  of  the  density  category  and  variance 
type  should  be  kept  for  each  of  these  areas  for  later 
use  in  providing  stratum  descriptors. 
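The size threshold of step 2 and the density labels of step 3 can be expressed compactly. The sketch below is illustrative only: the function names are hypothetical, and because the published ranges share their endpoints, the assignment of a boundary value (for example, exactly 5 percent) to the higher category is an assumption.

```python
def in_sampling_frame(nonag_area_sq_mi):
    """Step 2: contiguous non-ag areas of 30 square miles or more are excluded."""
    return nonag_area_sq_mi < 30


def density_category(percent_ag):
    """Step 3: label an area by its approximate agricultural density."""
    if not 0 <= percent_ag <= 100:
        raise ValueError("percent_ag must lie between 0 and 100")
    if percent_ag < 5:
        return "Low"
    if percent_ag < 40:
        return "Moderate"
    if percent_ag < 80:
        return "High"
    return "Very high"
```

For example, an area judged to be about 60 percent agricultural falls in the "High" category, and a 12-square-mile nonagricultural area stays inside the DSF.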


Soils Delineation

The A. R. Aandahl map "Soils of the Great
Plains" will be used to provide soils data for partitioning
the U.S. Great Plains. For the area of the
spring wheat indicator region in the U.S.S.R. to be
partitioned, the U.S. Department of Agriculture soil
survey map "World Soils Map" will be used.


Soil Suitability Overlay

The purpose of the soil suitability overlay is to
stratify on the basis of soil yield potential for wheat
as follows.

1.  The  soil  mapping  units  will  be  rated  on  their 
suitability  to  grow  wheat.  This  will  be  accomplished 
by  determining  a rating  in  the  following  six  catego- 
ries of  soil  characteristics. 

Category 1: Texture
Category 2: Depth
Category 3: Drainage
Category 4: Salinity
Category 5: Slope
Category 6: Moisture and Temperature

The numerical rating in each category will range
from a value of "1" for best to a value of "4" for
worst. The range of values for each rating is provided
in table B-1. The overall rating for a particular
soil mapping unit will then be determined in the
following way. For each $i$, $1 \le i \le 6$, let $C_i$ denote the
rating value for the $i$th category. The overall rating
value $V$ is then computed by the formula
$V = \max \{C_i : i = 1, \ldots, 6\}$. In this way, the overall suitability
rating for a soil is determined by its worst rating in an
individual category.

2. Adjacent soil mapping units that have the same
overall rating will be combined.

3. The resulting 1:1 000 000-scale overlay will
contain lines delineating various soil groups, where
each of these groups consists of contiguous soil mapping
units that have the same rating.

4.  The  soil  characteristics  of  each  soil  mapping 
unit  should  be  recorded  for  subsequent  use  in  pro- 
ducing stratum  descriptors. 
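The overall rating rule of step 1 (the worst of the six category ratings) can be illustrated with a short sketch. The category keys and the function name are hypothetical labels chosen for the example.

```python
CATEGORIES = (
    "texture",
    "depth",
    "drainage",
    "salinity",
    "slope",
    "moisture_temperature",
)


def overall_soil_rating(ratings):
    """Return V = max(C_1, ..., C_6) for one soil mapping unit.

    ratings maps each category name to a value from 1 (best) to 4 (worst).
    """
    missing = [c for c in CATEGORIES if c not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for c in CATEGORIES:
        if ratings[c] not in (1, 2, 3, 4):
            raise ValueError(f"rating for {c} must be 1, 2, 3, or 4")
    # The worst individual rating determines the overall suitability.
    return max(ratings[c] for c in CATEGORIES)
```

A soil that rates "1" in every category except slope, where it rates "3", therefore receives an overall rating of 3.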


TABLE B-1. Categories of Important Soil
Characteristics

Category              Rating  Properties, best to poorest

1. Texture            1       Silt, loam, light silty clay loam
                      2       Sandy loam, heavy silty clay loam
                      3       Sand, loamy sand, clay, other

2. Depth              1       Deep, >90 cm
                      2       Moderately deep, 50 to 90 cm
                      3       Shallow, 25 to 50 cm
                      4       Very shallow, <25 cm

3. Drainage           1       Well and moderately drained
                      2       Somewhat poor or somewhat excessive
                      3       Poor or between somewhat excessive and
                              excessive
                      4       Very poor or excessive

4. Salinity           1       None, <2 mmhos/cm
                      2       Slight, 2 to 4 mmhos/cm
                      3       Moderate, 4 to 8 mmhos/cm
                      4       Severe, >8 mmhos/cm

5. Slope              1       Level, gently sloping, 0 to 5 percent
                              (symbols L, U, J, F, T, DT)
                      2       Gently rolling, 5 to 10 percent
                              (R, DR: lower ranges)
                      3       Rolling, 10 to 15 percent
                              (R, DR: upper ranges)
                      4       Hilly, steep, >15 percent
                              (H, C, RD, HD, SU, JS, RM)

6. Moisture and       1       Udic mollisols (moist subhumid)
   temperature        2       Typic mollisols (dry subhumid)
                      3       Aridic mollisols (semiarid)
                      4       Aridisols (arid)


Climate

The  climatic  factors  considered  in  partitioning  for 
sampling  strategy  are  temperature  and  precipitation. 
The  climatic  data  used  will  be  based  on  monthly 
averages  of  temperature  and  precipitation  obtained 
from  the  World  Meteorological  Organization 
(WMO)  stations  in  the  areas  to  be  partitioned. 


Climate Suitability Overlay

The  steps  to  be  taken  in  producing  the  climate 
overlay  are  as  follows. 

1.  A 1:1  000000-scale  overlay  of  the  area  to  be 
partitioned  will  be  registered  to  the  corresponding 
ONC. 

2.  For  each  WMO  meteorological  station  in  the 
area,  the  annual  temperature  and  precipitation 
averages  will  be  computed. 

3. Using the annual temperature averages from
the meteorological stations, temperature isopleths
will be interpolated for the area to reflect changes in
mean temperature of 3° C.


4. On the same overlay, the annual precipitation
averages will be used to produce isopleths of precipitation
for the area that reflect changes in mean
precipitation of 10 centimeters.

5.  The  major  intent  of  using  the  climate  overlay  is 
to  subdivide  large  areas  of  homogeneous  soils  and 
uniform  agricultural  densities. 

6.  Data  used  to  prepare  the  overlay  (step  1) 
should  be  kept  for  use  in  stratum  description  and 
evaluation. 
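Steps 2 to 4 amount to reducing each station's monthly values to an annual figure and then grouping stations by isopleth interval. The sketch below is one possible reading, not the documented procedure: taking the annual average as the mean of the 12 monthly averages, and indexing bands by whole multiples of the interval, are both assumptions, as are the function names.

```python
import math


def annual_average(monthly_values):
    """Mean of the 12 monthly averages for one station (assumed definition)."""
    if len(monthly_values) != 12:
        raise ValueError("expected 12 monthly values")
    return sum(monthly_values) / 12


def isopleth_band(value, interval):
    """Index of the band between consecutive isopleths that contains value.

    Use interval=3 for temperature (degrees C) and interval=10 for
    precipitation (centimeters).
    """
    return math.floor(value / interval)
```

Two stations fall on opposite sides of an isopleth when their band indices differ; the isopleths themselves would still have to be interpolated spatially between stations.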


Ag/Non-Ag-Soils Overlay

The ag/non-ag overlay will be checked against the
soil suitability overlay for discrepancies and necessary
changes made. This will be accomplished by the
following steps.

1. A correlation between poor soils rating and
nonagricultural areas should be evident. If a nonagricultural
area has been rated "1" or "2" on soil
suitability, these areas should be rechecked against
the appropriate CIR full-frame transparencies to ensure
that correct ag/non-ag classifications were made.


Multiyear Estimates for the LACIE Sampling Plans

H. O. Hartley


INTRODUCTION 

This  document  presents  an  approach  that  may  be 
useful  in  improving  the  estimates  of  the  wheat 
acreages  for  the  LACIE  countries  for  each  year  by 
using  the  short-time  series  of  estimates  made  in  the 
sequence  of  consecutive  years.  Although  it  may  be 
premature  to  develop  this  concept  since  the  series  of 
estimates  is  just  being  started,  it  is  of  some  merit  to 
review this possibility as it will affect future planning.

It  is  obvious  that  there  will  be  two  types  of  charac- 
teristics of  such  a survey  design;  namely, 

1.  Characteristics  that  apply  to  each  year's  survey 
(such  as  the  size  of  the  sample  segments,  their 
stratification,  and  the  sampling  procedures  with 
which  they  are  drawn) 

2.  Characteristics  that  affect  the  design  and 
analysis  of  the  survey  data  arising  in  a series  of  years 

With regard to the characteristics under item 1, in
order  to  fix  the  ideas,  it  is  assumed  that  the  design  is 
essentially  as  it  is  implemented  at  present.  This 
assumption  does  not  mean  that  this  design  is  con- 
sidered the  optimum  choice  for  the  multiyear  esti- 
mates. 


THE  USE  OF  THE  BUREAU  OF  CENSUS 
CURRENT  POPULATION  SURVEY 
ROTATING  DESIGN 

The  Current  Population  Survey  (CPS)  of  the 
Bureau  of  Census  is  concerned  with  sample  seg- 
ments of  households  and,  in  the  CPS  design,  these 
are  arranged  in  “rotation  groups.”  The  segments  in 
the  same  rotation  group  are  surveyed  in  4 consecu- 
tive months  of  the  first  year,  then  omitted  from  the 
survey in the next 8 months, and then again surveyed
in 4 consecutive months of the next year. The
estimator of a characteristic $y$, the so-called composite
estimator, is a weighted average of the following
two estimator components.

1. The first component simply consists of the best
estimator for the current month employing all the
data collected in the sample segments for that
month.

2. The second component consists of an estimate
of $\delta y_t$ (that is, of the change in $y$ from month
$t - 1$ to month $t$) based only on the matched segments
(i.e., the segments that are in the sample in
both month $t - 1$ and month $t$). This change is then
added to the composite estimator for month $t - 1$.

Finally,  the  two  components  under  items  1 and  2 
are  combined  as  a weighted  average,  with  weights 
summing  to  1.  (Currently,  these  weights  are  taken  as 
0.5  each).  It  will  be  seen  that  the  preceding  definition 
of  the  composite  estimator  simply  defines  the  com- 
posite estimator  for  month  t in  terms  of  the  com- 
posite estimator  for  month  t — 1;  that  is,  in  terms  of 
a difference equation. Although this difference equation
is used for recurrent computation of the composite
estimator of $y$, the difference equation can be
solved to display the composite estimator as an infinite
series of monthly estimators with weights exponentially
decreasing into the past.
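The recurrence just described can be sketched in a few lines. This is an illustration of the difference equation only, not Bureau of Census code: the function name and argument layout are hypothetical, and starting the series from the first month's direct estimate is an assumption.

```python
def composite_estimates(direct, change, weight=0.5):
    """Composite estimator computed recursively, month by month.

    direct[t]: best estimator for month t from all segments sampled then.
    change[t]: matched-segment estimate of the change from month t-1 to t
               (change[0] is ignored).
    weight:    weight on the updated previous composite; the direct
               estimate receives 1 - weight.
    """
    if not direct or len(direct) != len(change):
        raise ValueError("direct and change must be non-empty and equal in length")
    composite = [direct[0]]  # assumed starting value
    for t in range(1, len(direct)):
        carried = composite[-1] + change[t]  # second component
        composite.append((1 - weight) * direct[t] + weight * carried)
    return composite
```

Unrolling the loop shows the exponentially decreasing weights: with `weight = 0.5`, the direct estimate for month $t - j$ enters the composite for month $t$ with a coefficient proportional to $0.5^{j}$.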

It is the essence of the effectiveness of both the
rotating design and the composite estimator that
there is a strong positive correlation between the $y$
values in 2 consecutive months. Such a correlation
would make the variances of the $\delta y_t$ small and would
thereby increase the effective sample size of segments
by those measured in earlier months.


REASONS  FOR  THE  DEPARTURE  FROM 
THE  CPS  COMPOSITE  DESIGNS 
FOR  LACIE 

The essential condition for the effectiveness of
the composite estimators in rotation designs appears
to be well satisfied in LACIE. There is usually a
strong positive correlation between the wheat
acreages of segments observed in consecutive years.
In this context, one should remember that since the
segment is fairly large (5 by 6 nautical miles), any
year-to-year "rotation" of wheat with other crops in
accordance with agricultural practices will probably
occur within segments (apart from boundary effects).
Such rotations will therefore generate negative year-
to-year correlations of wheat acreages for smaller
areas within a segment and yet will not destroy the
positive year-to-year correlation for segments.
However,  the  following  are  substantial  differences 
between  LACIE  and  the  CPS  sampling  problems. 

1.  The  LACIE  time  series  is  yearly  and  extremely 
short  (at  present,  only  2 years)  as  opposed  to  the  long 
monthly  series  in  CPS. 

2.  Whatever  the  rotation  design  that  is  adopted 
for  LACIE,  it  must  be  anticipated  that  a considerable 
number  of  matched  segments  (i.e.,  segments 
sampled  in  2 consecutive  years)  will  be  lost  through 
cloud  cover  (and  possibly  other  reasons).  It  will 
therefore  be  necessary  to  replace  the  composite 
estimator  by  a more  flexible  estimator  capable  of 
dealing  with  unbalanced  segment  patterns  over  a 
moderate  number  of  years. 

On  this  basis,  a decision  was  made  to  use  estima- 
tors arising  from  mixed  analysis  of  variance 
(ANOVA)  models  as  described  in  the  section  of  this 
paper  entitled  “The  Mixed  ANOVA  Models  and  the 
Associated  Estimators.”  These  estimators  are 
designed  to  deal  with  the  completely  unbalanced 
matching  patterns  that  are  likely  to  arise  through 
cloud  cover  losses  of  segments,  patterns  which  will 
differ  considerably  from  any  balanced  rotation 
design.  Nevertheless,  in  the  next  section,  suitable 
rotation  designs  are  developed  since  they  are  ex- 
pected to  result  at  least  in  segment  patterns  for  which 
the  conditions  of  estimability  for  the  estimators  are 
satisfied. 


THE ROTATION DESIGNS


Rotation Designs for Group I Strata

First, the "basic rotation patterns" for segments
within a stratum are described; then, the stratum collapsing
strategies to deal with strata having only one
segment are described.

Basic rotation patterns: Basic rotation patterns are
established as (a.2), two segments per stratum; (a.3),
three segments per stratum; (a.4), four segments per
stratum; and (a.5), five segments per stratum.

Pattern (a.2), two segments per stratum: Pattern
(a.2) is represented by the following table.

Segment no.     Year no.
                1   2   3   4

1               X
2               X   X
3                   X   X
4                       X   X
5                           X

Pattern (a.3), three segments per stratum: Pattern
(a.3) is represented by the following table.

Segment no.     Year no.
                1   2   3   4

1               X
2               X   X
3               X   X
4                   X   X
5                       X   X
6                       X   X
7                           X

Pattern (a.4), four segments per stratum: Pattern
(a.4) is represented by the following table.

Segment no.     Year no.
                1   2   3

1               X
2               X
3               X   X
4               X   X
5                   X   X
6                   X   X
7                       X
8                       X

Pattern (a.5), five segments per stratum: Pattern
(a.5) is represented by the following table.


Segment no.  Year no.

Summary: The general rotation pattern is now
clear. If $v$ is the number of segments per stratum, $v/2$
or $(v - 1)/2$ or $(v + 1)/2$ segments are discarded every
year in such a way that every segment is in the
sample for exactly 2 consecutive years.
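The general pattern summarized above can be generated mechanically. The sketch below is illustrative: the function name is hypothetical, and treating the first $v/2$ (rounded down) segments of year 1 as already being in their final year is an assumption made so that the rule can be started.

```python
def rotation_pattern(v, years):
    """List, for each year, the segment numbers in the sample.

    v: number of segments per stratum (at least 2).
    Each segment stays in the sample for 2 consecutive years, so the
    number discarded alternates between v // 2 and v - v // 2.
    """
    if v < 2:
        raise ValueError("rotation needs at least two segments per stratum")
    carried = list(range(1, v // 2 + 1))   # in their final year at year 1
    fresh = list(range(v // 2 + 1, v + 1))  # in their first year at year 1
    next_id = v + 1
    pattern = []
    for _ in range(years):
        pattern.append(carried + fresh)
        # Last year's new segments are kept; the carried ones are discarded.
        carried = fresh
        n_new = v - len(carried)
        fresh = list(range(next_id, next_id + n_new))
        next_id += n_new
    return pattern
```

For three segments per stratum over 4 years, the sketch uses seven segments in total, with one or two segments discarded in alternate years.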

Collapsing  strategies. — It  is  first  assumed  that  the 
total  number  of  strata  in  Group  I is  at  least  two  and 
that  each  stratum  can  offer  one  or  more  segments  to 
the  sample.  The  collapsing  strategies  are  established 
as  follows. 

Strategy (b.2), two strata in Group I: One stratum
with  one  segment  per  stratum  is  collapsed  with  one 
of  the  following. 

1.  The  other  stratum  with  one  segment  to  form 
pattern  (a.2) 

2.  The  other  stratum  with  two  segments  per 
stratum  to  form  pattern  (a.3) 

3.  The  other  stratum  with  three  segments  per 
stratum  to  form  pattern  (a.4) 

4.  The  other  stratum  with  four  segments  per 
stratum  to  form  pattern  (a.5) 

If  both  strata  have  at  least  two  segments,  each  will  be 
kept  separately. 

Strategy (b.3), three strata in Group I: The following
strategies are employed.

1.  Strategy  (b.3.0) — If  there  is  no  stratum  with 
one  segment,  all  strata  will  be  kept  separately. 

2. Strategy (b.3.1): If there is one stratum with
one segment, this will be collapsed with another
stratum having the smallest number of segments.

3.  Strategy  (b.3.2) — If  there  are  at  least  two  strata 
with  one  segment,  all  strata  with  one  segment  will  be 


collapsed  to  form  either  pattern  (a.2)  or  pattern 
(a.3);  all  other  strata  (if  any)  will  be  kept  separately. 

Strategy  (b.4),  four  strata  in  Group  I:  The  follow- 
ing strategies  are  employed. 

1. Strategy (b.4.1): The only stratum with one
segment will be collapsed with another stratum having
the smallest number of segments.

2.  Strategy  (b.4.2)— Two  strata  having  one  seg- 
ment each  will  be  collapsed  together. 

3.  Strategy  (b.4.3)— If  there  are  three  strata  with 
one  segment,  two  will  be  collapsed  together  and  the 
third  will  be  collapsed  with  another  stratum  having 
the  minimum  number  of  segments. 

4.  Strategy  (b.4.4) — If  all  four  strata  have  one  seg- 
ment, they  will  be  collapsed  in  pairs. 

Strategy (b.5+), five or more strata in Group I:
The collapsing strategy used for five or more strata in
Group I will be implemented following the procedure
described for strategy (b.3) when the number
of strata is odd or strategy (b.4) when the number of
strata is even.

Summary: The principle of collapsing is now
clear. All strata with one segment are collapsed in
pairs,  preferably  with  another  stratum  having  one 
segment.  If  the  total  number  of  strata  having  one  seg- 
ment is  odd,  one  triplet  of  strata  is  formed  to  obtain 
pattern  (a.3).  If  there  is  only  one  stratum  in  Group  I, 
rotation  will  yield  one  of  the  basic  rotation  patterns 
((a.2),  (a.3), . . .).  If  that  single  stratum  has  only  one 
segment,  no  rotation  is  possible  in  Group  I. 
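The collapsing principle can be sketched as a grouping routine. This is one possible reading of strategies (b.2) to (b.4), not a definitive implementation: pairing single-segment strata in sorted order, and preferring a multi-segment partner over a triplet when one single-segment stratum is left over, are assumptions, as is the function name.

```python
def collapse_strata(segment_counts):
    """Group strata so that no group of collapsed strata has one segment.

    segment_counts maps a stratum label to its number of segments.
    Returns a list of groups; each group is a list of stratum labels.
    """
    singles = sorted(h for h, n in segment_counts.items() if n == 1)
    multis = sorted(
        (h for h, n in segment_counts.items() if n >= 2),
        key=lambda h: (segment_counts[h], h),
    )
    # Single-segment strata are collapsed in pairs where possible.
    groups = [[a, b] for a, b in zip(singles[0::2], singles[1::2])]
    if len(singles) % 2 == 1:
        leftover = singles[-1]
        if multis:
            # Collapse with the stratum having the fewest segments.
            groups.append([leftover, multis.pop(0)])
        elif groups:
            # No other partner: form one triplet, giving pattern (a.3).
            groups[-1].append(leftover)
        else:
            # A lone single-segment stratum: no rotation is possible.
            groups.append([leftover])
    groups.extend([h] for h in multis)
    return groups
```

With four strata of which three have one segment, two of the three are paired and the third joins the remaining stratum, as in strategy (b.4.3).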


Rotation  Design  for  a Group  II  Stratum 

Each  sampled  primary  unit  (county)  has  only  one 
secondary  (segment)  precisely.  The  single  Group  II 
stratum  in  the  crop  reporting  district  (CRD)  is 
treated as in the preceding section except that primaries
(counties) are rotated in accordance with the
basic patterns (a.2), (a.3), .... If only one primary
(county) is in Group II, no rotation is possible.
Whenever  the  same  primary  is  retained  for  2 con- 
secutive years,  the  (single)  segment  is  also  retained. 

It  is  clear  that  any  rotation  design  of  primaries 
would  be  difficult  to  implement  with  a probability 
proportional  to  size  (PPS)  sampling  procedure.  It  is 
therefore  suggested  that  primaries  be  sampled  with 
equal  probability  and  without  replacement  and  that 
the  size  variable  be  used  as  a concomitant  variable  in 
conjunction with regression estimation.


THE MIXED ANOVA MODELS AND
THE ASSOCIATED ESTIMATORS


Estimation Theory for Group I Strata

The following mixed ANOVA model for the segment
wheat acreages in a particular "large area" (say
CRD) is adopted.

$y_{hts} = \alpha_t + \delta\beta_h + \omega x_h + c_{hs} + e_{hts}$
for $t = 1, \ldots, T$; $h = 1, \ldots, H$; $s = 1, \ldots$    (1)

where

$y_{hts}$ = LACIE's "observed" wheat acreage
for segment $s$ of stratum $h$ in year $t$

$\alpha_t$ = average true wheat acreage per segment
in year $t$

$\delta\beta_h$ = differential true effect of stratum $h$ on
wheat acreage applicable to all years

$x_h$ = last agricultural-census wheat acreage
for county $h$

$\omega$ = regression coefficient for $x_h$

$c_{hs}$ = true segment variable applicable to all
years

$e_{hts}$ = composite segment error variable of
segment $s$ of stratum $h$ in year $t$. This
error variable contains two components;
namely, the deviation of the
true wheat acreage of segment $h,s$ in
year $t$ from the additive formula
$\alpha_t + \delta\beta_h + \omega x_h + c_{hs}$ and the
classification error in $y_{hts}$

The mixed ANOVA estimation procedure described
in appendix A gives a simple technique of
estimating the variance components $\sigma_e^2$ and $\sigma_c^2$ by
"synthesis based" estimators $\hat\sigma_e^2$ and $\hat\sigma_c^2$. Moreover,
the fixed coefficients $\alpha_t$, $\delta\beta_h$, and $\omega$ can be adjoined
into a composite regression vector. Without
loss of generality, one may assume that the first three
terms in equation (1) are of the form $X\gamma$, where $X$ is
an orthonormalized form of the fixed-design matrix
of equation (1) and $\gamma$ is an associated
reparameterization of the composite vector
$(\alpha_t, \delta\beta_h, \omega)$. (This follows the procedure described
in section 2 of appendix A.) The maximum likelihood
(ML) estimator of $\gamma$ is then of the form

$\hat\gamma = (X'H^{-1}X)^{-1}(X'H^{-1}y)$    (2)

where  the  variance-covariance  matrix  <t2H  of  the 
*Aais  given  by 


ae2H 


(3) 


and  (/is  the  design  matrix  in  equation  (1)  represent- 
ing the  effect  of  the  variables  c^.  (Compare  also  with 
sections  2 and  3 of  appendix  A.) 

Suppose  now  that  one  wishes  to  estimate  the 
Group  I wheat  acreage  for  the  CRD  in  the  last  year 
indexed  (/  ■ T).  This  estimator  is  given  by 


(4)  ' 

where  Nh  is  the  number  of  segments  in  stratum  ft  and 


H 


h*  1 


is  the  number  of  segments  in  the  CRD. 

The estimator (eq. (4)) can be written in the form

$$\hat{y}_I(\mathrm{CRD},T) = \hat{\gamma}'c \tag{5}$$

and its variance in the form

$$\operatorname{Var}\,\hat{y}_I(\mathrm{CRD},T) \doteq \sigma_e^2\, c'(X'H^{-1}X)^{-1}c \tag{6}$$

This variance formula is standard in weighted (Aitken) linear regression
theory. The fact that it represents a first-order approximation of
Var $\hat{y}_I(\mathrm{CRD},T)$ is proved in appendix B. For an estimation
of equations (3) and (6), replace $\sigma_e^2$ and $\sigma_c^2$ by
$\hat{\sigma}_e^2$ and $\hat{\sigma}_c^2$, respectively.
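The weighted (Aitken) estimator of equation (2), with H taken from equation (3), and the variance formula of equation (6) can be sketched numerically as follows. This is only an illustration: the function name `aitken_estimate`, the toy data, and the coefficient vector `c` are assumptions made for the example and do not come from the LACIE system.

```python
import numpy as np

def aitken_estimate(y, X, U, sigma_e2, sigma_c2, c):
    """Return gamma_hat (eq. 2), c'gamma_hat (eq. 5), and its approximate variance (eq. 6)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    c = np.asarray(c, dtype=float)
    # H from equation (3); sigma_e2 * H is the covariance matrix of y.
    H = np.eye(y.size) + (sigma_c2 / sigma_e2) * (U @ U.T)
    Hinv_X = np.linalg.solve(H, X)                  # H^{-1} X
    A = X.T @ Hinv_X                                # X' H^{-1} X
    gamma_hat = np.linalg.solve(A, Hinv_X.T @ y)    # (X'H^{-1}X)^{-1} X'H^{-1} y
    estimate = float(c @ gamma_hat)
    variance = float(sigma_e2 * (c @ np.linalg.solve(A, c)))
    return gamma_hat, estimate, variance

# Hypothetical toy data: 3 matched segments observed in each of 2 years.
y = [4.0, 2.0, 1.0, 5.0, 3.0, 2.0]
X = [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]   # year indicators
U = [[1, 0, 0], [0, 1, 0], [0, 0, 1]] * 2              # segment indicators
c = [0.0, 3.0]                                          # 3 segments times last-year mean
gamma_hat, estimate, variance = aitken_estimate(y, X, U, 2.0, 1.0, c)
```

In this balanced toy layout the weighted estimate coincides with the ordinary year means, so the example mainly shows how the variance components enter the variance of the total.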

Three questions may arise with this approach. First, it was assumed that
the composite errors $e_{hts}$ in equation (1) all have the same variance
$\sigma_e^2$. This does not agree with standard sample survey practice,
which would have to postulate different within-stratum variances for the
true segment wheat acreages in different strata (counties). Similar
considerations may also apply to the within-stratum classification
variances. The preceding objection could be taken into consideration and
a different $\sigma_{eh}^2$ estimated for each stratum. However, the
conditions of estimability described in appendix A are then more likely
to be violated, in which case a return to a common $\sigma_e^2$ would
appear to be reasonable. Moreover, the variance-covariance formula
(eq. (3)) would now be of the form

$$\operatorname{Var}(y) = \sum_{h=1}^{H} \sigma_{eh}^2\, U_h U_h' + \sigma_c^2\, U_0 U_0' \tag{7}$$

where $U_0$ is the design matrix for the $c_{hs}$ and $U_h$ is the
design matrix for the $e_{hts}$.

Strictly speaking, the variables $c_{hs}$ represent samples from finite
populations of size $N_h$. The finite population corrections (fpc's) have
here been ignored. Work is in progress to examine the inclusion of the
fpc's. The $e_{hts}$ errors are, however, composed of finite population
errors (the within-stratum variation of the true acreages) and an
infinite population error variable (the classification error). Again,
the fpc's are ignored.


Estimation  Theory  for  a Group  II  Stratum 

The following mixed ANOVA model for the segment wheat acreages in a
particular “large area” (say a CRD) is adopted.

$$y_{ts} = \alpha_t + \theta x_s + c_s + e_{ts} \tag{8}$$

where $y_{ts}$ = LACIE's “observed” wheat acreage for segment s in year t.
      The primary (county) is given the same index as the (single)
      segment within it.
$x_s$ = known size variable for the primary which contains segment s.
      Normally, this size variable would be constant over the years.
$\alpha_t$ = average true wheat acreage per segment in year t
$\theta$ = regression coefficient of y on x
$c_s$ = true segment variable applicable to all years
$e_{ts}$ = composite segment error variable of segment s in year t. This
      error variable has three components which will not be separately
      estimated. The three components are (1) the within-primary
      variable, (2) the primary variable within the Group II
      population, and (3) the classification error.

The mixed ANOVA procedure described in appendix A can again be used to
provide estimates $\hat{\sigma}_e^2$ and $\hat{\sigma}_c^2$ of
$\sigma_e^2$ and $\sigma_c^2$. Again, one would use the simulated ML
estimators of the $\alpha_t$ and $\theta$

$$\hat{\gamma} = (X'H^{-1}X)^{-1}(X'H^{-1}y) \tag{2}$$

More specifically, the originally fixed design matrix would consist of
columns $a_t$ of 0,1 variables for each of the $\alpha_t$ and a single
column of size variables $x_s$. This column would have to be
orthogonalized on the 0,1 columns and the orthogonal matrix normalized to
obtain the new fixed-design matrix $X$ so that the ML estimators
$\hat{\gamma}$ for the two matrices are analogous. For the new $X$
matrix, the old $a_t$ column is replaced by a column $f_t$ with elements
$0, 1/\sqrt{n_t}$, where $n_t$ is the number of segments in year t, and
the old x column with elements $x_s$ is replaced by

$$\frac{1}{l}\Bigl(x - \sum_t \bar{x}_t a_t\Bigr)$$

where $\bar{x}_t$ is the mean of the $x_s$ values in year t and

$$l^2 = \Bigl(x - \sum_t \bar{x}_t a_t\Bigr)'\Bigl(x - \sum_t \bar{x}_t a_t\Bigr)$$
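The orthonormalization just described (year indicator columns scaled by $1/\sqrt{n_t}$, and the size column centered on its year means and then normalized) can be sketched as below. The function name `orthonormal_design` and the toy values are hypothetical, and the sketch assumes the size variable is not constant within every year, so that the normalizing constant is nonzero.

```python
import numpy as np

def orthonormal_design(year, x):
    """Build the orthonormalized fixed-design matrix for the Group II model."""
    year = np.asarray(year)
    x = np.asarray(x, dtype=float)
    columns = []
    residual = x.copy()
    for t in np.unique(year):
        mask = (year == t)
        n_t = mask.sum()
        columns.append(mask / np.sqrt(n_t))      # f_t: elements 0 or 1/sqrt(n_t)
        residual[mask] -= x[mask].mean()         # subtract the year mean of x
    norm = np.sqrt(residual @ residual)          # normalizing constant
    columns.append(residual / norm)
    return np.column_stack(columns)

# Hypothetical example: 3 segments in year 1 and 2 segments in year 2.
year = [1, 1, 1, 2, 2]
x = [10.0, 20.0, 30.0, 10.0, 30.0]
X_new = orthonormal_design(year, x)
```

The resulting matrix satisfies X'X = I, which is the form assumed for X in appendix A.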


Suppose now one wishes to estimate the total wheat acreage (in the
Group II primaries) for the CRD in the last year indexed (t = T). This
estimator is given by

$$\hat{y}_{II}(\mathrm{CRD},T) = N(\hat{\alpha}_T + \hat{\theta}\bar{X}) \tag{9}$$

where now N is the total number of segments in the population of
Group II primaries (counties) and $\bar{X}$ is the population mean of
the size variables $x_s$. The estimator (eq. (9)) can be written in the
form

$$\hat{y}_{II}(\mathrm{CRD},T) = \hat{\gamma}'d \tag{10}$$

and its variance in the form

$$\operatorname{Var}\,\hat{y}_{II}(\mathrm{CRD},T) \doteq \sigma_e^2\, d'(X'H^{-1}X)^{-1}d \tag{11}$$

where $\sigma_e^2 H$ is the variance-covariance matrix of the
observation vector y with elements $y_{ts}$. Clearly,

$$\sigma_e^2 H = \sigma_e^2\left[I + \frac{\sigma_c^2}{\sigma_e^2}\,UU'\right] \tag{3}$$

where U is the design matrix for the segment variables $c_s$. For an
estimation of equations (11) and (3), replace $\sigma_e^2$ and
$\sigma_c^2$ by $\hat{\sigma}_e^2$ and $\hat{\sigma}_c^2$, respectively.

Of the three questions which were raised previously concerning the
appropriateness of the mixed ANOVA model (eq. (8)), the first one,
concerning separate strata variances $\sigma_{eh}^2$, does not now
arise. However, the other two questions may again be raised about the
model and answered in a manner similar to the discussion in the
preceding section.


Estimation Theory for a Group III Stratum

The ratio estimator employed in currently used LACIE technology could
again be used and would reduce the Group III estimate of the wheat
acreage to equations (4) and (9) for the Group I and Group II wheat
acreages.


FUTURE WORK

Future work must be concerned with a validation of the assumptions and
the theory used in the fifth section of this paper and later with
improvements in the rotation designs of the fourth section. In this
context, it should be noted that, since the 2 operational years of the
past have at least matched some of the segments, it would be possible to
estimate the components $\sigma_c^2$ and $\sigma_e^2$ separately.
Moreover, a certain model-monitoring analysis could be performed on the
data of the last 2 years and would concern the validity of the model
assumptions and specifically the appropriateness of the regression
model.

It is the author's intention in the near future to evaluate this method
using Phase I to III operational data for the United States.

Appendix A

A Simple  “Synthesis”-Based  Method  of  Variance 
Component  Estimation* 

H. O. Hartley,a J. N. K. Rao,b and L. R. LaMottec


1.  INTRODUCTION 

In this paper, we do not attempt an evaluation of the ever-growing
methodology in the estimation of variance components. (For an excellent
summary of the literature up to 1971, see reference 1.) Optimality
properties are sometimes achieved at considerable computational effort.
A case in point is the maximum likelihood (ML) estimation (ref. 2),
which is still fairly laborious for large data banks despite the
improvements through the W-transformation (ref. 3). Similar
observations apply to the general case of MINQUE (ref. 4) recently
simplified by Liu and Senturia (ref. 5). Other methods, such as the
Henderson 3 method (ref. 6) or the abbreviated Doolittle and square root
methods (e.g., ref. 7), depend on a subjective ordering of the
components (such as with the forward Doolittle procedure), and, if the
ordering is unfortunate, the method may fail to yield estimates for
certain components, whereas with a different ordering (not attempted),
all components may well be estimable. The work involved in attempting
all possible orderings of the variance components is usually
prohibitive. The present method achieves optimality properties and is
nevertheless computationally simple. In fact, not only does it possess
MINQUE optimality for a particular choice of norm, but it also
simplifies various other optimality properties and necessary and
sufficient conditions for estimability associated with MINQUE (see
sec. 6). Moreover, we are able to derive sufficient conditions for
consistency which also provide estimability conditions of a simpler
structure. The consistency of our estimators makes them convenient as
starting points for a single ML cycle to obtain asymptotically fully
efficient estimates.


*The material in appendix A was presented at the regional meeting of
the Biometric Society, Chapel Hill, North Carolina, April 19??, and a
version published in Biometrics, vol. 34, no. 2, June 1978, pp. 233-242.

aTexas A & M University, College Station, Texas.
bCarleton University, Ottawa, Ontario.
cUniversity of Houston, Houston, Texas.


2.  THE  MIXED  ANOVA  MODEL 

Employing the currently used notation, we write the mixed ANOVA model in
the form

$$y = X\gamma + \sum_{i=1}^{c+1} U_i b_i \tag{A1}$$

where $y$ is an n × 1 vector of observations
      $X$ is an n × k matrix of known coefficients
      $\gamma$ is a k × 1 vector of unknown constants
      $U_i$ is an n × $m_i$ matrix of 0,1 coefficients
      $b_i$ is an $m_i$ × 1 vector of normal variables from $N(0,\sigma_i^2)$.

Specifically, $U_{c+1} = I_n$ and $b_{c+1}$ is an n-vector of “error
variables.” Moreover, the design matrices $U_i$ have precisely one value
of 1 in each of their rows and all other coefficients 0. The total
number of random levels is denoted by

$$m = \sum_{i=1}^{c} m_i$$

We may assume without loss of generality that

$$X'X = I \tag{A2}$$

for, if equation (A2) is not satisfied, we may orthogonalize X by a
Gram Schmidt orthogonalization process with a consequential
reparameterization [...] Gram Schmidt process. Usually, the first column
of X is the column vector with all elements equal to $1/\sqrt{n}$. It is
the objective of the method to compute estimates of the variance
components $\sigma_i^2$ and the [...]


3. THE PRESENT METHOD

The essence of the present method is to

(a) Select c+1 quadratic forms $Q_j(y)$ in the elements of y

(b) Use the method of synthesis (refs. 8 and 9) to obtain the
coefficients $k_{ji}$ in the formulas for $E(Q_j)$ in the form

$$E(Q_j) = \sum_{i=1}^{c+1} k_{ji}\sigma_i^2 \tag{A3}$$

(c) Estimate $\sigma_i^2$ by equating the computed $Q_j$ to their
expectations; i.e., by inverting the system (eq. (A3)) to compute the
vector $\hat{\sigma}^2$ with elements $\hat{\sigma}_i^2$

$$\hat{\sigma}^2 = K^{-1}Q(y) \tag{A4}$$

from the vector $Q(y)$ with elements $Q_j(y)$, where $K = (k_{ji})$ with
rank to be discussed in section 6

(d) Replace any negative elements of $\hat{\sigma}^2$ by 0

We now give more details for (a), (b), and (c).

(a) The $Q_j(y)$ will be based on contrasts which do not depend on any
elements of $\gamma$. Accordingly, we orthogonalize all $U_i$ matrices
on X and construct matrices $V_i$ orthogonal on X as follows. Denote by
$u(t,i)$ the tth column vector of $U_i$ and by $x(r)$ the rth column
vector of X; then, the columns $v(t,i)$ of $V_i$ are given by

$$v(t,i) = u(t,i) - \sum_{r=1}^{k} x(r)\{x'(r)u(t,i)\} \tag{A5a}$$

or

$$V_i = U_i - XX'U_i \tag{A5b}$$

We choose the c+1 quadratic forms $Q_j(y)$ as

$$Q_j(y) = y'V_jV_j'y = \sum_{\tau} \{v'(\tau,j)y\}^2, \qquad j = 1,\dots,c+1 \tag{A6}$$

(b) It follows from the method of synthesis (see refs. 8 and 9) that

$$E\,Q_j(y) = \sum_{i=1}^{c+1} k_{ji}\sigma_i^2 \tag{A7a}$$

with

$$k_{ji} = \sum_{t} \{V_j'u(t,i)\}'\{V_j'u(t,i)\} = \sum_{t}\sum_{\tau} \{v'(\tau,j)u(t,i)\}^2 \tag{A7b}$$

Now, since $v(\tau,j)$ is orthogonal on any $x(\rho)$ (i.e.,
$v'(\tau,j)x(\rho) = 0$), we can write the $k_{ji}$ in the alternative
form

$$k_{ji} = \sum_{t} \{V_j'v(t,i)\}'\{V_j'v(t,i)\} = \sum_{t}\sum_{\tau} \{v'(\tau,j)v(t,i)\}^2 \tag{A8}$$

thereby showing that $k_{ji} = k_{ij}$. An alternative form of $k_{ji}$
is

$$k_{ji} = \operatorname{tr}\{(V_jV_j')(V_iV_i')\} \tag{A9}$$

We shall show in section 6 that the matrix K will have full rank c+1 if
the c+1 matrices $V_iV_i'$ are not linearly dependent.

(c) We shall also show in section 6 that the system of equations

$$Q = K\hat{\sigma}^2 \tag{A10}$$

is consistent even if the rank of K is degenerate. Solving equation
(A10) in the form

$$\hat{\sigma}^2 = K^{-}Q \tag{A11}$$

we shall, of course, be particularly interested in the full rank case
when $K^{-} = K^{-1}$.


4. THE COMPUTATIONAL LOAD

It may be helpful to give an idea of the computational efficiency of the
present method by tabulating the number of products involved in the main
operations of the algorithm. To this end, we first note simplified
versions for the $k_{c+1,i}$. Observing that $U_{c+1} = I$, we have from
equation (A5) that $V_{c+1} = I - XX'$, and, since $X'X = I$, we find
that $V_{c+1}V_{c+1}' = I - XX'$ and finally from equation (A9) that

$$k_{c+1,c+1} = \operatorname{tr}\{(I - XX')(I - XX')\} = \operatorname{tr}(I - XX') = n - k \tag{A12}$$

Similarly, we find that

$$k_{c+1,i} = \operatorname{tr}\{(I - XX')(V_iV_i')\} = \operatorname{tr}(V_iV_i') - \operatorname{tr}(XX'V_iV_i') = \operatorname{tr}\,V_i'V_i \tag{A13}$$

Further, we note the form of $V_{c+1}'y$; i.e.,

$$V_{c+1}'y = y - XX'y \tag{A14}$$

Defining now the adjoined matrices

$$U = (U_1 \mid \dots \mid U_c), \qquad V = (V_1 \mid \dots \mid V_c) \tag{A15}$$

the bulk of the work consists of the formation of the elements of the
symmetrical matrix $V'V = V'U = U'V$. The elements of this matrix are
assembled in submatrices in accordance with the partition (eq. (A15)) as
shown in schedule 1, where it must be remembered that the range of the
column index t depends on i and is $t = 1,\dots,m_i$, and the range of
$\tau$ is $\tau = 1,\dots,m_j$, so that the submatrix $V_j'U_i$ has
dimensions $m_j \times m_i$. The $k_{ji}$ for $i \ge j = 1,\dots,c$, are
then obtained by forming the sums of squares of the elements in each
submatrix in accordance with equation (A7).

Finally, we recite the formulas for the remaining coefficients in the
equations (A10). The $k_{c+1,c+1}$ and $k_{c+1,i}$ are computed from
equations (A12) and (A13), respectively, and the right-hand sides
$Q_j(y)$ from the second form in equation (A6) for $j = 1,\dots,c$,
whereas $Q_{c+1}(y)$ is given in accordance with equation (A14) by

$$Q_{c+1}(y) = y'y - (X'y)'(X'y) \tag{A16}$$

Schedule 1: Submatrices of V'U

            U_1         U_2        ...    U_c
  V_1'    V_1'U_1     V_1'U_2      ...   V_1'U_c
  V_2'    V_2'U_1     V_2'U_2      ...   V_2'U_c
  ...
  V_c'    V_c'U_1     V_c'U_2      ...   V_c'U_c

We can now summarize the approximate number of products involved in the
various operations of the algorithms.

Operation                                     Approximate no. of products involved

Orthogonalization of X                        (1/2)k+(k+ - 1)n, where k+ = no. of
                                              columns in original X
X'U_i, i = 1,...,c                            kmn
X(X'U_i), i = 1,...,c (eq. (A5))              nmk
U'V = V'V (schedule 1)                        0 (subtotals of elements of v(t,i))
k_ij, i,j = 1,...,c (eq. (A7))                (1/2)m(m + 1)
k_{c+1,i}, i = 1,...,c (eq. (A13))            mn
k_{c+1,c+1} (eq. (A12))                       0
Q_j(y), j = 1,...,c+1 (eqs. (A6),             (m + k + 1)(n + 1)
  second form, and (A16))

The important point is that the number of products is only a linear
function of the number of data lines n. An approximate formula for the
total number of products is

$$n\left\{\tfrac{1}{2}k^{+}(k^{+} - 1) + (2m + 1)(k + 1)\right\}$$


5. A NUMERICAL EXAMPLE

A small numerical example with n = 4, k+ = 3, k = 2, c = 1, m_1 = 2,
m = 2, m_2 = n = 4 is shown in schedule 2.


Schedule 2: A Numerical Example of a Mixed Model

  y    X(old)     U_1     U_2          X(new)          V_1
  4    1  1  0    1  0    1  0  0  0   (1/2)   (1/2)    (1/2)  -(1/2)
  2    1  1  0    0  1    0  1  0  0   (1/2)   (1/2)   -(1/2)   (1/2)
  1    1  0  1    0  1    0  0  1  0   (1/2)  -(1/2)     0       0
  2    1  0  1    0  1    0  0  0  1   (1/2)  -(1/2)     0       0

The orthogonalization of X (original) to X (new) follows the standard
Gram Schmidt procedure and reduces the k+ = 3 dependent columns to k = 2
columns which are orthogonal and standardized. Note that

$$x(2)_{new} = x(2)_{old} - (1/2)\,x(1)_{old}$$

and

$$x(3)_{old} = x(1)_{new} - x(2)_{new}$$

must be eliminated. Using now $x(r) = x(r)_{new}$, we orthogonalize
$U_1$ on X and compute (see eq. (A5))

$$x'(1)u(1,1) = (1/2)$$
$$x'(2)u(1,1) = (1/2)$$

and hence

$$v(1,1) = u(1,1) - (1/2)x(1) - (1/2)x(2)$$

Likewise,

$$x'(1)u(2,1) = (3/2)$$
$$x'(2)u(2,1) = -(1/2)$$

and hence

$$v(2,1) = u(2,1) - (3/2)x(1) + (1/2)x(2)$$

This yields the matrix $V_1$ in schedule 2 which has only one
independent column. The elements of $V_1'U_1$ require the computation of

$$v(1,1)'u(1,1) = (1/2)$$
$$v(1,1)'u(2,1) = v(2,1)'u(1,1) = -(1/2)$$

and $v(2,1)'u(2,1) = 1/2$ with sum of squares of $k_{11} = 4(1/2)^2 = 1$.
Further (eq. (A12)), $k_{22} = 4 - 2 = 2$ and (eq. (A13))
$k_{12} = k_{21} = 4(1/2)^2 + 4(0)^2 = 1$ so that the K matrix is given by

$$K = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$$

Finally (eq. (A16)),

$$Q_2(y) = 4^2 + 2^2 + 1^2 + 2^2 - \frac{9^2 + 3^2}{4} = 25 - \frac{90}{4} = 25 - 22.5 = 2.5$$

and (eq. (A6))

$$Q_1(y) = \left(\frac{4 - 2}{2}\right)^2 + \left(\frac{2 - 4}{2}\right)^2 = 2$$

The solution of $Q = K\hat{\sigma}^2$ therefore yields
$\hat{\sigma}_2^2 = 1/2$, $\hat{\sigma}_1^2 = 1.5$.
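The steps of section 3 and the arithmetic of schedule 2 can be checked with a short sketch. The function name `synthesis_variance_components` is hypothetical, and one substitution is made for convenience: instead of Gram Schmidt, a singular value decomposition supplies an orthonormal basis for the column space of X. That gives the same projector XX', so the quadratic forms and the K matrix are unchanged. A pseudoinverse is assumed as the generalized inverse of equation (A11).

```python
import numpy as np

def synthesis_variance_components(y, X, U_list, tol=1e-10):
    """Synthesis-based variance component estimates for y = X gamma + sum U_i b_i."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Orthonormal basis for the column space of X (stands in for Gram Schmidt).
    left, sing, _ = np.linalg.svd(np.asarray(X, dtype=float), full_matrices=False)
    Xo = left[:, sing > tol * sing.max()]
    # Append U_{c+1} = I for the error component.
    Us = [np.asarray(U, dtype=float) for U in U_list] + [np.eye(n)]
    # V_i = U_i - X X' U_i  (A5b)
    Vs = [U - Xo @ (Xo.T @ U) for U in Us]
    # Q_j(y) = sum of squares of the contrasts V_j' y  (A6)
    Q = np.array([np.sum((V.T @ y) ** 2) for V in Vs])
    # k_ji = sum of squares of the elements of V_j' U_i  (A7b)
    K = np.array([[np.sum((Vj.T @ Ui) ** 2) for Ui in Us] for Vj in Vs])
    # Solve Q = K sigma2 (A10) and replace negative elements by 0 (step d).
    sigma2 = np.linalg.pinv(K) @ Q
    return K, Q, np.maximum(sigma2, 0.0)

# Data of schedule 2.
y = [4.0, 2.0, 1.0, 2.0]
X_old = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
U1 = [[1, 0], [0, 1], [0, 1], [0, 1]]
K, Q, sigma2 = synthesis_variance_components(y, X_old, [U1])
```

With these inputs the sketch reproduces the K matrix, the quadratic forms 2 and 2.5, and the estimates 1.5 and 0.5 obtained above.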


6. OPTIMALITY PROPERTIES AND THE CONSISTENCY OF THE EQUATIONS

The estimators described in section 3 may be seen to be “best at
$\sigma_i^2 = 0$, $i = 1,\dots,c$, $\sigma_{c+1}^2 = 1$” as defined by
L. R. LaMotte (ref. 10). Therefore, the consistency of equation (A10),
regardless of the rank of K, is established as lemma 4 by LaMotte. That
the estimators defined by equation (A11) are “best” among invariant
quadratic unbiased estimators guarantees that they are admissible in
that class; that is, no other invariant quadratic unbiased estimators
have uniformly less variance for all $\sigma^2$. Further, as noted by
LaMotte, the estimators (eq. (A11)) have the property that in any model
for which a uniformly best estimator exists, equation (A11) will be
uniformly best. Finally, it may be seen that the “synthesis” estimators
(eq. (A11)) are also MINQUE as in Rao (ref. 4, section 6) with V = I.
No claim is made that this choice of the norm has any particular merits
among the rather general family of the norms covered by MINQUE
formulas. However, it appears reasonable to us that, in the absence of
any theoretical criteria for selection of MINQUE norms, a norm leading
to simple estimators may be regarded as meritorious.

Following section A5 in LaMotte (ref. 10), it may be seen that the rank
of K is equal to the number of linearly independent matrices among
$V_iV_i'$, $i = 1,\dots,c+1$. Thus, a singular K may occur if the
$U_iU_i'$ matrices are not all linearly independent or if there exists
(see eq. (A5)) a linear combination of the $U_iU_i'$ matrices whose
columns are contained in the linear subspace spanned by the columns of
X. In the first case, the singularity is caused by the design leading to
the $U_i$ matrices; in the second, the singularity is caused by
confounding fixed and random effects. In either case, equation (A10) is
consistent but some linear combinations of the variance components
cannot then be unbiasedly estimated. We should stress, however, that
other special cases of MINQUE (not necessarily invariant to $\gamma$)
may also deserve particular attention.

The consistent estimator $\hat{\sigma}^2$ may serve as a starting value
for the iterative maximum likelihood estimation procedure described by
Hemmerle and Hartley (ref. 3). Under certain regularity conditions (not
discussed here), one single cycle of the iteration will result in
asymptotically efficient estimators of $\sigma^2$ and $\gamma$. If the
iteration is carried to convergence, solutions of the ML equations are
reached. If no ML cycles are performed, a consistent estimator of
$\gamma$ can be computed from the generalized least squares (ML)
equations.

$$\hat{\gamma} = (X'\hat{H}^{-1}X)^{-1}(X'\hat{H}^{-1}y) \tag{A17}$$

where

$$\hat{H} = \sum_{i=1}^{c+1} \frac{\hat{\sigma}_i^2}{\hat{\sigma}_{c+1}^2}\,U_iU_i'$$

It has been shown by Hemmerle and Hartley (ref. 3) that equation (A17)
can be computed directly from the $U_i'U_j$ and $X'U_i$ matrices without
the inversion of the n × n matrix $\hat{H}$ using their so-called
W-transformation. In fact, the $W_0$ matrix (their eq. (19)) is
essentially given by the $V_j'V_i$ matrices (see schedule 1) and by the
contrasts $V_j'y$ required in the computation of $Q_j(y)$. The
variance-covariance matrix of $\hat{\gamma}$ can likewise be computed
through the W-transformation.

Appendix B

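The Approximate Variance of a Weighted Least Squares
Estimator With “Empirical Weights”


In this appendix, we prove a general theorem concerning weighted least
squares sometimes referred to as the Aitken method. We prove the theorem
in a somewhat more general form than that required for this paper.
Accordingly, the notation for the variance-covariance matrix H is
altered as shown.

Consider the linear model

$$y = X\gamma + e \tag{B1}$$

where y is an n-vector of observations, X is an n × p matrix of
constants, $\gamma$ is a p-vector of unknown parameters, and the
residual vector e has an n × n variance-covariance matrix of the form

$$E\,ee' = H = \sum_{i=1}^{k} \theta_i G_i \tag{B2}$$

where the $G_i$ are known matrices and the $\theta_i$ are unknown
parameters which have to be estimated from the data vector y.

Consider now the weighted least squares estimator

$$\hat{\gamma} = (X'\hat{H}^{-1}X)^{-1}(X'\hat{H}^{-1}y) \tag{B3}$$

where

$$\hat{H} = \sum_{i=1}^{k} \hat{\theta}_i G_i \tag{B4}$$

and the $\hat{\theta}_i$ are consistent estimators (computed from y) and
are assumed to be approximately unbiased.

Using the so-called $\Delta$-method to obtain the variance of
$\hat{\gamma}$, we write

$$E(\hat{\gamma} - \gamma)(\hat{\gamma} - \gamma)' \doteq
\left(\frac{\partial\hat{\gamma}}{\partial y}\right)' H \left(\frac{\partial\hat{\gamma}}{\partial y}\right)
+ \left(\frac{\partial\hat{\gamma}}{\partial y}\right)' C \left(\frac{\partial\hat{\gamma}}{\partial\hat{\theta}}\right)
+ \left(\frac{\partial\hat{\gamma}}{\partial\hat{\theta}}\right)' C' \left(\frac{\partial\hat{\gamma}}{\partial y}\right)
+ \left(\frac{\partial\hat{\gamma}}{\partial\hat{\theta}}\right)' \Lambda \left(\frac{\partial\hat{\gamma}}{\partial\hat{\theta}}\right) \tag{B5}$$

where $(\partial\hat{\gamma}/\partial y)'$ and
$(\partial\hat{\gamma}/\partial\hat{\theta})'$ are, respectively, the
p × n and p × k matrices of partial derivatives of $\hat{\gamma}$ with
regard to the elements of y and $\hat{\theta}$, evaluated at the expected
values $Ey = X\gamma$ and $E\hat{\theta} = \theta$; H and $\Lambda$ are,
respectively, the variance-covariance matrices of y and of
$\hat{\theta}$; and C is the covariance matrix of y and $\hat{\theta}$.
We now show that $(\partial\hat{\gamma}/\partial\hat{\theta})' = 0$ so
that only the first term in equation (B5) needs to be retained for the
$\Delta$-method. In order to obtain the derivatives of $\hat{\gamma}$
given by equation (B3), we note that, to first order,

$$(A + \delta A)^{-1} \doteq A^{-1} - A^{-1}\,\delta A\,A^{-1}$$

Therefore, writing $\delta\theta_i = \hat{\theta}_i - \theta_i$ and
$H_0 = H(\theta_i)$, we have to first order

$$\hat{H}^{-1} \doteq H_0^{-1} - H_0^{-1}\Bigl(\sum_i \delta\theta_i G_i\Bigr)H_0^{-1} \tag{B6}$$

and hence

$$(X'\hat{H}^{-1}X)^{-1} \doteq (X'H_0^{-1}X)^{-1}
+ (X'H_0^{-1}X)^{-1}X'H_0^{-1}\Bigl(\sum_i \delta\theta_i G_i\Bigr)H_0^{-1}X(X'H_0^{-1}X)^{-1} \tag{B7}$$

If we therefore find the coefficient of $\delta\theta_i$ at
$y = X\gamma$, we obtain

$$(X'H_0^{-1}X)^{-1}(X'H_0^{-1}G_iH_0^{-1}X)(X'H_0^{-1}X)^{-1}X'H_0^{-1}X\gamma
- (X'H_0^{-1}X)^{-1}(X'H_0^{-1}G_iH_0^{-1}X\gamma) = 0 \tag{B8}$$

Further

$$\left(\frac{\partial\hat{\gamma}}{\partial y}\right)' = (X'H_0^{-1}X)^{-1}X'H_0^{-1} \tag{B9}$$

so that

$$E(\hat{\gamma} - \gamma)(\hat{\gamma} - \gamma)' \doteq (X'H_0^{-1}X)^{-1} \tag{B10}$$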
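The key step of this appendix, that the weighted least squares estimator does not respond to changes in the weights when y is at its expected value, can be illustrated numerically. In the sketch below the function names, the design, and the known matrices are all hypothetical choices made for the example. At $y = X\gamma$ the estimator returns $\gamma$ for any admissible weights, which is in fact an exact statement and so implies the vanishing first-order coefficient of equation (B8).

```python
import numpy as np

def wls_estimate(y, X, theta, G_list):
    """Weighted least squares estimate (B3) with H built from theta as in (B4)."""
    H = sum(t * G for t, G in zip(theta, G_list))
    Hinv_X = np.linalg.solve(H, X)
    return np.linalg.solve(X.T @ Hinv_X, Hinv_X.T @ y)

def first_order_cov(X, theta, G_list):
    """First-order covariance matrix (X' H0^{-1} X)^{-1} of (B10)."""
    H0 = sum(t * G for t, G in zip(theta, G_list))
    return np.linalg.inv(X.T @ np.linalg.solve(H0, X))

# Hypothetical setup: 4 observations, 2 regression parameters, 2 variance parameters.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
U = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
G_list = [np.eye(4), U @ U.T]
gamma = np.array([1.0, 2.0])
y_mean = X @ gamma                      # y evaluated at its expectation

est_a = wls_estimate(y_mean, X, [1.0, 0.5], G_list)
est_b = wls_estimate(y_mean, X, [1.3, 0.2], G_list)
cov = first_order_cov(X, [1.0, 0.5], G_list)
```

Both sets of weights return the same vector, so only the variation of y contributes to the first-order variance, as in equation (B10).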


Weighted Aggregation

A. H. Feivesona


INTRODUCTION 

In  a sample  survey  such  as  LACIE,  where  one  is 
plagued  with  measurement  errors  and  loss  of  data,  it 
might  be  possible  to  improve  the  precision  of  the 
overall  estimate  if  observations  thought  to  be  of 
questionable  accuracy  are  downweighted  in  the  ag- 
gregation process. 

In  the  LACIE  aggregation  logic,  a “Group  III” 
wheat  area  ratio  estimate  is  currently  made  for  a 
stratum,  a substratum,  or  a collection  of  substrata 
which,  either  by  design  or  by  loss  of  data,  does  not 
contain  a sample  segment.  This  ratio  estimate  is 
made by taking the estimated wheat area from surrounding
or nearby strata or substrata with “good”
data  and  multiplying  it  by  the  historical  ratio  of  the 
Group  III  area's  wheat  to  the  neighboring  area’s 
wheat.  If  the  ratios  of  crop  averages  between  neigh- 
boring political  subdivisions  do  not  change  radically 
from  year  to  year,  the  Group  III  estimate  should  be  a 
reasonably  accurate  means  of  accounting  for  missing 
data. 

A flaw  in  the  LACIE  procedure  is  that,  if  data 
from  even  one  segment  are  available,  that  segment  is 
used  to  estimate  the  wheat  acreage  in  its  stratum  or 
substratum,  no  matter  how  many  segments  were 
originally  allocated  in  the  sampling  design.  Further- 
more, the  wheat  proportion  estimates  from  some 
segments  might  be  of  questionable  accuracy  because 
of  not  having  acquired  the  data  at  the  right  times  in 
the  growing  season  to  discriminate  wheat  from  other 
crops.  As  a result,  wheat  area  estimates  for  strata 
containing  insufficient  data  or  poorly  estimated  seg- 
ments can  be  seriously  distorted,  thus  affecting  the 
large-area  country  production  estimate. 

If one knows that a stratum wheat area estimate is likely to be poor
because of the preceding factors, then it is advantageous to replace the
suspected estimate by a weighted average of itself with the
aforementioned Group III estimate of its wheat area. The size of the
weights depends on the degree of confidence one has in the direct
segment-based estimates; for example, the estimate for a
stratum/substratum containing segments with data acquired on only one
Landsat pass would be more heavily weighted with the Group III component
than with the direct component. Conversely, a stratum which was thought
to have reasonably good acquisition patterns for its segments would be
estimated by giving most of the weight to the direct component.


aNASA Johnson Space Center, Houston, Texas.

The manner in which a weighted aggregation technique can be implemented
given a set of weights is described in the following section. The
problem of variance estimation is discussed in the section entitled
“Properties of the Estimate $\hat{a}$,” and the question of how one
might obtain the weights in an operational environment is addressed in
the section entitled “Determination of Weights.”


THE WEIGHTED AGGREGATION PROCESS


Definitions  and  Notation 

In  the  current  LACIE  Crop  Assessment  System 
(CAS),  one  has  on  hand  sample-segment-based  esti- 
mates of  wheat  acreage  for  strata  containing  at  least 
one  sample  segment  with  usable  data  and  historical 
wheat  acreages  for  all  strata  in  a country.  In  countries 
with  detailed  historical  data,  there  are  also  estimates 
available  for  Group  I substrata  or  Group  II  collec- 
tions of  substrata  containing  at  least  one  usable  sam- 
ple segment  as  well  as  historical  wheat  acreages  for 
all  substrata.  To  avoid  further  confusion,  the  term 
“domain”  will  be  used  to  mean  a stratum  in  countries 
without  detailed  historical  data  and  to  mean  either  a 
Group I or III substratum or a Group II collection of
substrata in countries with detailed historical data.1
The following definitions can now be made.

Let $d_i^*$ be the “direct” or segment-based estimate of wheat acreage
for the ith domain (i = 1, ..., n), provided that domain has at least
one processed sample segment. For all domains, define

$$d_i = \begin{cases} d_i^*, & \text{data exists for ith domain} \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

In addition, let $h_i$ be the historical wheat acreage and $w_i$ a given
weight ($0 \le w_i \le 1$) associated with the ith domain. It is assumed
in this section that the $w_i$ are known or computable; the problem of
obtaining them will be discussed in the section entitled “Determination
of Weights.”

Finally, for each domain, there is a prescribed set of other domains on
which to ratio for the purpose of computing a Group III estimate of
wheat acreage. Specifically, for the ith domain $D_i$, there exists a
set of other domains $S_i$ such that $z_i$, the Group III wheat acreage
estimate for $D_i$, is given by

$$z_i = h_i\,\frac{\sum_{j \in S_i} d_j}{\sum_{j \in S_i} h_j} \tag{2}$$

Let the n × 1 vector $b_i = (b_{i1},\dots,b_{in})^T$ be defined by

$$b_{ij} = \begin{cases} 1, & D_j \in S_i \\ 0, & \text{otherwise} \end{cases}$$

Then, equation (2) can be written

$$z_i = h_i\,\frac{b_i^T d}{b_i^T h} \tag{3}$$

where $d = (d_1,\dots,d_n)^T$ and $h = (h_1,\dots,h_n)^T$.


Estimation 

We now construct a set of wheat acreage estimates $\hat{a}_i$, where
$\hat{a}_i$ will be shown to be the limit of a convergent sequence of
iterated estimates $\{a_i^{(v)}\}_{v=0}^{\infty}$. At any stage of the
iteration, say the v+1st, $a_i^{(v+1)}$ will be a weighted average
between the direct estimate $d_i$ and a “historical” estimate
$z_i^{(v)}$, which is defined in equation (5).

To start the iteration process, let $a_i^{(0)} = d_i$, i = 1, 2, ..., n.
Then, the v+1st iterated estimate is defined by

$$a_i^{(v+1)} = w_i d_i + (1 - w_i)z_i^{(v)} \tag{4}$$

where

$$z_i^{(v)} = h_i\,\frac{b_i^T a^{(v)}}{b_i^T h} \tag{5}$$

Note that $z_i^{(0)} = z_i$ and hence $a_i^{(1)}$ is a simple weighted
average between $d_i$ and $z_i$. At later iterations, $z_i^{(v)}$ is
similar to $z_i$ but uses $a_i^{(v)}$ in place of $d_i$ in equation (3).

By letting the n × 1 vector $a^{(v)} = (a_1^{(v)},\dots,a_n^{(v)})^T$,
and the n × n matrix

$$C = (c_{ij}), \qquad c_{ij} = \frac{(1 - w_i)\,h_i\,b_{ij}}{b_i^T h}$$

then, equation (4) can be written

$$a^{(v+1)} = u + Ca^{(v)} \tag{6}$$

where $u = (u_1,\dots,u_n)^T$ and $u_i = w_i d_i$.

From equation (6), it follows that

$$a^{(v+1)} = (I + C + C^2 + \dots + C^v)u + C^{v+1}a^{(0)}$$

hence, $a^{(v)}$ converges to a vector $\hat{a}$ if and only if the
series $I + C + C^2 + \dots$ converges. We now quote two theorems which
will show that $a^{(v)}$ converges to $\hat{a}$ for “reasonable” values
of the matrix $B = (b_{ij})$, where

$$\hat{a} = (I - C)^{-1}u \tag{7}$$

In the following theorems, the function $\rho(M)$, for any matrix M, is
defined by

$$\rho(M) = \max_i |\lambda_i(M)|$$

where $\lambda_i(M)$ is the ith eigenvalue of M.


Theorem 1

If M is an arbitrary complex n × n matrix with $\rho(M) < 1$, then
I - M is nonsingular, and

$$(I - M)^{-1} = I + M + M^2 + \dots$$

the series on the right converging. Conversely, if the series on the
right converges, then $\rho(M) < 1$.


Theorem 2

If $A = (a_{ij}) \ge 0$ is an n × n matrix, and x is any vector with
positive components $x_1, x_2, \dots, x_n$, then

$$\min_{1 \le i \le n}\left[\frac{\sum_{j=1}^{n} a_{ij}x_j}{x_i}\right] \le \rho(A) \le \max_{1 \le i \le n}\left[\frac{\sum_{j=1}^{n} a_{ij}x_j}{x_i}\right]$$

These theorems may be found on pages 82 and 47, respectively, of
reference 1.


1For definitions of Groups I, II, and III and designations of LACIE
countries, see the paper by Feiveson et al. entitled “LACIE Sampling
Design.” In the United States, a domain is either a county or a
collection of counties within a Crop Reporting District (CRD).

In  order  to  show  equation  (7),  we  will  first  show 
that  p( C)  < 1 and  then  use  Theorem  1.  We  can  at- 
tempt to  show  that  p(C)  < 1 by  using  Theorem  2 
with  Xj  ™ hj.  Then,  for  fixed  /, 

E cUxtlx!  = ( 1 - wi)hi  E bnhflht  E btkhk 


Thus,  if  w,  > 0 for  all  /,  by  Theorem  2,  we  have  p(C) 
< 1.  In  practice,  however,  some  wt  will  be  zero; 
hence,  some  restrictions  must  be  put  on  the  B matrix 
to  ensure  p(C)  < 1.  A natural  restriction  to  require 
is  that  by  — 0 if  wj  — 0;  i.e.,  if  the  direct  estimate  in 
the  .Ah  domain  is  nonexistent  or  thought  to  be  com- 
pletely worthless  ( Wj  — 0) , then  that  domain  should 
not  be  used  in  a ratio  estimate  for  some  other  do- 
main. With  this  restriction,  the  C matrix  now  takes 
on  the  form 


    \[ C = \begin{pmatrix} 0 & C_1 \\ 0 & C_2 \end{pmatrix} \]

with blocks of size $r \times r$ and $r \times (n-r)$ in the first row and $(n-r) \times r$ and $(n-r) \times (n-r)$ in the second, where the order of the domains has been rearranged so that $w_1 = \cdots = w_r = 0$ and $w_i > 0$ for $i = r+1, \ldots, n$.

Since the nonzero eigenvalues of $C_2$ are also the nonzero eigenvalues of $C$, it is clear that $\rho(C) = \rho(C_2)$. By applying Theorem 2 to $C_2$ as in the previous paragraph, we now have $\rho(C) = \rho(C_2) < 1$.

By Theorem 1, the series $I + C + C^2 + \cdots$ converges to $(I - C)^{-1}$; hence, $C^{\nu} \to 0$ and

    \[ \lim_{\nu \to \infty} a^{(\nu)} = (I - C)^{-1} u \]
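The convergence argument can be checked numerically. The following is a minimal sketch, not the LACIE implementation. It assumes $c_{ij} = (1 - w_i) h_i b_{ij} / \sum_k b_{ik} h_k$ (the form implied by the row-sum calculation above) and $u_i = w_i d_i$; the function names and the three-domain numbers are hypothetical.

```python
def build_c(w, h, b):
    """Assumed form: C[i][j] = (1 - w[i]) * h[i] * b[i][j] / sum_k(b[i][k] * h[k])."""
    n = len(w)
    c = []
    for i in range(n):
        denom = sum(b[i][k] * h[k] for k in range(n))
        c.append([(1.0 - w[i]) * h[i] * b[i][j] / denom for j in range(n)])
    return c


def iterate_estimate(w, d, h, b, tol=1e-12, max_iter=10_000):
    """Iterate a <- u + C a until successive iterates agree to within tol."""
    n = len(w)
    c = build_c(w, h, b)
    u = [w[i] * d[i] for i in range(n)]
    a = u[:]  # starting vector a(0); the limit does not depend on it when rho(C) < 1
    for _ in range(max_iter):
        a_next = [u[i] + sum(c[i][j] * a[j] for j in range(n)) for i in range(n)]
        if max(abs(x - y) for x, y in zip(a_next, a)) < tol:
            return a_next
        a = a_next
    raise RuntimeError("iteration did not converge; check that rho(C) < 1")


# Hypothetical example: domain 0 has no usable direct estimate (w = 0),
# so column 0 of B is zero, as the restriction in the text requires.
w = [0.0, 0.8, 0.5]
d = [0.0, 120.0, 90.0]   # direct estimates
h = [50.0, 100.0, 80.0]  # historical acreages
b = [[0, 1, 1], [0, 1, 1], [0, 1, 1]]
a_hat = iterate_estimate(w, d, h, b)
print(a_hat)
```

For these inputs the limit is approximately (58.75, 119.5, 92.0), and it satisfies the fixed-point relation $a = u + Ca$, which is equivalent to $\hat{a} = (I - C)^{-1} u$.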


PROPERTIES OF THE ESTIMATE $\hat{a}$

Bias

Suppose $a_i$ is the true wheat acreage for the $i$th domain. If $E(d_i) = a_i$ for $w_i \ne 0$ and the ratio of the wheat acreage in domain $i$ to that of its ratio estimation set $S_i$ is the same for the current year as it was in the year which produced $h$, then $\hat{a}$ is unbiased.

To show unbiasedness under the preceding conditions, an induction argument will be used. First, from equation (4),

    \[ E(a_i^{(0)}) = w_i a_i + (1 - w_i) E(z_i) \]

From the second assumption,

    \[ a_i \Big/ \sum_j b_{ij} a_j = h_i \Big/ \sum_j b_{ij} h_j \]

As a consequence, remembering that $w_j = 0$ implies that $b_{ij} = 0$, we have

    \[ E(z_i) = h_i \sum_j b_{ij} a_j \Big/ \sum_k b_{ik} h_k = a_i \]

It thus follows that

    \[ E(a_i^{(0)}) = w_i a_i + (1 - w_i) a_i = a_i \]    (9)

Suppose now that $E(a_i^{(\nu)}) = a_i$. Then similarly,

    \[ E(a_i^{(\nu+1)}) = E(u_i) + (1 - w_i) h_i \sum_j b_{ij} E(a_j^{(\nu)}) \Big/ \sum_k b_{ik} h_k \]
    \[ = w_i a_i + (1 - w_i) h_i \sum_j b_{ij} a_j \Big/ \sum_k b_{ik} h_k \]
    \[ = w_i a_i + (1 - w_i) a_i \]
    \[ = a_i \]

Hence, $a_i^{(\nu)}$ is unbiased for all $\nu$, which implies $E(\hat{a}_i) = a_i$.


Variance

Strictly speaking, the $w$'s are random variables because they are functions of local weather and crop distribution. Since $\hat{a} = (I - C)^{-1} u$, the conditional covariance matrix of $\hat{a}$ given $w = (w_1, \ldots, w_n)^T$ is given by

    \[ \Sigma = (I - C)^{-1} \Sigma_u (I - C^T)^{-1} \]    (10)

where $\Sigma_u$ is the conditional covariance matrix of $u$ given $w$. The unconditional variance of $\hat{a}$ depends on how $w$ is computed and will not be discussed here.


Variance of the Large-Area Estimate

The large-area wheat acreage estimate is simply the sum of the estimates over the domains; i.e., $\hat{A} = \sum_{i=1}^{n} \hat{a}_i = e^T \hat{a}$. As a result, the conditional variance of $\hat{A}$ is given by

    \[ V(\hat{A}) = e^T \Sigma e \]    (11)

where $e = (1, 1, \ldots, 1)^T$. In the following section, we shall examine how this compares with the variance of the LACIE estimate under certain conditions.


Estimation of $V(\hat{A})$

If an estimate of $\Sigma_u$ is available, it can be used in equation (10) to obtain an estimate of $\Sigma$ and hence $V(\hat{A})$. In LACIE-type operations, the $d_i$ can be assumed independent since they are based on independent data sets.

The unconditional variance, say $\sigma_i^2$, of each $d_i$ can be estimated as is currently done in LACIE (see the paper by Chhikara and Feiveson entitled "LACIE Large-Area Acreage Estimation" for details). If the $d_i$ and $w_i$ are "approximately" independent, the estimates $\hat{\sigma}_i^2$ can be used to approximate $\Sigma_u$ by

    \[ \hat{\Sigma}_u = \mathrm{diag}(w_1^2 \hat{\sigma}_1^2, \ldots, w_n^2 \hat{\sigma}_n^2) \]

Thus,

    \[ \hat{V}(\hat{A}) = e^T [ (I - C)^{-1} \hat{\Sigma}_u (I - C^T)^{-1} ] e \]    (12)

is the estimated variance of $\hat{A}$.
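Equation (12) can be evaluated without forming the inverse explicitly. The sketch below is illustrative only and rests on stated assumptions: $C$ has the form $c_{ij} = (1 - w_i) h_i b_{ij} / \sum_k b_{ik} h_k$, $\hat{\Sigma}_u$ is the diagonal matrix described above, and $\rho(C) < 1$ so that the fixed-point iteration converges. With $g^T = e^T (I - C)^{-1}$, equation (12) reduces to $\sum_i g_i^2 w_i^2 \hat{\sigma}_i^2$. All names and numbers are hypothetical.

```python
def build_c(w, h, b):
    """Assumed form: C[i][j] = (1 - w[i]) * h[i] * b[i][j] / sum_k(b[i][k] * h[k])."""
    n = len(w)
    c = []
    for i in range(n):
        denom = sum(b[i][k] * h[k] for k in range(n))
        c.append([(1.0 - w[i]) * h[i] * b[i][j] / denom for j in range(n)])
    return c


def column_sums_of_inverse(c, tol=1e-12, max_iter=10_000):
    """Solve g^T = e^T + g^T C by iteration, so that g^T = e^T (I - C)^{-1}."""
    n = len(c)
    g = [1.0] * n
    for _ in range(max_iter):
        g_next = [1.0 + sum(g[k] * c[k][i] for k in range(n)) for i in range(n)]
        if max(abs(x - y) for x, y in zip(g_next, g)) < tol:
            return g_next
        g = g_next
    raise RuntimeError("iteration did not converge; check that rho(C) < 1")


def estimated_variance(w, h, b, sigma2):
    """Equation (12) with Sigma_u approximated by diag(w_i^2 * sigma2_i)."""
    g = column_sums_of_inverse(build_c(w, h, b))
    return sum(g[i] ** 2 * w[i] ** 2 * sigma2[i] for i in range(len(w)))


# Hypothetical three-domain example with B = J (all ones).
w = [1.0, 0.5, 0.25]
h = [100.0, 80.0, 50.0]
sigma2 = [16.0, 25.0, 9.0]  # estimated variances of the direct estimates
b = [[1, 1, 1] for _ in range(3)]
print(estimated_variance(w, h, b, sigma2))
```

When $B = J$, every component of $g$ equals $\sum_i h_i / \sum_i w_i h_i$, so the result can be compared against the closed form $(\sum_i h_i)^2 \sum_i w_i^2 \sigma_i^2 / (\sum_i w_i h_i)^2$.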


DETERMINATION OF WEIGHTS


The General Case

Other than using the obvious choice of $w_i = 0$ for domains with no data, there is no straightforward procedure for determining $w$. Ideally, one would like to choose $w$ such that the mean-squared error of $\hat{A}$ is minimized, but this is impossible without knowledge of the bias. If an estimate of $\Sigma_u$ is available, it might be possible to minimize the variance of $\hat{A}$ (eq. (12)) with respect to $w$; however, this is a formidable task for arbitrary $B$. For the case $B = J$ ($= ee^T$), some progress can be made, as will be shown.


The Case B = J

Suppose one is estimating wheat acreage for a relatively small region in a country (such as a CRD in the United States), where the ratio estimation set is the totality of all domains in the region. Then, the matrix $B$ is all ones; i.e.,

    \[ B = J = e e^T \]    (13)

In LACIE, within a CRD in the United States, this is actually the case, as long as there are at least three usable sample segments in the CRD (see the paper by Chhikara and Feiveson for details).

In this situation, $I - C = (I - v e^T)$, where $v = (v_1, \ldots, v_n)^T$ and $v_i = (1 - w_i) h_i / h_\cdot$, with $h_\cdot = \sum_k h_k$. As a consequence,

    \[ (I - C)^{-1} = I + \frac{1}{1 - e^T v} v e^T \]    (14)

which gives

    \[ \hat{a} = u + \frac{1}{1 - e^T v} v e^T u \]    (15)

It follows that

    \[ \hat{A} = e^T \hat{a} = \frac{e^T u}{1 - e^T v} = \frac{h_\cdot \sum_i w_i d_i}{\sum_i w_i h_i} \]    (16)

To distinguish equation (16) from the general case of arbitrary $B$, we will use the notation $\hat{A}_J$ for $\hat{A}$ when $B = J$.

If the $w_i$ are all zero or one, $\hat{A}_J$ is the standard LACIE estimator. To see this, suppose the domains are arranged in order such that $w_1 = w_2 = \cdots = w_k = 1$ and $w_{k+1} = \cdots = w_n = 0$. Then

    \[ \hat{A}_J = \frac{\sum_{i=1}^{n} h_i}{\sum_{i=1}^{k} h_i} \sum_{i=1}^{k} d_i = (1 + R) \sum_{i=1}^{k} d_i = \hat{A}_{LACIE} \]    (17)

where

    \[ R = \sum_{i=k+1}^{n} h_i \Big/ \sum_{i=1}^{k} h_i \]

where $R$ is the Group III ratio as shown in equation

Bias and variance of $\hat{A}_J$. In general, $\hat{A}_J$ (including $\hat{A}_{LACIE}$) is a biased estimator of the total wheat acreage $A$, even if the $d_i$ are unbiased when $w_i \ne 0$. Let $a_i$ be the true current-year wheat acreage for the $i$th domain. If we define $\theta = (\sum a_i)/(\sum h_i)$ and $\epsilon_i = a_i - \theta h_i$, then

    \[ \sum_{i=1}^{n} \epsilon_i = 0 \]

It then follows that

    \[ \hat{A}_J - A = \frac{h_\cdot}{\sum_i w_i h_i} \Big[ \sum_i w_i (d_i - a_i) + \sum_i w_i \epsilon_i \Big] \]    (18)

since $A = \sum a_i$. If $E(d_i) = a_i$ and $V(d_i) = \sigma_i^2$, we have

    \[ E(\hat{A}_J) - A = \frac{h_\cdot \sum_i w_i \epsilon_i}{\sum_i w_i h_i} \]    (19)

    \[ V(\hat{A}_J) = \frac{h_\cdot^2 \sum_i w_i^2 \sigma_i^2}{(\sum_i w_i h_i)^2} \]    (20)

If the $\epsilon_i$ are such that $|\sum w_i \epsilon_i| \ll \sum w_i h_i$, then $E(\hat{A}_J) \approx A$. Consequently, equation (20) can be minimized with respect to $w$ to obtain an approximately minimum mean-squared error for $\hat{A}_J$. Using the Schwarz inequality, it can be seen that equation (20) is minimum when $w_i \propto h_i / \sigma_i^2$. The corresponding optimal variance is given by

    \[ V_{opt} = \frac{h_\cdot^2}{\sum_{i=1}^{n} h_i^2 / \sigma_i^2} \]    (21)

Comparison with LACIE estimator. Suppose we are estimating wheat in a CRD with $n$ counties and $B = J$. Suppose also that the first $k$ counties have usable data but the remaining $n - k$ counties have lost all their data because of cloud cover. The LACIE estimate of $A$, the total wheat acreage, is given by equation (17), whereas the "optimal" estimate is given by letting

    \[ w_i = \begin{cases} h_i / \sigma_i^2, & i \le k \\ 0, & i > k \end{cases} \]    (22)

where $\sigma_i^2 = \mathrm{Var}(d_i)$. Substituting in equation (16), we have

    \[ \hat{A}_{opt} = h_\cdot \sum_{i=1}^{k} \frac{h_i d_i}{\sigma_i^2} \Big/ \sum_{i=1}^{k} \frac{h_i^2}{\sigma_i^2} \]

Comparing variances, we see that

    \[ V(\hat{A}_{LACIE}) = \frac{h_\cdot^2 \sum_{i=1}^{k} \sigma_i^2}{(\sum_{i=1}^{k} h_i)^2} \]

The variance of $\hat{A}_{opt}$ is similar to equation (21) except the upper limit in the denominator sum is $k$ instead of $n$; i.e.,

    \[ V(\hat{A}_{opt}) = \frac{h_\cdot^2}{\sum_{i=1}^{k} h_i^2 / \sigma_i^2} \]

Taking the ratio of variances, we obtain

    \[ \rho = \frac{V(\hat{A}_{opt})}{V(\hat{A}_{LACIE})} = \frac{(\sum_{i=1}^{k} h_i)^2}{(\sum_{i=1}^{k} \sigma_i^2)(\sum_{i=1}^{k} h_i^2 / \sigma_i^2)} \]

which is always less than or equal to 1, with equality when $h_i \propto \sigma_i^2$.

Using official LACIE estimates of within-county variances to obtain the $\sigma_i^2$, $\rho$ was evaluated for two CRD's, the North Central in Montana and the North Central in Kansas. In the Montana CRD, $\rho$ was 0.57, which suggests that weighted aggregation would give a considerably more accurate estimate than the current procedure. In the Kansas CRD, $\rho$ was almost 1; here, the LACIE estimate was quite efficient. More work needs to be done with existing LACIE data to evaluate the weighted technique taking into account changes in the $\sigma_i^2$ due to acquisition patterns and more general $B$ matrices. Neither factor was considered in the preliminary calculations described herein.
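The $B = J$ results lend themselves to a short numerical check. The sketch below is an illustration under stated assumptions rather than the LACIE procedure: it takes the $B = J$ estimate in the ratio form $(\sum h_i)(\sum w_i d_i)/(\sum w_i h_i)$ that follows from equations (14) and (15), treats the $d_i$ as independent, and uses hypothetical function names and made-up acreages and variances.

```python
def estimate_bj(w, d, h):
    """B = J estimate in ratio form: sum(h) * sum(w*d) / sum(w*h)."""
    num = sum(wi * di for wi, di in zip(w, d))
    den = sum(wi * hi for wi, hi in zip(w, h))
    return sum(h) * num / den


def lacie_estimate(d_usable, h_usable, h_missing):
    """(1 + R) * sum(d), with R = historical acreage without data / with data."""
    r = sum(h_missing) / sum(h_usable)
    return (1.0 + r) * sum(d_usable)


def efficiency_ratio(h, sigma2):
    """Variance of the optimally weighted estimate divided by the variance of
    the equal-weight (LACIE) estimate, over the domains that have usable data."""
    weighted = sum(hi * hi / s for hi, s in zip(h, sigma2))
    return sum(h) ** 2 / (sum(sigma2) * weighted)


# Hypothetical CRD: two counties with data, one lost to cloud cover.
d = [120.0, 90.0, 0.0]
h = [100.0, 80.0, 50.0]
print(estimate_bj([1, 1, 0], d, h))                    # zero/one weights
print(lacie_estimate(d[:2], h[:2], h[2:]))             # same value as the line above
print(efficiency_ratio([100.0, 80.0], [16.0, 25.0]))   # below 1
print(efficiency_ratio([2.0, 4.0], [1.0, 2.0]))        # h proportional to sigma2
```

With zero/one weights the two estimates coincide, and the ratio reaches 1 only when the historical acreages are proportional to the variances, which matches the Schwarz-inequality argument.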


1. Varga, Richard S.: Matrix Iterative Analysis. Prentice-Hall, 1962.


Design, Implementation, and Results
of LACIE Field Research

M. E. Bauer,a M. C. McEwen,b W. A. Malila,c and J. C. Harlan d


INTRODUCTION 

Major advancements have been made in recent years in the capability to acquire, process, and interpret remotely sensed multispectral measurements of the energy reflected and emitted from crops, soils, and other Earth surface features. As a result of experiments such as LACIE, the technology is moving rapidly toward operational applications. There is, however, a continuing need for quantitative studies of the multispectral characteristics of crops and soils if further advancements in the technology are to be made. In the past, many such studies were made in the laboratory because of a lack of instrumentation suitable for field studies, but the applicability of laboratory studies is generally limited. The development of sensor systems capable of collecting high-quality spectral measurements under field conditions has made it possible to pursue investigations which would not have been possible a few years ago.

A major effort was initiated in the fall of 1974 by the NASA Johnson Space Center (JSC) with the cooperation of the U.S. Department of Agriculture (USDA) to acquire fully annotated and calibrated multitemporal sets of spectral measurements and supporting agronomic and meteorological data (ref. 1). The Purdue University Laboratory for Applications of Remote Sensing (LARS) was responsible for the technical design and coordination of the experiment as well as for major portions of the data acquisition, processing, and analysis. Other organizations, particularly the Environmental Research Institute of Michigan (ERIM), Texas A & M Remote Sensing Center, and Colorado State University, contributed to the experiment planning and data analysis.


aPurdue University, West Lafayette, Indiana.
bNASA Johnson Space Center, Houston, Texas.
cEnvironmental Research Institute of Michigan, Ann Arbor, Michigan.
dTexas A & M University, College Station, Texas.

Spectral, agronomic, and meteorological measurements were made at LACIE test sites in Kansas and North Dakota for 3 years and in South Dakota for 2 years. The remote-sensing measurements include data acquired by truck-mounted spectrometers, a helicopter-borne spectrometer, an aircraft multispectral scanner (MSS), and the Landsat multispectral scanners. These data are supplemented by an extensive set of agronomic and meteorological data acquired during each mission.

The LACIE field measurements data form one of the most complete and best documented data sets acquired for agricultural remote-sensing research. Thus, they are well suited to serve as a data base for research to (1) determine quantitatively the relationship of spectral to agronomic characteristics of crops, (2) define future sensor systems, and (3) develop advanced data analysis techniques. The data base is unique in the comprehensiveness of sensors and missions over the same sites throughout the growing season and in the calibration of all multispectral data to a common standard.

Continuing analysis of the field data is providing insight into the spectral properties, spectral identification, and assessment of crops. The analyses include development of predictive relationships between spectral variables and leaf area index, biomass, plant water content, and percent soil cover; determination of the effects of cultural and environmental factors on the spectral reflectance of wheat; investigations of the spectral separability of barley and spring wheat; determination of the early-season Landsat threshold for detection of wheat; and comparisons of Landsat MSS and thematic mapper spectral bands for crop identification and assessment.

The remainder of this paper describes the project objectives, presents an overview of the experimental approach, describes the data acquisition program, and discusses selected results based on field data. The paper ends with a summary of the key accomplishments and results of the experiment and recommendations for future field research.


OBJECTIVES 

The overall objective of the LACIE Field Measurements Project was to acquire, process, and distribute to researchers fully annotated and calibrated multitemporal sets of spectral measurements over the wavelength range of 0.4 to 15 micrometers, along with supporting agronomic and meteorological data. These data would serve as a data base for (1) determining quantitatively the temporal-spectral characteristics of spring and winter wheat, the soil background, and surrounding confusion crops; (2) defining future multispectral sensor systems; and (3) developing advanced data processing and analysis techniques.

Specific  objectives  are  listed  below  for  each  of 
these  categories.  The  objectives  emphasize  analyses 
to  increase  understanding  of  agricultural  scenes; 
however,  the  data  may  also  be  used  to  pursue  sensor 
design  and  data  processing  objectives. 

1. Scene-related objectives

a. Determination of the relation of crop canopy characteristics such as percent soil cover, leaf area index, biomass, and plant water content to multitemporal spectral response

b. Determination of the effects of cultural and environmental variables on the spectral properties and spectral identification of wheat

c. Determination of the spectral discriminability of wheat, small grains, and other crops as a function of growth stage

d. Determination of the presence, severity, and extent of crop stresses such as drought, disease, and winterkill from spectral measurements

e. Determination of the year-to-year variation in the condition and spectral response of wheat and other crops

f. Determination of the relation of grain yield to the multitemporal spectral response of wheat

g. Determination of the effects on spectral response of geometric factors such as Sun angle, view angle, and canopy structure of wheat and other selected crops

h. Determination of the effect of the atmosphere on the measured spectral responses of wheat and other crops

i. Determination of the characteristics and use of thermal measurements for discrimination of wheat from other crops

j. Validation of canopy reflectance models

2. Sensor-system-related objectives

a. Determination of optimum or required multispectral sensor system parameters including spectral bands, signal-to-noise ratio (S/N), noise-equivalent difference in reflectance (NEΔρ), noise-equivalent difference in temperature (NEΔT),¹ and time and frequency of sensor overpasses

b. Comparison and evaluation of Landsat MSS and thematic mapper wavelength bands for crop identification and assessment

3. Data-processing-system-related objective

a. Development of advanced data processing and analysis techniques that use multitemporal, spatial, spectral, transformed spectral, and ancillary data characteristics


OVERVIEW  OF  EXPERIMENTAL  APPROACH 

An overview of the experimental approach is shown in figure 1. At the beginning of the project, the technical issues and specific objectives to be addressed with the field measurements data were defined. This led to the experimental design for data acquisition and processing and to the definition of initial data analysis plans and products.

A multistage approach to data acquisition was taken, including areal, vertical, and temporal staging. Areal sampling was accomplished with test sites in different parts of the U.S. Great Plains. Vertical staging, or collection of data by different sensor systems at different altitudes, ranged from mobile towers to Landsat. Temporally, data were collected at 7- to 21-day intervals to sample all important growth stages and during 3 years to obtain a measure of the year-to-year variation in growing conditions and their influence on spectral response.

Measurements were made at three LACIE test sites during 3 crop years, 1975 to 1977. The sites are in Finney County, Kansas; Williams County, North Dakota; and Hand County, South Dakota. Finney County and Williams County were chosen to represent winter and spring wheat growing areas, respectively. Hand County is typical of the transitional zone between winter and spring wheat growing areas.


¹NEΔρ and NEΔT are measures of minimum detectable differences in scene reflectance and temperature.

FIGURE 1. Overview of experimental approach for LACIE field research.

The  primary  sensors  for  data  collection  were 
truck-mounted  spectrometers,  a helicopter-borne 
spectrometer,  an  aircraft  multispectral  scanner,  and 
the  Landsat-1  and  -2  multispectral  scanners.  Each 
sensor  system  has  unique  capabilities  for  acquiring 
spectral  data.  The  spectrometers  produce  the  highest 
quality  reflectance  measurements  but  provide  only 
limited  measurements  of  spatial  variability.  On  the 
other  hand,  an  aircraft  scanner  provides  spatial  sam- 
pling of  the  scene  and  can  obtain  data  at  multiple 
altitudes,  but  its  spectral  coverage,  although  broader 
than  that  of  a Landsat  MSS,  is  limited  to  a fixed  set  of 
wavelength  bands.  The  helicopter  and  aircraft  data 
acquisition  systems  have  the  advantage  of  flexible 
scheduling and, therefore, provide greater opportunity to obtain cloud-free data at critical crop growth stages than the Landsat system provides.
Landsat  provides  wide-area  coverage  but  is  limited  in 
its  spatial  resolution  and  the  placement  and  number 
of  spectral  bands. 

The  staging  of  data  acquisition  is  summarized  in 
figure  2.  Helicopter-spectrometer  and  aircraft-scan- 
ner data  were  collected  in  a series  of  flightlines  over 
commercial  fields  in  the  LACIE  intensive  test  site  in 
each  of  the  three  counties.  Landsat  MSS  data  were 
acquired  and  processed  for  the  entire  test  site,  as  well 
as  for  surrounding  areas.  These  data  provide  a 
measure  of  the  natural  variation  in  the  temporal- 
spectral  characteristics  of  wheat  and  surrounding 
cover  types. 

The  truck-mounted  spectrometers  collected 
spectra  of  crops  in  controlled  experimental  plots  at 
agricultural  research  stations  near  the  test  sites  at 
Garden  City,  Kansas,  and  Williston,  North  Dakota. 
These data, combined with the more detailed and quantitative measurements of crop and soil conditions which were made at the experiment stations, enable more complete understanding and interpretation of the spectra collected from commercial fields. Past experience has shown that there are generally too many interacting variables in commercial fields to determine exact causes of observed differences in spectral response. With data from plots where only two to four factors are varied under controlled conditions, it is possible to determine more exactly and understand more fully the energy-matter interactions occurring in crops.

FIGURE 2. Schematic illustration of LACIE field measurements data acquisition. (ITS: intensive test site.)

The  spectral  measurements  were  supported  by 
descriptions  of  the  targets  and  their  conditions.  The 
observations,  counts,  and  measurements  of  the  crop 
canopy  include  maturity  stage,  plant  height,  biomass, 
leaf  area  index,  percent  soil  cover,  and  grain  yield. 
Also included are measurement conditions such as sensor altitude and view angle, as well as measurements of the atmospheric and meteorological conditions. The data are supplemented by aerial photography and ground-level vertical and oblique photographs of the fields and test plots.

A data  library  of  all  spectral,  agronomic, 
meteorological,  and  photographic  data  collected  is 
maintained  at  LARS.  The  data  have  been  processed 
in  standard  data  formats  and  measurement  units  and 
made  available  to  JSC-supported  investigators  and 
other  interested  researchers. 


DESCRIPTION OF DATA ACQUISITION,
PROCESSING, AND DISTRIBUTION

This section describes the acquisition, processing, and distribution of the LACIE field measurements data. It begins by describing the test sites and the experiments at the agriculture experiment stations, followed by descriptions of the sensors and sensor calibration. The procedures for acquiring the spectral, agronomic, and meteorological data are then described. The section ends with a description of the data processing, library, and analysis systems.


Test Site and Experiment Description

The  test  sites  (fig.  3)  were  located  in  Finney 
County,  Kansas;  Williams  County,  North  Dakota; 
and  Hand  County,  South  Dakota.  Each  site  consists 
of  a LACIE  intensive  test  site  and,  in  Kansas  and 
North  Dakota,  an  agricultural  research  station. 
Measurements  were  acquired  for  3 years  at  the  Kan- 
sas and  North  Dakota  sites  and  for  2 years  at  the 
South  Dakota  site. 

The test sites were chosen to include as wide a range of important wheat production areas as possible. In addition, the Finney County and Williams County sites were selected because of their proximity to agricultural research stations. Personnel from the USDA Agricultural Stabilization and Conservation Service (ASCS) were available in each county to collect the required intensive test site ground-truth data.

At  the  experiment  station  in  Garden  City,  Kansas, 
experiments  were  conducted  on  dryland  and  irri- 
gated winter  wheat  and  small  grains.  At  the 
Williston,  North  Dakota,  experiment  station,  a 
small-grains  experiment  and  a cultural  practice  ex- 
periment with  spring  wheat  were  conducted. 

Intensive test sites. The intensive test sites are 8.1 by 9.7 kilometers in size. Three flightlines, each 9.7 kilometers long, were located across each site. The number of fields of each major cover type in each site for 1976 is summarized in table I.

Finney County, Kansas: The test site is located in the High Plains Tableland physiographic area at latitude 38°10' N and longitude 100°43' W. The elevation of the site is 900 meters. The site is overlaid by 3 to 10 meters of loess from the early Wisconsin age.

The soils of the test site are in the Mollisol order, Ustoll suborder, and Argiustolls great group. Mollisols are soils that have nearly black, friable, organic-rich surface horizons high in bases. Ustolls are formed in semiarid regions; they are dry for long periods and have subsurface accumulations of carbonates. The major soil series in the area are Richfield and Ulysses, which are deep, fertile, well-drained, nearly level to gently sloping loamy soils of the upland that are well suited to cultivation.

The area has a distinct continental type of climate characterized by abundant sunshine and constant wind. Most of the precipitation falls during the early part of the year, with a rapid decline in the probability of receiving adequate rainfall during July and August. Thus, the growth cycle of winter wheat is well matched to the available moisture supply. Average annual precipitation for Finney County is 48.5 centimeters: 14.3 centimeters from March through May, 20.1 centimeters from June through August, 9.7 centimeters from September through November, and 4.4 centimeters from December through February.


TABLE I. Number of Commercial Fields of Each Crop or Cover Type in the Field Measurements Test Sites, 1976

The major crops in Finney County are wheat and grain sorghum, which account for about 60 and 20 percent, respectively, of the total cropland. The majority of wheat is produced following summer fallow practices, although an increasing amount is being irrigated. Winter wheat is seeded in September or early October, then is dormant from December to February. Green-up occurs in March; the crop is fully headed by mid-May; and harvest is typically completed during the first week of July.

Williams County, North Dakota: This test site is located at latitude 48°19' N and longitude 103°25' W. It is representative of the cool semiarid areas of the northern Great Plains where annual precipitation averages 33 to 38 centimeters. The site is at an elevation of 650 meters and lies in the glaciated area with a drift mantle and an undulating to steep surface.

The soils in the site are of the Mollisol order, Boroll suborder, with Williams and Williams-Zahl being the major associations present. Both occur on undulating to rolling landscapes and are well to excessively drained. Much of the surface drainage is to depressions. The soils were developed from calcareous glacial till and are suitable for cropland and pasture. The soils of the Williams association are very productive.

The  climate  of  the  area  is  typically  continental, 
with  long  cold  winters,  short  warm  summers,  wide 
diurnal  ranges  in  temperature,  frequent  strong 
winds,  and  limited  (as  well  as  uncertain  and  highly 
variable)  precipitation.  Average  amounts  of  pre- 
cipitation are  4.6,  15.5,  12.2,  and  4.3  centimeters  in 
the  winter,  spring,  summer,  and  fall,  respectively. 

The  major  crop  is  wheat,  which  occupies  about  70 
percent  of  the  grain  crop  acreage.  Both  hard  red  and 
durum  spring  wheats  are  grown.  Most  of  the  wheat  is 
grown  on  summer  fallow  land.  The  major  cover 
types  in  the  site  are  wheat,  summer  fallow,  and 
pasture;  limited  acreages  of  rye,  barley,  alfalfa,  and 
flax  are  also  grown.  The  cropping  calendar  for  the 
spring  wheats  begins  with  seedbed  preparation  in  late 
April  to  early  May.  Planting  is  generally  in  mid-May; 
heading  occurs  from  late  June  to  mid-July;  and  har- 
vest is  from  mid-  to  late  August. 

Hand  County,  South  Dakota:  The  test  site  is  in  the 
north-central  Great  Plains  at  latitude  44°34'  N and 
longitude  99°00'  W.  It  is  a transition  area  with  the 
Corn  Belt  to  the  east,  spring  wheat  producing  areas  to 
the  north,  and  winter  wheat  producing  areas  to  the 
south. The boundary between the subhumid lowland of eastern South Dakota and the more arid Great Plains area of central and western South Dakota passes through Hand County. The area is nearly level to gently undulating. The principal soils of the test site are Houdek and Bonita, which are in the Mollisol order, Ustoll suborder. They are dark-colored permeable loams underlaid by slowly permeable glacial till.

Hand  County  has  a continental  climate.  Winters 
are  long  and  cold,  and  summers  are  warm.  The 
average  annual  precipitation  is  47  centimeters; 
typically,  33  to  36  centimeters  fall  between  April  and 
September.  The  county  is  subject  to  frequent  weather 
changes,  and  airmasses  that  pass  through  the  area 
bring  a wide  variety  of  temperature  and  moisture 
conditions. 

The  principal  crops  of  Hand  County  are  winter 
and  spring  wheat,  pasture  and  hay,  corn,  barley,  and 
oats.  Most  wheat  is  grown  following  summer  fallow. 

Agriculture  experiment  stations. — Agronomic  ex- 
periments with  wheat  and  other  crops  were  available 
for  study  at  the  agriculture  experiment  stations  at 
Garden  City,  Kansas,  and  Williston,  North  Dakota. 
The  research  farms  are  operated  by  Kansas  State 
University  and  North  Dakota  State  University.  The 
advantages  of  using  experimental  plots  at  the  sta- 
tions were  that  (1)  considerable  amounts  of 
agronomic  data  describing  the  treatments  and  their 
effects  on  the  growth  and  development  of  the  crop 
could  be  readily  obtained,  and  (2)  sources  of 
difference  in  spectral  response  could  be  more  readily 
determined  since  only  the  factors  of  interest  were 
varied  while  other  factors  were  held  constant.  The 
replicated  experiments  were  designed  to  provide  a 
range  of  growing  conditions  typical  of  those  found  in 
the  intensive  test  sites.  The  crops  were  planted, 
grown,  and  harvested  using  conventional  practices 
and  equipment. 

A small-grains and a wheat experiment were conducted at each location. The treatments and experimental designs for the 1977 crop year for each location are shown in figures 4 and 5.

Garden  City,  Kansas:  The  objective  of  the  small- 
grains  experiment  was  to  determine  whether  various 
small  grains  can  be  discriminated  from  each  other  on 
the  basis  of  their  spectral  reflectance.  The  experi- 
ment included  four  winter  wheat  varieties  and  one 
variety  each  of  barley,  rye,  and  triticale  (fig.  4). 

The principal objective of the dryland and irrigated winter wheat experiments was to characterize crop spectral response as a function of crop maturity and to relate the spectral response to crop variables such as leaf area index and biomass and to cultural variables such as planting date, irrigation, and nitrogen fertilization. The treatments were selected to give a range of leaf area indexes and biomass at any particular maturity stage or measurement time.

FIGURE 4. Remote-sensing experiments at the Garden City, Kansas, agriculture experiment station. Panels: dryland winter wheat experiment (Eagle variety; seeding rate 40 lb/acre; nitrogen fertilization N0 = none, N1 = 50 lb/acre; planting dates D1 = Oct. 1, 1976, D2 = Oct. 14, 1976); small-grains experiment (nitrogen fertilization 60 lb/acre; seeding rate 40 lb/acre); irrigated winter wheat experiment (Eagle variety; seeding rate 80 lb/acre; preplant irrigation Sept. 17, 1976; nitrogen fertilization N0 = none, N1 = 60 lb/acre; planting dates D1 = Oct. 1, 1976, D2 = Oct. 14, 1976).

FIGURE 5. Remote-sensing experiments at the Williston, North Dakota, agriculture experiment station. Panel: spring wheat experiment (nitrogen fertilization N1 = 0 kg/ha, N2 = 44 kg/ha; variety V1 = Waldron (awnless), V2 = Olaf (awned); planting dates D1 = May 9, 1977, D2 = May 23, 1977).

Williston,  North  Dakota:  The  objective  of  the 
small-grains experiment was the same as for the
small-grains experiment at Garden City. This trial
included two varieties each of hard red spring wheat,
durum  wheat,  oats,  and  barley  (fig.  5). 

The objective of the spring wheat experiment was
to quantify the effects on spectral response of the
major variables affecting wheat growth, development,
and yield. The factors and levels of each factor
were (1) soil moisture: wheat in 1976 and fallow in
1976, (2) cultivar: standard height and semidwarf,
(3) planting date: early and late, and (4) nitrogen
fertilization: 0 and 34 kg/ha.

Table II. Characteristics of the Multispectral Scanner
Systems

Characteristic              Landsat-1 and -2   JSC MSS      JSC MMS (a)
                            MSS

Spectral range, µm          0.5 to 1.1         0.4 to 13    0.4 to 1.1,
                                                            10 to 12
Number of bands             4                  24           11
Total field of view, deg    11.56              80           110
Normal operational
  altitude, km              944                0.5 to 6.1   0.5 to 6.1
Instantaneous field of
  view, m                   79                 1 to 12      1 to 15
Precision of data, bits     5 to 6             8            8

(a) Modular multiband scanner.


Sensor  Descriptions 

The  characteristics  of  the  primary  sensors  used  to 
acquire  spectral  data  over  the  intensive  test  sites  and 
agriculture  experiment  stations  are  described  in  this 
section.  The  sensors  used  in  the  intensive  test  sites 
included  Landsat  MSS,  airborne  MSS,  helicopter- 
borne  spectrometer,  and  tripod-mounted  Landsat- 
band  radiometers.  The  sensor  systems  acquiring 
spectral  data  at  the  agriculture  experiment  stations 
were  the  truck-mounted  spectroradiometer  and  in- 
terferometer systems  operated  by  LARS  and  JSC, 
respectively.  General  descriptions  of  the  sensor 
systems  are  given  in  the  following  sections  and  sum- 
marized in  tables  II  and  III. 

Landsat  multispectral  scanner. — Landsat  1 and  2 
MSS  data  were  acquired  at  18-day  intervals.  The  MSS 
data  have  four  spectral  bands  from  0.5  to  1.1 
micrometers.  The  sensor  scans  crosstrack  swaths  of 
185  kilometers.  Computer-compatible  tape  (CCT) 
data  and  imagery  (both  color  and  black  and  white) 
were  requested  for  each  cloud-free  overpass  of  the  in- 
tensive test  sites. 

Airborne multispectral scanners. During 1975, the
24-channel scanner (ref. 2) operated by JSC was the
primary scanner system; during 1976 and 1977, the
11-channel modular multiband scanner (MMS) was
used (footnote 2). Color and color-infrared photography
was obtained during the scanner flights to be used as
reference data by analysts.

Table III. Characteristics of the Spectrometer
Systems

Characteristic              JSC FSS (a)     LARS            JSC FSAS (b)
                                            Exotech 20C

Spectral range, µm          0.4 to 2.5,     0.4 to 2.4,     0.4 to 2.5,
                            8 to 14         3 to 14         3 to 14
Spectral resolution at
  1.0 µm, µm                0.025           0.025           0.0064
Scan time, scan/sec         1               0.033 to 2.0    10
Field of view, deg          22              0.75, 15        11
Normal operational
  altitude, m               60              6               6
Data storage format         Digital tape    Analog tape     Digital tape
Camera
  Focal length, mm          76              55              50
  Field of view, deg        36              43              46
  Film type                 70 mm color     35 mm color     35 mm color

(a) Field spectrometer system.
(b) Field signature acquisition system.

Helicopter-borne field spectrometer system. The
helicopter-borne field spectrometer system (FSS) is a
filter wheel spectrometer instrument that is a
modification of the S-191 sensor used in the Skylab
Earth Resources Experiment Package (EREP) (ref.
3). The FSS has been modified by NASA for mounting
on a helicopter (fig. 6). The instrument produces
data in 14-track digital format that are converted to
CCTs for subsequent reformatting and analysis.

The spectral range of the spectrometer is 0.42 to
2.50 and 8.0 to 14.0 micrometers. The field of view is
22°, which gives a spot size of 24 meters diameter
from a 60-meter altitude. The helicopter flies at 100
km/hr. The camera has a 76-millimeter focal length
and a 36° field of view, giving 40 meters square
ground coverage.
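
The spot size and camera coverage quoted above follow from simple nadir-viewing geometry. The sketch below is illustrative only: it assumes flat terrain and a sensor pointed straight down, and the function name is hypothetical. It gives roughly 23 meters for the spectrometer and 39 meters for the camera, close to the rounded figures in the text.

```python
import math

def footprint_diameter(altitude_m: float, fov_deg: float) -> float:
    """Ground distance spanned by a full field of view for a nadir-pointing sensor.

    Assumes flat ground and a sensor aimed straight down (an assumption,
    not a statement about the actual flight geometry).
    """
    return 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)

# Values taken from the text: 60 m altitude, 22 deg and 36 deg fields of view.
fss_spot = footprint_diameter(60.0, 22.0)     # spectrometer spot diameter
camera_side = footprint_diameter(60.0, 36.0)  # side of the camera ground coverage
print(f"FSS spot: {fss_spot:.1f} m, camera coverage: {camera_side:.1f} m")
```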

Truck-mounted spectrometer systems. The Exotech
Model 20C field spectrometer operated by
LARS acquires spectral data over the visible, reflective
infrared, and thermal infrared wavelength
regions (ref. 4). The instrument consists of two
independently functioning units. The short-wavelength
unit senses radiation from 0.38 to 2.4 micrometers,
and the long-wavelength unit senses radiation from
[...] to 5.4 and 7.0 to 13.5 micrometers. The
short-wavelength unit is equipped with a translucent
diffusing plate, which is used to monitor incident
spectral irradiance. Each optical head has a reflective
fore-optic system that permits remote selection of
the field of view (0.75° or 15°).

FIGURE 6. Helicopter spectrometer system operated by
[...] acquiring measurements of the calibration
standard. The spectrometer is located just below the
window. The calibration standard is a specially treated
canvas panel 6.5 by 11 meters in size mounted on a flat
level platform.

2 "Modular Multiband Scanner (MMS)." JSC Internal Note
No. 74-FM-47, NASA Johnson Space Center, Houston, Tex.,
1974.

The instrument is mounted on a mobile aerial
tower that operates with an instrumentation van
containing the control electronics and data recorder for
the system (fig. 7). The data produced by the instrument
are recorded on an analog magnetic tape
recorder and later converted into digital information
by a laboratory analog-to-digital converter. Calibration
sources designed for field use are used to calibrate
the spectrometer onsite. Calibrated spectral
data and field observations are combined on digital
magnetic tapes during the data reformatting process.

The Block wideband interferometer (Field Signature
Acquisition System or FSAS) operated by JSC
acquires spectral data over the visible and infrared
portions of the spectrum (ref. 5). The instrument
scans the spectrum rapidly enough to account for
environmental variables and is equipped with a
self-contained computer system that yields spectral data
from the interferograms produced by the instrument.
The instrument control electronics and computer
are mounted in an instrument van, and the optical
head of the instrument is mounted on a mobile
aerial tower. The spectral data (expressed as wave
numbers) produced by the instrument were processed
by JSC to provide CCTs of spectral reflectance
factor calibrated with respect to wavelength.

Landsat-band radiometers. Four-band radiometers
(Exotech Model 100) with the same spectral
bands as the Landsat MSS were operated by Purdue
University and Texas A & M University to acquire
measurements in selected fields at the Finney County
and Williams County test sites to support canopy
modeling studies. In addition, during 1977, measurements
were made throughout the growing season of
the plots at the Williston experiment station, using a
radiometer mounted on a lightweight van. These
measurements, made at hourly intervals, are being
used to investigate the diurnal variation in reflectance
of wheat.

Meteorological and atmospheric sensors. Standard
meteorological instrumentation was used to obtain
measurements of temperature, humidity, windspeed
and wind direction, barometric pressure, and total
irradiance. Solar radiometers were used to obtain
optical depth measurements in six visible and infrared
bands during Landsat overpasses and during aircraft
and helicopter missions.


Sensor  Calibration  and  Correlation 

A key objective of the LACIE Field Measurements
Project was the acquisition of calibrated
multispectral data. Calibrated data are required to (1)
facilitate comparisons of data from different sensors
and (2) compare and relate spectral measurements
made at one time and location to those made at other
times and locations.

FIGURE 7. Field spectroradiometer system operated by LARS
making spectral measurements of small grains plots.
To have comparable data, scene reflectance was
chosen as the measured property rather than scene
radiance. Scene reflectance is a property only of the
scene, whereas scene radiance is a property of the
illumination also. Calibration largely removes the
effects of varying illumination and measurement
conditions because of changing Sun angle,
atmospheric conditions, and sensor. The bidirectional
reflectance distribution function gives the most
complete description of the reflectance characteristics of
a surface. However, because this property is difficult
to measure, more common use is made of the
reflectance factor.

Reflectance  factor  is  defined  as  the  ratio  of  inci- 
dent radiant  flux  reflected  by  a sample  surface  to  that 
which  would  be  reflected  into  the  same  reflected 
beam  geometry  by  a perfectly  diffuse  (Lambertian) 
standard  surface  identically  irradiated  and  viewed 
(ref. 6). Because the principal component of the
irradiance is direct solar irradiance and the measurement
is made in a relatively small cone angle (15° to 20°),
the term "bidirectional reflectance factor" is used to
describe  the  measurement.  One  of  the  directions  is 
specified  by  the  solar  zenith  and  azimuth  angles;  the 
other  is  specified  by  the  zenith  and  azimuth  viewing 
angles. 

Because no perfectly reflecting diffuser is available,
painted barium sulfate (BaSO4) reference surfaces,
which are highly diffuse, were used (ref. 7).
The  spectral  bidirectional  reflectance  factor  of  these 
surfaces  was  measured  in  both  the  laboratory  and  the 
field  by  processes  that  are  traceable  to  the  reflectance 
of  pressed  barium  sulfate  (fig.  8).  A correction  using 
the  published  reflectance  of  the  pressed  barium  sul- 
fate enables  the  computation  of  an  approximation  of 
the  bidirectional  reflectance  factor. 

Because  of  the  presence  of  skylight,  the  measure- 
ment is  not  strictly  bidirectional.  The  process  of 
eliminating  skylight  by  subtracting  the  spectral 
response  of  the  shadowed  scene  and  shadowed  stan- 
dard has  merit  in  that  it  could  remove  the  effects  of 
the skylight. However, the additional measurements
and calculations add uncertainty to each computed
reflectance. This uncertainty is greater than the
effect itself (ref. 8). Furthermore, because the
interest of the project was in producing data directly
relatable to satellite data, which includes the effects
of the skylight, the single comparison method was
used. Because the dominant effects are due to the
directional nature of the irradiance, the term
"bidirectional reflectance factor" is appropriate to
describe the measurements.
Calibration of truck-mounted spectrometer
systems. Temperature variations, dust, vibration,
and other adverse factors associated with field
measurements require that calibration be performed
at the field site. The procedures chosen reflect the
availability of suitable standards and the principle
that the calibration measurements be obtained under
the same conditions as the target measurements.

The  short-wavelength  unit  was  calibrated  for 
spectral  reflectance  factor.  A standard  based  on  the 
highly  reflecting  properties  of  barium  sulfate  was 
used  as  a basis  for  the  reflectance  factor  calibration. 
The  standards  were  prepared  according  to  pro- 
cedures described  by  Shai  and  Schutt  (ref.  7). 

The  painted  barium  sulfate  field  standard  was 
used  to  fill  the  field  of  view  of  the  instrument  under 
nearly  the  same  conditions  as  for  the  measurement 
of  plots.  For  the  simplest  calibration,  the  response  to 
the  standard,  the  response  to  the  scene,  the  full-dark 
response  (automatically  provided  during  each 
spectral  scan),  and  the  spectral  reflectance  properties 
of  the  standard  are  used  to  compute  the  bidirectional 
reflectance  factor.  Since  it  is  inconvenient  to  make 
this  direct  comparison  for  each  measurement,  the 
solar  port  is  frequently  used  to  transfer  the  reflec- 
tance standard  for  the  LARS  Exotech  20C  system. 

The  calibration  calculation  consists  of  forming  the 
ratio  of  the  instrument  response  for  the  target  to  that 
for  the  reflectance  standard  and  correcting  for  the 
known  reflectance  of  the  standard.  This  procedure 
produces  a reflectance  factor  for  the  given  Sun  angle 
and  normal  viewing  of  the  target. 
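
The single-comparison calculation described above can be sketched in a few lines. This is a minimal illustration, not the project's processing code: the function name and sample readings are hypothetical, and subtracting the full-dark response from both the target and standard responses before forming the ratio is an assumption about how the dark reading enters the computation.

```python
def reflectance_factor(target, standard, dark, standard_reflectance):
    """Per-band bidirectional reflectance factor by single comparison.

    Each argument is an equal-length sequence with one value per band:
    response to the target, response to the reference standard,
    full-dark response, and known reflectance of the standard (0 to 1).
    Dark subtraction on both responses is an assumption of this sketch.
    """
    result = []
    for t, s, d, r in zip(target, standard, dark, standard_reflectance):
        result.append((t - d) / (s - d) * r)
    return result

# Hypothetical three-band readings in arbitrary instrument units.
target = [0.30, 0.25, 1.20]
standard = [2.10, 2.05, 2.50]
dark = [0.10, 0.05, 0.10]
panel = [0.95, 0.95, 0.94]   # assumed reflectance of the painted standard
brf = reflectance_factor(target, standard, dark, panel)
```

Because the target and the standard are viewed under nearly the same illumination, the irradiance cancels in the ratio, which is why the result depends only on the Sun angle and view geometry.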

During the calibration observations, the instrument
was aimed straight down at the reflectance
standard from a distance of 2.4 meters for the
Exotech 20C system and 1 meter for the FSAS system.
Care was taken to ensure that the standards were not
shadowed and that the illumination conditions were
as similar as possible to the conditions of the
observation of the subject. Calibration observations were
performed at approximately 15-minute intervals.

Wavelength calibration of the reflective
wavelength unit was accomplished by irradiating the
solar port with sources having known spectral lines
(ref. 9). The primary sources are the General Electric
A100-H4T mercury vapor lamp and the helium
Pluecker tube. A field wavelength calibrator based on
the helium tube was chosen for use because it has at
least one strong line in the range of each section of
the circular variable filters.

Calibration  of  the  helicopter-borne  FSS. — The  heli- 
copter-borne spectrometer  was  calibrated  using  a 60- 
percent  reflectance  canvas  panel  and  the  measure- 
ments made  by  the  truck-mounted  spectrometer  of 
the  canvas  panel.  These  in  turn  were  related  to  the 
measurements  of  the  barium  sulfate  painted  panels 
and  the  pressed  barium  sulfate  standard. 

The  calibration  procedure  used  deals  with  limita- 
tions imposed  by  the  size  and  location  of  the  stan- 
dard by  calibrating  the  instrument  at  a low  altitude  (6 
meters)  and  collecting  data  over  the  flightlines  at  60 
meters.  This  procedure  assumes  that  atmospheric  ab- 
sorption and  path  radiance  are  negligible  for  a 60- 
meter  path. 

The  absence  of  an  onboard  solar  sensor  integrated 
into  the  instrument  makes  it  desirable  that  calibra- 
tions be  performed  as  frequently  as  possible. 
Therefore,  the  reflectance  panels  were  centrally  lo- 
cated and  procedures  were  followed  which  allowed 
calibration  within  15  minutes  of  any  data  acquisition 
(beginning  of  each  flightline  of  data  collection). 

The  data  processing  facility  converts  the  FSS  data 
to  bidirectional  reflectance  factor  based  on  the 
measurements  made  of  the  barium  sulfate  standard 
and  the  canvas  panel.  The  calibration  calculation 
consists  of  forming  the  ratio  of  the  FSS  response  for 
the  target  to  that  for  the  canvas  standard  and  correct- 
ing for  the  measured  reflectance  of  the  canvas  stan- 
dard. This  procedure  produces  a reflectance  factor 
for  the  given  solar  illumination  angle  and  normal 
viewing  of  the  subject. 

Field  calibration  of  the  FSS  with  respect  to 
emissive  radiation  was  accomplished  by  recording 
spectral  observations  of  a blackbody  at  a temperature 
below  ambient  and  another  blackbody  at  a tem- 
perature above  ambient.  The  subsequent  scans  of 
subject  scenes  were  converted  to  spectral  radiance 
using  linear  interpolation. 
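
The two-blackbody procedure amounts to a two-point linear calibration. The sketch below is an illustration under stated assumptions: the blackbody radiances are computed from Planck's law, the instrument response is taken to be linear in radiance between the two reference points, and the temperatures, responses, and function names are hypothetical.

```python
import math

H = 6.62607015e-34   # Planck constant, J s
C = 2.99792458e8     # speed of light, m/s
K = 1.380649e-23     # Boltzmann constant, J/K

def planck_radiance(wavelength_um: float, temp_k: float) -> float:
    """Blackbody spectral radiance in W m^-2 sr^-1 um^-1."""
    lam = wavelength_um * 1e-6  # wavelength in meters
    per_meter = (2.0 * H * C**2 / lam**5) / math.expm1(H * C / (lam * K * temp_k))
    return per_meter * 1e-6     # convert per meter to per micrometer

def response_to_radiance(resp, resp_cold, resp_hot, rad_cold, rad_hot):
    """Linearly interpolate radiance from the two blackbody reference points."""
    gain = (rad_hot - rad_cold) / (resp_hot - resp_cold)
    return rad_cold + gain * (resp - resp_cold)

# Hypothetical example at 10 um: blackbodies at 10 C and 40 C.
rad_cold = planck_radiance(10.0, 283.15)
rad_hot = planck_radiance(10.0, 313.15)
scene = response_to_radiance(1.5, 1.0, 2.0, rad_cold, rad_hot)
```

Bracketing ambient temperature with one cooler and one warmer blackbody keeps scene readings inside the calibrated interval, so the conversion interpolates rather than extrapolates.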

Calibration  of  airborne  MSS  data. — The  reflective 
data  from  the  airborne  MSS  can  be  calibrated  to 
reflectance using the five gray canvas panels located
at the site and the spectral bidirectional reflectance
factor  measurements  made  by  the  truck-mounted 
spectrometers  over  the  canvas  panels.  The  nominal 
reflectances  of  the  panels  are  6, 12, 18,  30,  and  60 
percent. 

The  gray  panel  reflectance  factor  and  MSS 
response  data  collected  at  low  altitude  (500  meters 
above  the  panels)  can  be  related  through  linear 
regression.  The  regression  equation  can  then  be  used 
to  transform  the  low-altitude  airborne  MSS  data  to 
bidirectional  reflectance  factor.  Fields  overflown  at 
the  lower  altitude  can,  in  turn,  be  used  as  calibration 
targets  to  transform  higher  altitude  data  to  bidirec- 
tional reflectance  factor. 
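
The panel regression can be sketched as an ordinary least-squares line fit. The example is illustrative only: the panel reflectances are the nominal values quoted in the text, but the scanner counts are invented for the example, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Nominal panel reflectances from the text, in percent.
panel_reflectance = [6.0, 12.0, 18.0, 30.0, 60.0]
# Hypothetical low-altitude scanner counts for one band over the same panels.
panel_counts = [22.0, 37.0, 52.0, 82.0, 157.0]

slope, intercept = fit_line(panel_counts, panel_reflectance)

def counts_to_reflectance(counts):
    """Apply the fitted line to convert scanner counts to reflectance (percent)."""
    return slope * counts + intercept
```

In practice a separate fit would be made for each spectral band. The same fitting step could then be repeated with low-altitude field reflectances as the reference values to transfer the calibration to higher altitude data.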

The  emissive  MSS  data  can  be  calibrated  by  means 
of  the  two  blackbodies  at  known  temperatures  lo- 
cated in  the  scanner  and  viewed  with  each  scan  of 
the  scene. 

Sensor correlation procedures. The three major
sensor systems (the truck-mounted spectrometers,
the helicopter-borne spectrometer, and the aircraft
MSS) can be correlated using the spectral data
collected by each system over common targets; i.e., five
6- by 12-meter gray canvas panels (fig. 9). The
aircraft scanner collected data over the panels during
each  mission.  The  helicopter  and  truck  spectrometer 
systems  measured  the  reflectance  of  the  four  darker 
gray  panels  during  correlation  experiments  per- 
formed during  each  crop  year.  The  calibration 
measurements  made  of  the  brightest  canvas  calibra- 
tion panel  by  the  helicopter  and  truck  spectrometer 
systems  were  also  used  in  correlating  the  sensors. 

All  the  spectrometers  were  brought  together  in 
1977  for  complete  calibration  and  correlation.  This 
included  measurement  of  common  targets  and 
reflectance  standards  (fig.  10),  comparison  of  data 
collection  procedures,  and  evaluation  of  instrument 
performance. 


Data  Acquisition 

The  collection  of  multispectral  remote-sensing, 
agronomic,  and  meteorological  data  for  the  intensive 
test  sites  and  agriculture  experiment  stations  is  de- 
scribed in  this  section. 

Intensive  test  sites.— This  section  discusses  spectral 
data  collection  procedures  and  the  measurements 
and  observations  of  crop,  soil,  and  meteorological 
parameters  in  the  intensive  test  sites. 

Spectral data collection: Helicopter-borne FSS,
airborne MSS, and tripod-mounted radiometer data
were collected within a 3- or 4-day mission window.
Whenever possible, data were obtained on the same
day and time as the Landsat overpasses.

FIGURE 9. Canvas gray-level panels used for correlation of
spectrometer systems and calibration of aircraft scanner data.

FIGURE 10. Truck-mounted spectrometer systems operated by
LARS and JSC preparing to measure the reflectance of the
canvas calibration panel in a sensor correlation experiment. The
common target was measured simultaneously by these systems,
as well as by the helicopter spectrometer, to relate the
instrument responsivities.

The helicopter spectrometer data were obtained
under stable atmospheric conditions with 20 percent
or less cloud cover at solar elevation angles greater
than 30°. At the test site, six 4.7-kilometer flightlines
were flown by the helicopter in three sets of two
lines. Flightlines were flown at an altitude of 60
meters, at 100 km/hr groundspeed, and in an east-west
direction. Reference panel calibration measurements
were made from a 6-meter altitude immediately
before flying each set of two flightlines.
Correlation of spectra and fields was accomplished
using simultaneously acquired 70-millimeter color
photography. A total incidence pyranometer was
located at the helicopter calibration site to provide a
strip-chart record of the irradiance conditions on the
day of data acquisition (usually beginning 1 hour
before and ending 1 hour after the data acquisition
period). These strip charts provide the data analyst
with a visual record of the irradiance conditions at
the site during helicopter and MSS data acquisition.

The airborne scanner system acquired data over
the intensive test sites and agriculture experiment
stations concurrently with data collection by the
helicopter spectrometer. The intensive test sites were
overflown at 3300- and 7000-meter altitudes and the
experiment stations and calibration panels at 500
meters. Collecting data at the two altitudes over the
test site flightlines provided different spatial
resolutions and different amounts of atmosphere between
the scene and sensor. Data collection requirements
specified that cloud cover be less than 30 percent and
solar elevation greater than 30°. Color and
color-infrared photography (23 centimeters) was obtained
simultaneously with the scanner data.

A Landsat-band radiometer mounted on a 2-meter
tripod was used to collect data from one to three
fields in the Finney County and Williams County
test sites. The measurements were made at four
times during the day to provide four different Sun
angles. A painted barium sulfate field standard was
measured between the measurements of the canopy.
The spectral measurements include wheat canopy
reflectance, soil reflectance, the ratio of diffuse to
total irradiance, and leaf transmittance. Canopy
description data include leaf area index, biomass,
number of tillers and leaves, and photographs. The
photographs include vertical and 45° views and plant
profile scenes. When possible, these data were
acquired at five maturity stages (seedling, tillering,
jointing, flowering, and ripe) at several locations in
typical fields.

Agronomic data collection: Agronomic measurements
and observations were acquired describing the
condition of each of the fields for which spectral data
were collected. These agronomic data describe the
condition of each field as fully as possible and are
used to account for differences in the spectral
measurements. The data were recorded on standard
forms, keypunched, and transmitted to LARS for
inclusion in the data bank. Data describing all fields in
the intensive test sites were collected by USDA
ASCS (ref. 10). The following data were collected
during the spring and fall inventories: field number,
acreage, crop species and variety, irrigation,
fertilization, planting date, and other descriptive
information.

Periodic observations coinciding with Landsat
overpasses and aircraft/helicopter missions were
made to describe the condition of the fields. The
variables  observed  were  maturity  stage,  percent  soil 
cover,  plant  height,  surface  moisture  condition,  stand 
quality,  quality  relative  to  other  fields  in  the  site, 
field  operations,  density  of  stand,  weed  infestation, 
and  growth/yield  detractants.  Vertical  35-millimeter 
photographs  were  taken,  and  additional  descriptive 
comments  were  added  as  appropriate.  Grain  yields  of 
selected  fields  were  measured  at  harvest  time. 

Meteorological  data  collection:  The  following  at- 
mospheric and  meteorological  measurements  were 
made  in  conjunction  with  FSS  and  aircraft  scanner 
data  collection  at  the  intensive  test  sites:  percentage 
and  type  of  cloud  cover,  wet  and  dry  bulb  tem- 
perature, barometric  pressure,  total  irradiance, 
windspeed  and  wind  direction,  and  optical  depth  at 
seven visible and near-infrared wavelengths. Daily
measurement records of temperature, precipitation,
relative humidity, soil temperature, and wind were
obtained from the nearest weather station. In
addition, rainfall was recorded at six to eight locations in
each test site.

Agriculture  experiment  stations.— The  collection  of 
spectral, agronomic, and meteorological data at the
agriculture  experiment  stations  is  described  in  this 
section. 

Spectral data collection: The spectral data at the
agriculture experiment stations were collected by
JSC at Garden City, Kansas, and by LARS at
Williston, North Dakota. The primary sensors were
the Block wideband field interferometer and the
Exotech Model 20C field spectroradiometer. During
1975, an Exotech Model 20D similar to the Model
20C was operated by the NASA Earth Resources
Laboratory at Garden City. These were augmented
by Barnes PRT-5 precision radiation thermometers
boresighted with the spectrometers. To obtain data
that could be readily compared, the interferometer
and spectroradiometer were operated following
similar procedures. The instruments were operated from
their aerial towers at 6 meters above the target to
minimize the shadowing of skylight and yet ensure
that the field of view of the instrument contained
only the subject of interest. Care was taken to avoid
scene shadowing and to minimize the reflective
interaction caused by personnel or vehicles. The
routine data-taking mode of the instruments is
straight down. Two measurements of each plot were
made by moving the sensor so that a new scene
within the plot filled the field of view.

To  minimize  the  effect  of  solar  elevation  changes 
on  the  spectral  response,  measurements  were  made 
only  when  the  Sun  angle  was  greater  than  45°  above 
the  horizon  in  the  late  spring  and  summer  and 
greater  than  30°  in  the  late  fall  and  early  spring. 

Data  recorded  at  the  time  of  each  measurement 
included date, time, reference illumination, air
temperature, barometric pressure, relative humidity,
windspeed  and  wind  direction,  percentage  and  type 
of  cloud  cover,  field  of  view,  latitude,  longitude,  and 
zenith  and  azimuth  view  angles.  Periodically  during 
the  day,  spectral  measurements  of  skylight  were 
recorded  by  spectrometers  with  a solar  port.  A 35- 
millimeter  color  photograph  of  each  observation  was 
taken  from  the  aerial  tower,  as  were  oblique  ground- 
level  photographs  of  each  plot. 

Agronomic data collection: Crop and soil information
for the plots at the research stations was collected
at Garden City by JSC with assistance from
the  agriculture  experiment  station  personnel  and  at 
Williston  by  LARS.  At  the  beginning  of  the  season, 
information  describing  the  species  and  cultivar,  ir- 
rigation practices,  fertilization  history,  soil  type,  and 
planting  date  was  obtained  for  each  plot. 

Observations  made  at  the  time  of  each  mission  for 
each  plot  included  maturity  stage;  plant  height;  per- 
cent soil  cover;  surface  soil  moisture  and  roughness; 
stand  quality;  field  operations  such  as  cultivation  or 
harvesting;  stress  factors  (insect  damage,  disease, 
nutrient  deficiencies,  moisture  stress,  weeds,  or  lodg- 
ing); leaf  area  index;  number  of  stems,  leaves,  and 
heads;  fresh  weight  of  plants;  dry  weights  of  stems, 
leaves,  and  heads;  and  soil  moisture  profile.  Vertical 
and  oblique  35-millimeter  color  photographs  were 
taken,  and  grain  yields  were  measured  at  harvest 
time. 

Meteorological  data  collection:  Percentage  and 
type  of  cloud  cover,  wet  and  dry  bulb  temperature 
(or  relative  humidity),  barometric  pressure,  total  ir- 
radiance, and  windspeed  and  wind  direction  were 
measured  in  conjunction  with  the  truck-mounted 
spectrometer  data  collection.  Daily  measurement 
records  of  air  temperature,  humidity,  radiation, 
wind,  precipitation,  and  soil  temperature  were  also 
obtained  from  the  nearest  weather  station. 

Summary of data acquisition missions. Table IV
summarizes the data acquisition for the 1976 crop
year at Finney County, Kansas, and Williams County,
North Dakota, for the major sensors involved in
the experiment. In each year, as in 1976, an effort
was made to obtain data at each of the important
growth stages of wheat at each level of the sampling
scheme, from controlled experimental plot to Landsat
scene. Whenever possible, helicopter spectra and
aircraft scanner data were gathered near the time of a
Landsat overpass. A complete schedule of acquired
data for each location is given in the data library
catalogs discussed in the following section.


Data Processing, Library,
and Analysis Systems

An  important  aspect  of  the  project  was  to  prepare 
the  data  for  later  analysis  according  to  uniform  for- 
mats and  to  register  the  agronomic,  meteorological, 
and  measurement  data  with  the  spectral  data. 
Following  processing,  data  were  cataloged  in  the  data 
library and distributed to interested researchers.
Software for interactive plotting and analysis of the data
has also been developed.

Data  processing.— Before  computer  processing, 
spectrometer  data  were  evaluated  manually  using 
strip  charts  of  raw  data,  photographs,  records  of 
system  parameters,  and  strip  charts  of  irradiance 
conditions.  Computer  processing  of  spectrometer 
data  included  calibration,  data-quality  evaluation, 
reformatting,  and  storage.  In  order  to  compare  data 
from  the  different  spectrometers,  the  spectrometer 
data  were  processed  in  a standardized  format  with 
bandwidths  of  0.01  micrometer  from  0.4  to  2.4 
micrometers  and  of  0.05  micrometer  from  2.7  to  14 
micrometers.  Spectrometer  data  were  merged  with 
ancillary  data  for  storage  on  nine-track  computer- 
compatible  tapes. 
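
The standardized format described above can be pictured as two fixed wavelength grids onto which each raw spectrum is averaged. The following is a minimal sketch only, assuming that each standardized value is the simple mean of the raw samples falling inside a bin; the names `SHORT_EDGES`, `LONG_EDGES`, and `bin_average` are hypothetical and are not taken from the project software.

```python
import numpy as np

# Bin edges for the two standardized ranges described in the text.
# 0.4 to 2.4 micrometers in 0.01-micrometer steps gives 200 bins;
# 2.7 to 14 micrometers in 0.05-micrometer steps gives 226 bins.
SHORT_EDGES = np.linspace(0.4, 2.4, 201)
LONG_EDGES = np.linspace(2.7, 14.0, 227)


def bin_average(wavelength_um, value, edges):
    """Average raw samples into the bins defined by `edges`.

    Assumption: a bin value is the unweighted mean of the raw samples
    inside it. Bins that receive no samples are returned as NaN.
    """
    wavelength_um = np.asarray(wavelength_um, dtype=float)
    value = np.asarray(value, dtype=float)
    n_bins = len(edges) - 1
    idx = np.digitize(wavelength_um, edges) - 1  # bin index of each sample
    inside = (idx >= 0) & (idx < n_bins)         # drop out-of-range samples
    sums = np.bincount(idx[inside], weights=value[inside], minlength=n_bins)
    counts = np.bincount(idx[inside], minlength=n_bins)
    out = np.full(n_bins, np.nan)
    filled = counts > 0
    out[filled] = sums[filled] / counts[filled]
    return out
```

Placing spectra from different instruments on the same grid in this way is what allows them to be compared value by value.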

Aircraft scanner data were first converted to
visicorder imagery for manual evaluation of data
quality. This imagery is a 13-centimeter-wide,
medium-contrast paper strip record of the data for
individual channels. Aircraft scanner data were also
subjected to computerized examination to validate
the performance of the sensors and the data-recording
system. In addition, a strip chart of total
irradiance was used to verify the irradiance conditions
during the overflight. The scanner data were then
processed to nine-track computer-compatible tapes
in LARSYS format.

Landsat MSS data were previewed from black-and-white
transparencies of the image for each band
to establish data quality and cloud cover conditions
within the intensive test sites and the complete
Landsat frames. Following data-quality evaluation,
Landsat MSS data were processed to nine-track
computer-compatible tapes.

Data  library  and  distribution — The  multispectral 
data library maintained at LARS for the LACIE
Field Measurements Project contains over 100 000
spectra (corresponding to measurements of over 800
plots and fields) and over 2000 observations made
with Landsat-band radiometers (ref. 11). The library
also includes several hundred scenes of aircraft- and
satellite-acquired scanner data.

A data library catalog was prepared for each crop
year containing summary and detailed schedules of
data acquisition by location, sensor system, and
mission. Digital data products available for analysis
include Landsat and airborne scanner data, helicopter-
and truck-spectrometer spectra and ancillary data,
and tripod-radiometer spectra and ancillary data (fig.
11). Aerial and ground-level photography acquired
concurrently with spectrometer and scanner data is
also available.

Data have been routinely distributed to
researchers at ERIM, LARS, and Texas A&M in
conjunction with the Supporting Research program
sponsored by JSC. In addition, data have been
provided to the NASA Goddard Space Flight Center,
Goddard Institute for Space Studies, and General
Electric Corporation. Copies of data sets are
provided to qualified, interested investigators for the
cost of reproducing the data. Requests for data
should be addressed to Chief, Earth Observations
Division, Mail Code SF, NASA Johnson Space
Center, Houston, TX 77058.

Data analysis systems.— LARSYS (Version 3.1) is
a fully documented software system designed to
provide the tools for analysis of MSS data (ref. 12). The
pattern recognition and interactive data handling
techniques in LARSYS have been used worldwide
for analysis of aircraft and Landsat scanner data in
many applications.

EXOSYS is a specialized software system
developed at LARS for analysis of spectrometer
data. It provides researchers with the capability to
sort spectrometer data by combinations of
measurement (e.g., solar elevation) and ancillary variables
(e.g., leaf area index). Analysis features of EXOSYS
include the ability to compute functions of
band-averaged reflectances, perform correlations with
crop parameters, and fit polynomial curves to the
data using a least squares technique. Initial results are
reviewed and then sent to a line printer or a graphics
plotter.
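
Two of the analysis features just listed, a function of band-averaged reflectances and a least squares polynomial fit, can be illustrated with a short sketch. This is not EXOSYS code. The function names and the choice of a near-infrared to red ratio as the "function of band-averaged reflectances" are assumptions made for illustration only.

```python
import numpy as np


def band_ratio(near_infrared, red):
    """One possible function of two band-averaged reflectances
    (hypothetical choice: near-infrared divided by red)."""
    return np.asarray(near_infrared, dtype=float) / np.asarray(red, dtype=float)


def fit_polynomial(x, y, degree=2):
    """Least squares polynomial fit of a crop parameter y against x.

    Returns the coefficients (highest power first) and the fraction
    of the variation in y explained by the fitted curve.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)      # minimizes squared residuals
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((y - fitted) ** 2)     # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return coeffs, 1.0 - ss_res / ss_tot
```

A typical use would pass the ratio for a set of plots as `x` and a measured variable such as leaf area index as `y`; the degree would be chosen by inspecting the residuals.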


RESULTS OF SELECTED ANALYSES OF
FIELD MEASUREMENTS DATA

To realize the full potential of remote sensing for
crop identification, condition assessment, and yield
prediction, it is important to understand and quantify
the relation between agronomic and spectral
characteristics of crops. Equally important is the
development of improved capabilities for accurately
measuring the spectral, spatial, and temporal
variations of agricultural scenes and for extracting
meaningful information from these data. The
LACIE field measurements data make important
contributions in both of these areas. The data are
particularly useful because many measurements and
observations of the crops and their agronomic
characteristics were recorded throughout several seasons.
Furthermore, complete spectra permit simulation of
the response in any specified wavelength band, and
the radiometric calibration of the data permits valid
comparisons to be made among different sensors and
different dates and locations of data collection.

FIGURE 11.— [caption partly illegible] LACIE field
measurements data library.

Many of the factors affecting the reflectance
properties of plant leaves have been identified and
investigated through laboratory measurements. The
relationships of physical-biological parameters, such
as chlorophyll concentration, water content, and leaf
morphology, to the reflectance, transmittance, and
absorption of leaves have been well established.
Some of the papers and reviews describing these
relationships include Gates et al. (ref. 13), Breece and
Holmes (ref. 14), Gausman et al. (ref. 15), and
Sinclair et al. (ref. 16).

Although knowledge of the reflectance
characteristics of single leaves is basic to understanding the
reflectance properties of crop canopies in the field,
this information cannot be applied directly to the
field situation because there are significant
differences between the spectra of single leaves and
the spectra of canopies. The reflectance
characteristics of canopies are considerably more complex
than those of single leaves because there
are many more interacting variables. Some of the
more important agronomic parameters influencing
the reflectance of field-grown canopies are leaf area
index, biomass, leaf angle distribution, leaf color,
percent soil cover, and soil color. Differences in
these parameters are caused by variations in many
cultural and environmental factors, including
planting date, cultivar, seeding rate, fertilization, soil
moisture, and temperature.

The data analysis phase of the LACIE field
research began in 1976 by addressing several of the
LACIE critical issues, particularly the discrimination
of wheat and small grains. More recently, the
analyses have been extended to address objectives
related to future crop inventory systems, such as the
use of remote sensing to gather information about
crop condition and yield. Because it would not be
possible to describe adequately the results of all the
investigations, several studies that are representative
of the types of investigations that have used field
measurements data have been selected for this
report.

Several of the other studies that have used the
LACIE field measurements data are briefly
summarized here. Landgrebe et al. (ref. 17) used the
spectrometer and MSS data to simulate and evaluate
alternative combinations of scanner system
parameters for the thematic mapper, such as the
instantaneous field of view and signal-to-noise ratio. A
comparison of the Landsat MSS and thematic
mapper wavelength bands for crop identification has
been made by Bauer et al. (ref. 8). The thermal
measurements from the helicopter spectrometer
have been examined by Harlan et al. (ref. 18), and
Bauer et al. (ref. 8) developed a model describing the
radiant temperature characteristics of spring wheat
canopies in relation to the geometry of the canopy
and environmental variables. As part of the LACIE
field measurements research, Vanderbilt et al. (ref.
19) developed a method to obtain information on the
geometrical properties of crop canopies needed for
canopy reflectance and radiant temperature models.
Berry and Smith (ref. 20) used LACIE field
measurements data to test a canopy reflectance model to
predict the spectral response in the Landsat MSS
wavelength bands of winter wheat with varying
amounts of leaf area and as a function of Sun angle.
A nondestructive method to estimate leaf area index
involving analysis of digitized aerial photographs
was developed by Harlan et al. (ref. 18).




Objectives

Of the research objectives listed in the second
section of this paper, the following are addressed as
examples of current research results from LACIE field
measurements.

1. To determine the relationship of agronomic
variables such as biomass and leaf area index to
multispectral reflectance of spring wheat

2. To determine the effects of cultural and
environmental factors on the spectral response of
wheat

3. To assess the spectral discriminability of spring
wheat and other small grains

4. To determine the early-season threshold for
detection of wheat

5. To compare and evaluate the Landsat MSS and
thematic mapper bands for prediction of crop canopy
variables

The results obtained to date for these objectives
are presented in subsequent sections, following a
summary of the approach used.


Experimental Approach

The data used for the analyses in this report were
acquired during 1975 and 1976 in Kansas and North
Dakota by the helicopter- and truck-mounted
spectrometer systems at approximately 10- to 14-day
intervals during the wheat growing seasons.
Bidirectional reflectance factor and agronomic data were
acquired in approximately 75 fields from each of the
intensive test sites and 60 plots from the agriculture
experiment stations.

Correlation and regression analyses were used to
relate biological and physical variables describing the
canopies to spectral response. To relate the
reflectance measurements more directly to Landsat, the
analyses were performed using reflectance data
averaged into bands corresponding to the Landsat
MSS and thematic mapper spectral bands. The
"tasseled-cap" transformation (ref. 21) was also used
to determine the greenness and brightness
components of the Landsat MSS band reflectances for some
of the analyses.
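
The two steps named here, averaging a spectrum into Landsat MSS bands and forming greenness and brightness components, can be sketched as follows. The band limits are those listed for the MSS in table V. The coefficients are the widely quoted Kauth-Thomas values for Landsat MSS digital counts; whether the analyses in this report applied exactly these values to reflectance data is not stated in the text, so they should be read as an assumption, as should the unweighted band mean and the function names.

```python
import numpy as np

# Landsat MSS band limits in micrometers (bands 4, 5, 6, 7).
MSS_BANDS_UM = [(0.5, 0.6), (0.6, 0.7), (0.7, 0.8), (0.8, 1.1)]

# Commonly cited Kauth-Thomas coefficients (assumed, see lead-in).
KT_BRIGHTNESS = np.array([0.433, 0.632, 0.586, 0.264])
KT_GREENNESS = np.array([-0.290, -0.562, 0.600, 0.491])


def band_average(wavelength_um, reflectance, bands=MSS_BANDS_UM):
    """Unweighted mean reflectance inside each band (flat response assumed)."""
    wl = np.asarray(wavelength_um, dtype=float)
    refl = np.asarray(reflectance, dtype=float)
    return np.array([refl[(wl >= lo) & (wl < hi)].mean() for lo, hi in bands])


def greenness_brightness(band_values):
    """Return (greenness, brightness) as linear combinations of four bands."""
    b = np.asarray(band_values, dtype=float)
    return float(KT_GREENNESS @ b), float(KT_BRIGHTNESS @ b)
```

With these coefficients, a canopy spectrum that is dark in the visible and bright in the near infrared yields a much larger greenness value than a spectrally flat target of similar brightness.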
Relation of Landsat and Field Measurements
of Spectral Response

Frequently the question is asked whether results
from analyses of field measurements data can be
related to and applied to Landsat MSS data. To help
answer this question, analyses of the relationship
between Landsat MSS data and helicopter
spectrometer measurements of the spectral response
were performed (ref. 22). Landsat data for 135 fields
and 5 dates were correlated with the reflectances
measured by the helicopter spectrometer. The
Landsat data were first adjusted using the XSTAR
algorithm (ref. 23) to minimize differences among
the five dates in Sun angle and atmospheric
conditions. As shown in figure 12, for MSS band 4 (0.5 to
0.6 micrometer), the two sets of measurements are
highly correlated; similar relationships were found
for the other spectral bands. Using empirical
relationships such as these, or results of radiative
transfer modeling (ref. 24), crop discriminability can
be predicted by relating measured reflectance
differences to corresponding differences in Landsat
signals.

FIGURE 12.— Landsat MSS data and spectrometer
measurements of spectral response for five acquisition
dates in Finney County, Kansas (MSS band 4, 0.5 to 0.6
micrometer).


Prediction of Crop Canopy Characteristics
From Reflectance Measurements

One of the major long-term goals of agricultural
remote sensing is to estimate from spectral
measurements crop variables that can subsequently be used
to assess crop vigor or be entered into a yield
prediction model. To achieve this goal, the complex
relationship between the spectral reflectance of crop
canopies and their biological and physical
characteristics must be understood.

One of the LACIE field research objectives (ref.
25) was to determine the relationship of canopy
characteristics to reflectance and to assess the
potential for estimating these characteristics from
remotely sensed measurements of reflectance. The
variables selected for analysis are indicators of crop
vigor and growth, which could be used to augment
agromet models of crop growth and yield.

This section treats the effect of varying amounts
of vegetation and of maturity stage on the spectra of
spring wheat canopies, the relation of canopy
variables to reflectance in different regions of the
spectrum, and the potential capability to predict
canopy variables from reflectance measurements. As
part of the analysis, the wavelength bands of current
and proposed satellite MSS systems were compared.
The measurements were made at the Williston,
North Dakota, Agriculture Experiment Station on
nine different dates during the summer of 1976.

The amount of vegetation present is one of the
principal factors influencing the reflectance of crop
canopies. Figure 13 illustrates the effect of the
amount of vegetation (as measured by leaf area
index, percent soil cover, biomass, and plant height) on
the spectral response during the period between
tillering and the beginning of heading, when the
maximum green-leaf area is reached. As leaf area and
biomass increase, there is a progressive and
characteristic decrease in reflectance in the chlorophyll
absorption region, increase in the near-infrared
reflectance, and decrease in the middle-infrared
reflectance.

Plant development and maturity (as opposed to
growth or increase in size) cause many changes in
canopy geometry, moisture content, and
pigmentation of leaves. These changes are also manifested in
the reflectance characteristics of crop canopies.
Figure 14 shows the spectra of spring wheat at
several different maturity stages (changes in the
amount of vegetation are also occurring).

The linear correlations of five canopy variables
with reflectances in the proposed thematic mapper
(Landsat D) and Landsat MSS bands are listed in
table V. The relationships of percent soil cover, leaf
area index, fresh biomass, and plant water content
with reflectance in selected wavelength bands are
shown in figure 15. The correlations and plots
include data from all treatments for the stages of
maturity when the canopy is green, seedling through
flowering.

FIGURE 13.— Effect of leaf area index, percent soil cover, dry
biomass, and plant height on the spectral reflectance of spring
wheat during the period between tillering and the beginning of
heading, when the maximum green-leaf area is reached. Data
were acquired at Williston, North Dakota, from May 28 to June
18, 1976, and include plots with different soil moisture levels,
planting dates, nitrogen fertilization, and cultivars.

FIGURE 14.— Spectral reflectance of spring wheat canopies at
several maturity stages. Measurements were made at Williston,
North Dakota, from May to August, 1976, and include plots with
different soil moisture levels, planting dates, nitrogen
fertilization, and cultivars.


Table V.— The Linear Correlations (r) of Reflectances
in the Proposed Thematic Mapper and Landsat
MSS Wavelength Bands With Percent Soil Cover,
Leaf Area Index, Fresh and Dry Biomass,
and Plant Water Content

Wavelength       Percent      Leaf area   Fresh     Dry       Plant water
band, µm         soil cover   index       biomass   biomass   content

Thematic mapper
0.45 to 0.52     -0.82        -0.79       -0.75     -0.69     -0.76
0.52 to 0.60     -0.82        -0.78       -0.81     -0.77     -0.82
0.63 to 0.69     -0.91        -0.86       -0.80     -0.73     -0.81
0.76 to 0.90      0.93         0.92        0.76      0.67      0.79
1.55 to 1.75     -0.85        -0.80       -0.83     -0.79     -0.84
2.08 to 2.35     -0.91        -0.85       -0.86     -0.81     -0.86

Landsat MSS
0.5 to 0.6       -0.82        -0.79       -0.81     -0.76     -0.81
0.6 to 0.7       -0.90        -0.85       -0.81     -0.74     -0.82
0.7 to 0.8        0.84         0.84        0.57      0.46      0.60
0.8 to 1.1        0.91         0.90        0.77      0.68      0.79

Fresh biomass, dry biomass, and plant water
content correlate most highly (table V) with reflectance
in the middle-infrared band, 2.08 to 2.35
micrometers. Percent soil cover and leaf area index
correlate most highly with a near-infrared band, 0.76
to 0.90 micrometer. The visible wavelengths were
less sensitive to leaf area and biomass; similar results
have also been reported by Colwell (ref. 26) and
Tucker (ref. 27). Other canopy variables analyzed
that were not correlated with reflectance were plant
height, percent green leaves, and percent plant
moisture.
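
A table of linear correlations such as table V can be produced from two arrays: one of band reflectances and one of canopy variables, with one row per plot and date. The sketch below computes the Pearson correlation coefficient for every band and variable pair. The function name and array layout are assumptions made for illustration, not the project's software.

```python
import numpy as np


def correlation_table(band_reflectance, canopy_variables):
    """Pearson r between each band (rows) and each canopy variable (columns).

    band_reflectance: array of shape (n_observations, n_bands)
    canopy_variables: array of shape (n_observations, n_variables)
    """
    x = np.asarray(band_reflectance, dtype=float)
    y = np.asarray(canopy_variables, dtype=float)
    xc = x - x.mean(axis=0)              # center each band
    yc = y - y.mean(axis=0)              # center each canopy variable
    cross = xc.T @ yc                    # sums of cross products
    norm = (np.sqrt((xc ** 2).sum(axis=0))[:, None]
            * np.sqrt((yc ** 2).sum(axis=0))[None, :])
    return cross / norm
```

Negative entries correspond to bands in which reflectance falls as the canopy variable rises, as in the visible and middle-infrared rows of table V; positive entries correspond to the near-infrared rows.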

These and other analyses of the data indicate that
the amount of photosynthetically active (green)
vegetation has a dominant influence on the
reflectance characteristics of crop canopies. This
observation is substantiated by the decrease in the
correlation of canopy variables and reflectance as the
canopy begins to senesce or ripen (refs. 25 and 28).

Understanding the relation of the agronomic
properties of crop canopies to reflectance in various
regions of the spectrum is the first step in the
development of models using spectral
measurements. The remainder of this section describes the
regression models developed for prediction of crop
growth characteristics.

Table VI shows results for selections of one to six
wavelength bands to predict canopy variables. By
computing all possible regressions, the best subset of
one to six wavelength bands was selected,
considering the amount of variability explained and the bias
of the resulting regression equation. The near- and
middle-infrared bands were found to be most
strongly related to the canopy variables. For leaf area
index and percent soil cover, the 0.76- to 0.90-
micrometer wavelength band accounts for more of
the variation than any other single band. The 2.08- to
2.35-micrometer wavelength band is the single most
important band for predicting the variation in fresh
biomass, dry biomass, and plant water. The 2.08- to
2.35-micrometer wavelength band is one of the two
most important bands in explaining the variation in
percent soil cover and one of the three most
important bands in explaining the variation in leaf area
index.
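
The procedure of computing all possible regressions and judging each subset by the variation explained and by its bias can be sketched as below. The bias measure is taken here to be Mallows' Cp, which is consistent with the Cp column and footnote of table VI; the formula Cp = SSE_p / s^2 - n + 2p, with s^2 taken from the full model and p counting the intercept, is the standard definition and is assumed rather than quoted from the report. All function names are hypothetical.

```python
import itertools
import numpy as np


def _sse(X, y):
    """Residual sum of squares of a least squares fit with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def all_subsets(X, y):
    """Fit every non-empty subset of the columns of X.

    Returns a list of (columns, R squared, Cp) tuples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    sst = float(((y - y.mean()) ** 2).sum())
    s2_full = _sse(X, y) / (n - k - 1)    # error variance from the full model
    results = []
    for size in range(1, k + 1):
        for cols in itertools.combinations(range(k), size):
            sse = _sse(X[:, list(cols)], y)
            p = size + 1                  # terms including the intercept
            cp = sse / s2_full - n + 2 * p
            results.append((cols, 1.0 - sse / sst, cp))
    return results


def best_by_size(results):
    """Keep the subset with the highest R squared for each subset size."""
    best = {}
    for cols, r2, cp in results:
        if len(cols) not in best or r2 > best[len(cols)][1]:
            best[len(cols)] = (cols, r2, cp)
    return best
```

Under this definition the full model always has Cp equal to its number of terms, which agrees with the value of 7 listed for every six-band equation in table VI.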

The relationships between the measured and
predicted leaf area index and percent soil cover are
shown in figure 16. Similar results were obtained for
the other canopy variables. The results show that
reflectance measurements in a small number of
wavelength bands in important regions of the
spectrum can explain much of the variation in
canopy characteristics and can be used to estimate
canopy variables such as leaf area index and biomass.

Table VII shows the maximum R² value obtained
for predictions of each canopy variable using the
Landsat MSS bands, the best four thematic mapper
bands, and all six reflective thematic mapper bands.
In every case, the best four thematic mapper bands
explained more of the variation in a canopy variable
than the four Landsat bands. Addition of the other
two thematic mapper bands resulted in small
increases in the R² values.

Table VI.— Selection of Combinations of the Best 1, 2, ... 6 Wavelength Bands for Estimating Percent Soil Cover,
Leaf Area Index, Fresh Biomass, Dry Biomass, and Plant Water Content During the Seedling to Flowering Stages
of Crop Development

[Candidate bands, µm: 0.45 to 0.52, 0.52 to 0.60, 0.63 to 0.69, 0.76 to 0.90, 1.55 to 1.75, 2.08 to 2.35.
The X marks showing which bands entered each subset cannot be assigned to their columns in this copy.]

Canopy variable         No. bands entered   R²      Cp (a)

Percent soil cover      1                   0.86    [missing]
                        2                   0.92    16
                        3                   0.92    15
                        4                   0.93    4
                        5                   0.93    5
                        6                   0.93    7

Leaf area index         1                   0.84    37
                        2                   0.87    7
                        3                   0.88    2
                        4                   0.88    4
                        5                   0.88    5
                        6                   0.88    7

Fresh biomass           1                   0.73    239
                        2                   0.76    211
                        3                   0.83    109
                        4                   0.88    41
                        5                   0.90    12
                        6                   0.93    7

Dry biomass             1                   0.65    252
                        2                   0.67    229
                        3                   0.81    78
                        4                   0.84    44
                        5                   0.87    20
                        6                   0.88    7

Plant water content     1                   0.75    201
                        2                   0.77    175
                        3                   0.83    98
                        4                   0.88    34
                        5                   0.90    9
                        6                   0.90    7

(a) The regression equation is unbiased when the Cp value is less than or equal to the number of terms (wavelength bands) in the equation.

The lower correlations (table V) and predictions
(table VII) of the Landsat MSS bands compared to
the thematic mapper bands are attributed to the width and
location of the bands with respect to the spectral
characteristics of vegetation. For example, the data
in table V demonstrate a disadvantage of collecting
data in the 0.7- to 0.8-micrometer wavelength range.
The inclusion in this band of the region (near 0.7
micrometer) of rapid transition from the chlorophyll
absorption region of the spectrum to the highly
reflecting near-infrared region (0.70 to 0.74
micrometer) results in a weaker relation between
reflectance and crop canopy variables. Similar results
were reported by Tucker and Maxwell (ref. 29). This
low correlation reduces the usefulness of the data in
the 0.7- to 0.8-micrometer wavelength band.


Effect of Agronomic and Environmental
Factors on Spectral Reflectance

The crop canopy is a dynamic entity influenced by
many agronomic and environmental factors. The
effects of several cultural and environmental factors
on the reflectance of spring wheat were investigated
using data acquired at the Williston, North Dakota,
Agriculture Experiment Station. The examples of
spring wheat spectra from selected measurement
dates shown in figure 17 illustrate the effects of
available soil moisture, planting date, nitrogen
fertilization, and cultivar. Additional examples of crop
spectra from the other test sites, crops, and years
have been compiled by Hixson et al. (ref. 30).

FIGURE 16.— Comparison of measured and predicted percent
soil cover and leaf area index of spring wheat.

The Waldron (standard height) cultivar planted
early on fallow land with nitrogen fertilization was
selected as a standard of comparison. One treatment
at a time was varied from this standard, permitting
comparisons of reflectance spectra measured on
plots with different soil moisture levels, planting
dates, fertilization, and cultivar. All treatment
comparisons were made using spectra acquired on June
18, 1976, during the stem extension phase of
development, except for the comparison of cultivars,
which was on July 16, after heading.


In 1976, the effects of available soil moisture on
plant growth and spectral response were quite
significant. Wheat planted on fallow land had more tillers
and, therefore, greater biomass, leaf area, and percent
soil cover than the wheat crop grown on land that
had been cropped the previous year. These
differences account for the decreased visible
reflectance, increased near-infrared reflectance, and
decreased middle-infrared reflectance in the fallow
treatment. The effect of planting date on spectral
response is also illustrated in figure 17. The
differences are attributed to differences in the
amount of vegetation present, as well as differences
in maturity stage.

Adding nitrogen fertilizer increased the amount of
green vegetation early in the growing season. The
fertilized treatment had the spectral characteristics of a
greener, denser vegetative canopy: decreased red
reflectance, slightly greater near-infrared reflectance,
and reduced middle-infrared reflectance.

The two wheat cultivars, Olaf (semidwarf, awned)
and Waldron (standard height, awnless), were
similar in appearance before heading. After heading,
some differences between the two cultivars were
apparent but are probably not significant. The greatest
spectral differences were in the middle infrared,
indicating a difference in the moisture and biomass
between the two cultivars at this growth stage.

Table VII.— The R² Values for Predictions of Percent
Soil Cover, Leaf Area Index, Fresh and Dry Biomass,
and Plant Water Content With Four Landsat MSS
Bands, the Best Four Thematic Mapper Bands, and the
Six Thematic Mapper Bands

Wavelength bands          Percent      Leaf area   Fresh     Dry       Plant water
                          soil cover   index       biomass   biomass   content

Landsat MSS bands         0.91         0.86        0.86      0.84      0.85
Best four thematic
mapper bands (a)          0.93         0.88        0.88      0.84      0.88
Six thematic
mapper bands              0.93         0.88        0.91      0.88      0.90

(a) See table VI.

FIGURE 17.— Effects of agronomic treatments on the spectral reflectance of spring wheat. Spectra were measured on June 18 during
the stem extension stage of development, except for the spectra of cultivars, which were measured on July 16 after heading. (a) Previous
crop. (b) Planting date. (c) Nitrogen fertilization. (d) Cultivar.


In one analysis (ref. 22), one-way multivariate
analyses of variance were performed on the Landsat
MSS band reflectance data from individual plots of
spring wheat using data from the entire season. Soil
moisture availability was found to be the most
significant factor. A decreased moisture supply, caused
by planting wheat for a second year in succession on
the same plot, both decreased the magnitude of green
development from that of wheat planted on fallow
ground and delayed the date of maximum greenness.
A similar delay in maximum greenness was observed
when the planting date was delayed by 10 days, but
the difference in maximum greenness levels was not
as pronounced as in the case where available soil
moisture was reduced.
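
A one-way multivariate analysis of variance compares the scatter of the band reflectances within treatment groups to their total scatter. The report does not say which test statistic was used; the sketch below computes Wilks' lambda, one common choice, purely as an illustration. The function name and the layout of the input (one array of plots by bands per treatment level) are assumptions.

```python
import numpy as np


def wilks_lambda(groups):
    """Wilks' lambda for a one-way layout.

    groups: list of arrays, each of shape (n_plots_in_group, n_bands).
    Values near 1 indicate little separation between group means;
    values near 0 indicate strong separation.
    """
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_obs = np.vstack(groups)
    centered = all_obs - all_obs.mean(axis=0)
    total = centered.T @ centered              # total sums of squares and products
    within = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0))
                 for g in groups)              # pooled within-group matrix
    return float(np.linalg.det(within) / np.linalg.det(total))
```

In practice the statistic would be converted to an approximate F value before judging significance; that step is omitted here.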


Figure 18 illustrates the effects of the soil
moisture and nitrogen fertilization factors on the
maximum values attained by the greenness and
brightness components of reflectance for the
small-grains test plots. Maximum greenness is most
affected by soil moisture as plentiful soil moisture
produces more vegetation, which covers the soil.
Nitrogen fertilization was observed to affect the
greenness component in a similar fashion, with the
greening value of nitrogen fertilizer being very
evident on those plots that were continuously cropped.
Soil moisture also affected the brightness
component.

FIGURE 18.— Effects of soil moisture and nitrogen fertilization on the maximum greenness and brightness components of reflectance
of small grains. (a) Maximum greenness, June 18. (b) Maximum brightness, July 29 to August 6.


These spectra and analyses illustrate the dynamic
character of the canopy and the many factors that
influence the spectral reflectance of the canopy. More
quantitative analyses of the effect of agronomic
treatments and environmental variables on
reflectance of wheat are currently being conducted.


Spectral Discrimination of Spring Wheat
From Other Small Grains

One  of  the  critical  issues  that  arose  during  L AC1E 
was  spectrally  discriminating  spring  wheat  from  the 
other  spring  small  grains.  These  crops  have  similar 
reflectance  spectra  and  crop  calendars;  consequently, 
LACIE  initially  did  not  attempt  to  inventory  them 
separately.  Instead,  a small-grains  area  estimate  was 
obtained,  and  historical  data  on  crop  production  were 
used  to  establish  spring-wheat-to-small-grains  ratios 
for  producing  a spring  wheat  estimate.  It  was  found 
that  these  ratios  could  vary  appreciably  from  year  to 
year,  introducing  errors  in  the  spring  wheat  esti- 
mates. Consequently,  some  supporting  research 


effort  was  directed  toward  investigation  of  spectral 
techniques  for  achieving  such  discrimination  (ref. 
22).  Although  m^jor  emphasis  was  placed  on 
analysis  of  Landsat  data  from  LACIE  blind  sites, 
analysis  of  field  measurement  data  played  a strong 
supportive  role,  along  with  analysis  of  USDA  crop 
statistics. Only data from the first 2 years were included
in this analysis. An expanded small-grains experiment
was conducted in 1977.
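
The sensitivity of the ratio approach described above can be illustrated with a short sketch. It assumes the spring wheat estimate is simply the small-grains area multiplied by a historical spring-wheat-to-small-grains ratio; the function names and all numbers are hypothetical and are not taken from LACIE data.

```python
def spring_wheat_area(small_grains_area, wheat_ratio):
    """Estimate spring wheat area as a fraction of the small-grains area."""
    if not 0.0 <= wheat_ratio <= 1.0:
        raise ValueError("wheat_ratio must lie between 0 and 1")
    return small_grains_area * wheat_ratio


def relative_error(estimate, truth):
    """Signed relative error of an estimate with respect to the true value."""
    return (estimate - truth) / truth


# Hypothetical figures: 1000 area units of small grains, a historical ratio
# of 0.60, and a current-year ratio of 0.50.
small_grains = 1000.0
estimate = spring_wheat_area(small_grains, 0.60)
actual = spring_wheat_area(small_grains, 0.50)
error = relative_error(estimate, actual)  # positive value means overestimate
```

In this invented case, a shift in the ratio from 0.60 to 0.50 alone produces an overestimate of about 20 percent, even if the small-grains area is known exactly.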

The objectives of the analyses of field measurements
data on spectral reflectance of wheat and
small grains were (1) to characterize the spectral
reflectance of spring wheat and other small grains as
a function of time throughout the growing season,
(2) to characterize the sources and extent of
variability to be expected, and (3) to develop discrimination
techniques for distinguishing between
spring wheat and other small grains through increased
understanding of Landsat signals for these
crops.

The  change  in  spectral  character  of  spring  wheat 
reflectance  at  five  maturity  stages  was  illustrated  in 
figure 14, while similarities and some differences of
spring wheat, barley, and oats spectra on three dates
are  shown  in  figure  19.  The  spectral  patterns 
throughout  the  growing  season  were  determined  for 
spring  wheat  and  other  small  grains.  One  technique 
was  to  plot  the  time  trajectories  of  transformed 
reflectance  values  for  these  crops  and  look  for 
differences  that  might  prove  useful  for  discrimina- 
tion. Linear  combinations  of  values,  analogous  to  the 
tasseled-cap  transformation  of  Landsat  data  (see 
next  section),  were  used  to  form  greenness  and 
brightness  components  of  reflectance. 
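
As a rough illustration of how such components are formed, the sketch below applies the commonly cited Kauth-Thomas tasseled-cap coefficients for the four Landsat MSS bands to two observations. This is an assumption made for illustration: those coefficients were derived for Landsat digital counts, the weights actually applied to the field reflectance data are not given here, and the band values are invented.

```python
# Illustrative sketch only: coefficients are the commonly cited Kauth-Thomas
# values for Landsat MSS bands 4-7, not the weights used in this study.
BRIGHTNESS = (0.433, 0.632, 0.586, 0.264)
GREENNESS = (-0.290, -0.562, 0.600, 0.491)


def component(weights, bands):
    """Return the linear combination sum(w_i * band_i)."""
    if len(weights) != len(bands):
        raise ValueError("weights and bands must have the same length")
    return sum(w * b for w, b in zip(weights, bands))


def greenness_brightness(bands):
    """Map four band values to a (greenness, brightness) pair."""
    return component(GREENNESS, bands), component(BRIGHTNESS, bands)


# Hypothetical reflectances (fractions) in MSS bands 4, 5, 6, and 7.
bare_soil = (0.10, 0.14, 0.18, 0.20)
green_canopy = (0.05, 0.04, 0.30, 0.40)

soil_g, soil_b = greenness_brightness(bare_soil)
crop_g, crop_b = greenness_brightness(green_canopy)
```

Plotting the (greenness, brightness) pairs obtained on successive dates would give a time trajectory of the kind discussed in the text; the two weight vectors are nearly orthogonal, so the components carry largely independent information.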

Figure  20  presents  spectral  trajectories  for  hard 
red  spring  wheat,  barley,  and  oats;  durum  spring 
wheat is very similar to hard red spring wheat. Each
trajectory  is  for  a crop  that  had  been  planted  on 
prior-year-fallowed  soil  (more  available  soil  moisture 
than  continuously  cropped  soil)  and  had  been  fer- 
tilized. Thus,  they  represent  spectral  patterns  for  the 
best  growing  conditions  available  at  the  experiment 
station.  Although  the  general  shapes  of  the  spectral 
trajectories  are  similar,  several  differences  can  be 
seen  among  them;  notably,  barley  attained  greater 
values  in  both  greenness  and  brightness  before  head- 
ing and  its  brightness  upon  ripening  was  greater  than 
that  of  wheat.  Less  distinctiveness  was  observed  in 
the  spectral  characteristics  of  other  plots  with  crops 
that  were  grown  under  less  favorable  conditions 
(fig.  18). 

An  analysis  of  color  photographs  (fig.  21)  and 
agronomic  measurements  (table  VIII)  made  in  con- 
junction with  the  spectral  measurements  helps  to  ex- 
plain the  physical  causes  of  the  observed  spectral 
differences  and  variability.  Grown  under  favorable 
conditions,  the  barley  had  greater  biomass,  leaf  area 
index,  and  percent  soil  cover  than  spring  wheat, 
resulting  in  higher  maximum  greenness  values.  The 
barley  matured  and  ripened  1 week  to  10  days  before 
the spring wheat. Longer, lighter colored awns,
drooping  heads,  and  greater  soil  cover  all  contributed 
to  a greater  maximum  brightness  for  barley  at 
maturity. 

In  summary,  analyses  of  field  measurements  data 
provided  insights  into  the  causes  of  the  spectral 
characteristics  of  spring  wheat  and  barley  that  may 
prove  useful  for  discrimination.  For  instance, 
differences  in  greenness  and  brightness  at  heading 
and  brightness  at  ripening  and  the  timing  of  these 
events  appear  key  to  their  spectral  discrimination. 
One  preliminary  operational  technique  for  direct 
spectral  classification  of  spring  wheat  was  tested  dur- 
ing LACIE  Phase  III  and  improved  techniques  are 
currently under development.

FIGURE 19. Spectra of spring wheat, barley, and oats at stem
extension, heading, and ripe stages of maturity. (a) Stem extension,
June 18, 1976. (b) Heading, July 16, 1976. (c) Ripe, July
29, 1976.


Early-Season Detection of Wheat

In LACIE, it was found that early-season estimates
of winter wheat area tended to be low and
unreliable because the emergence and development
of green vegetative cover on the soil vary with
planting dates, crop rotation, irrigation and fertilization
practices, and local
weather.

A study  was  conducted  using  LACIE  field 
measurement  data  to  investigate  the  threshold  of 
wheat  detectability  in  Landsat  data  (ref.  22).  Heli- 
copter-spectrometer and  agronomic  data  acquired  for 
10  dates  during  the  1975-76  growing  season  at  the 
Finney  County,  Kansas,  intensive  test  site  were 
analyzed. 

FIGURE 20. Comparison of selected greenness-brightness trajectories of spring wheat, barley, and oats. The measurement dates and
growth stages are as follows: (1) May 28, seedling; (2) June 3, tillering; (3) June 18, jointing; (4) July 1, heading; (5) July 15, milk;
(6) July 29, dough; (7) August 1, ripe; (8) August 6, ripe. Note: Barley was in the milk stage on July S and was ripe by July 29.


Figure 12 illustrated the reflectance spectra for
fields with different amounts of vegetation. To relate
these data to Landsat analysis, reflectance values for
the Landsat MSS bands were computed. A useful
technique for Landsat MSS data analysis has been to
form linear combinations of the bands, defining a
new coordinate system for describing the data. One
such transformation, the tasseled-cap transformation,
defines a brightness variable that alines closely
with the direction followed by reflectances of varied
soils. Orthogonal to brightness is the greenness variable,
which is oriented toward the spectral response
from healthy green vegetation. These two components
describe most of the variability observed in
Landsat scanner measurements of agricultural
scenes (see the paper by Kauth et al. entitled
“Feature Extraction Applied to Agricultural Crops
as Seen by Landsat”). A principal components
analysis of the reflectances revealed that 98 percent
of the variability was in a plane analogous to the
greenness and brightness plane for Landsat MSS
data.


TABLE VIII. Agronomic Characteristics of Small Grains on Three Measurement Dates at Williston,
North Dakota, Agriculture Experiment Station(a)

Date      Small grains    Percent      Leaf area   Fresh biomass,   Maturity
                          soil cover   index       g/m²             stage(b)

June 18   Spring wheat    50           1.5          567             3.3
          Barley          90           2.9         1326             3.4
          Oats            80           2.0         1022             3.4

July 16   Spring wheat    30           0.7         1162             5.1
          Barley          70           1.3         1686             5.2
          Oats            50           1.2         1388             5.1

July 29   Spring wheat    30           --           854             5.2
          Barley          60           --           961             5.4
          Oats            50           --           820             5.3

(a) The plots are the same ones shown in figures 19 to 21, were grown on fallow land,
and received nitrogen fertilizer.

(b) Maturity stages: 3.3 to 3.4, stem extension; 5.1, milk; 5.2, soft dough; 5.3, hard
dough; 5.4, ripe.

Four  fields  with  different  management  practices 
were  selected  to  illustrate  the  relationship  of  the 
greenness  component  of  reflectance  to  measurement 
data  (fig.  22).  The  absence  of  fall  green  development 
in  the  late-planted  fields  and  the  appreciable  fall 
greening-up  of  the  field  that  was  irrigated  and 
planted  at  the  normal  time  are  apparent. 

The  proportions  of  late-  and  early-planted  fields 
and  irrigated  and  nonirrigated  fields  will  vary  from 
site  to  site,  as  will  other  factors  that  determine 
development  rates.  Yet,  it  is  of  interest  to  determine 
both  how  the  collection  of  wheat  fields  in  the  Finney 
County  site  developed  in  1975-76  and  how  well  they 
would  have  been  detected  by  a decision  rule  that 
called  them  wheat  if  their  greenness  component  of 
reflectance  exceeded  a given  threshold  by  a given 
date. 

FIGURE 21. Comparison of oblique and vertical views of spring wheat and barley canopies at two stages of maturity. (a) Spring
wheat, July 16, 1976, day 198, milk. (b) Barley, July 16, 1976, day 198, milk. (c) Spring wheat, July 29, 1976, day 211, hard dough. (d)
Barley, July 29, 1976, day 211, hard dough.

FIGURE 22. Greenness component of reflectance of winter wheat fields as a function of date for different management practices. (a)
Nonirrigated, normal planting date. (b) Irrigated, normal planting date. (c) Nonirrigated, late planting date. (d) Irrigated, late planting
date.


To provide a quantification of the greening-up
characteristics of this group of fields, histograms
were computed to describe the percentage of fields
exceeding a given greenness value as a function of acquisition
date. Figure 23 displays these results in two
ways: (1) with fixed threshold levels and varied
dates and (2) with fixed dates and varied thresholds.
With a threshold of 0.06, 95 percent of the wheat
fields would have been detected on the eighth mission
(May 6, 1976), 63 percent of them on the seventh
mission (April 18), and 38 percent on the sixth
mission (March 31). For a lower threshold of 0.04,
the corresponding percentages would have been 100,
86, and 62 for the eighth, seventh, and sixth missions,
respectively. On the two next earliest acquisitions,
fourth (November 11) and fifth (March 18),
28 and 38 percent of the fields would have been
detected, respectively. For a threshold of 0.02, 90
percent of fields would have been detected at acquisitions
4 and 5. The nonwheat fields in this data set
were also tested for the greenness threshold crossing,
with good exclusion of them by the 0.06 and 0.04
thresholds. For example, for the threshold of 0.06,
only one field exceeded the threshold on the eighth
acquisition and none on earlier missions. As a point
of reference, a root mean square error (RMSE) of
0.018 in greenness would correspond to a two-count
uncorrelated RMSE noise level in each Landsat
band.
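
The decision rule discussed above, in which a field is called wheat once its greenness component exceeds a threshold, can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and the greenness values are invented rather than taken from the Finney County data.

```python
def detected_fraction(greenness_values, threshold):
    """Fraction of fields whose greenness exceeds the threshold."""
    if not greenness_values:
        raise ValueError("at least one field is required")
    hits = sum(1 for g in greenness_values if g > threshold)
    return hits / len(greenness_values)


def first_detection(greenness_by_date, threshold):
    """Return the first acquisition at which a field exceeds the threshold.

    greenness_by_date is a list of (date_label, greenness) pairs in time
    order. Returns None if the threshold is never exceeded.
    """
    for date_label, greenness in greenness_by_date:
        if greenness > threshold:
            return date_label
    return None


# Hypothetical greenness values for five wheat fields on one acquisition.
fields = [0.01, 0.03, 0.05, 0.07, 0.09]
for threshold in (0.02, 0.04, 0.06):
    print(threshold, detected_fraction(fields, threshold))

# Hypothetical time series for a single field.
series = [("mission 6", 0.03), ("mission 7", 0.05), ("mission 8", 0.08)]
earliest = first_detection(series, 0.04)
```

Repeating the first calculation for each acquisition date gives the fixed-threshold view of the results, and repeating it for several thresholds on one date gives the fixed-date view. Lower thresholds detect more fields earlier but leave less margin above sensor noise.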

The relationship between the greenness component
of measured reflectance and the observed percent
soil cover for the wheat fields was also analyzed.
A greenness reflectance threshold of 0.02 corresponded
to 18 to 25 percent soil cover, one of 0.04 to
30 to 35 percent, and one of 0.06 to 40 to 45 percent
soil cover. These values need refinement because
only coarse (20 percent) increments of soil cover
were recorded for the fields analyzed.
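
The three threshold and soil-cover pairs quoted above can be turned into a rough lookup, sketched below. Using the midpoint of each quoted range and interpolating linearly between them are assumptions made only for illustration; given the coarse soil-cover increments noted in the text, the result should be read as indicative at best.

```python
# Midpoints of the soil-cover ranges quoted in the text for each threshold.
CALIBRATION = [(0.02, 21.5), (0.04, 32.5), (0.06, 42.5)]


def soil_cover_from_greenness(greenness, table=CALIBRATION):
    """Linearly interpolate percent soil cover between tabulated points."""
    lo_g, _ = table[0]
    hi_g, _ = table[-1]
    if not lo_g <= greenness <= hi_g:
        raise ValueError("greenness outside the tabulated range")
    for (g0, c0), (g1, c1) in zip(table, table[1:]):
        if g0 <= greenness <= g1:
            return c0 + (c1 - c0) * (greenness - g0) / (g1 - g0)


cover_at_003 = soil_cover_from_greenness(0.03)
cover_at_005 = soil_cover_from_greenness(0.05)
```

The function deliberately refuses to extrapolate outside the 0.02 to 0.06 range, because the text gives no information about the relationship there.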


SUMMARY  OF  KEY  ACCOMPLISHMENTS 
AND  RESULTS 

The LACIE Field Measurements Project successfully
acquired a large amount of high-quality
spectral  measurements  during  3 years  at  three  test 
sites  in  Kansas,  South  Dakota,  and  North  Dakota. 
Analyses  of  these  data  are  providing  new  knowledge 
about  the  spectral  properties  of  crops  in  relation  to 
their  agronomic  characteristics. 

Spectral measurements were made of controlled
experimental plots of wheat and other small grains
using truck-mounted spectrometers and of commercial
fields of wheat and other crops by a helicopter-borne
spectrometer, an airborne scanner, and the
Landsat MSS. The spectral data are supported by extensive
agronomic and meteorological measurements
and observations. Together, the spectral,
agronomic, and meteorological data form the most
comprehensive data set now available for agricultural
remote-sensing research. The data have been processed
and provided to investigators who are now
using them in research programs; they are available
for use by other investigators.


The capability to acquire and analyze spectral
measurements was significantly advanced during the
LACIE Field Measurements Project. One of the important
attributes of the LACIE field measurements
spectral data is that they are radiometrically calibrated.
Calibration enables valid comparisons of
measurements from different dates, sensors, and/or
locations. The procedures for field calibration of data
have been defined and tested, and the knowledge
gained will continue to be applied in future investigations.

The development of a computerized field research
data base and an interactive graphics and statistics
software system has significantly increased the
capability to analyze and interpret interrelationships
of the spectral and agronomic characteristics of crops
and soils.

Another result of the LACIE field measurements
experience is the definition of specifications of a
standardized, flexible, and economical multispectral
data acquisition system for field research. The instrument
system would consist of a multiband
radiometer, including the thematic mapper
wavelength bands, and a data recording-handling-playback
module. Development and use of these instrument
systems will make it possible and economical
to acquire and process calibrated spectral
measurements from tripods, trucks, or helicopters
over a wide variety of crops. This approach to
spectral data collection was successfully tested by
LARS in 1977.

Analysis of the LACIE field measurements data
is providing new knowledge and understanding of
the spectral characteristics of wheat and the biological-physical
factors affecting spectral response. For
example, strong relationships have been found between
reflectance and percent soil cover, leaf area index,
biomass, and plant water content. These are fundamental
measures of crop vigor that can be used in
crop growth and yield prediction models. In relating
agronomic and spectral characteristics of wheat, it
has been found that a middle-infrared wavelength
band, 2.08 to 2.35 micrometers, is most important in
explaining variation in biomass and plant water content,
whereas a near-infrared band, 0.76 to 0.90
micrometer, accounts for the most variation in percent
soil cover and leaf area index. In evaluating sensor
characteristics, it has been determined that the
reflective wavelength bands proposed for the
thematic mapper are more strongly related to and
better predictors of the canopy variables than the
Landsat MSS bands. In other studies, insights for
development of discrimination techniques have been
gained through analysis of the spectral differences
between spring wheat and barley and the spectral
development of wheat fields early in the growing
season.


RECOMMENDATIONS FOR FUTURE FIELD
RESEARCH

Although the LACIE Field Measurements Project
acquired a large quantity of data, the sample of crop,
soil, and weather conditions was small, even for
wheat. Each of the 3 years in each site was different
in terms of the weather and the crop response to it;
however, the crop cannot be treated as a constant
even if the weather does not vary significantly from
year to year. Changing economic conditions and advancements
in agricultural technology will bring
changes in crop and soil management (e.g.,
minimum tillage) and even the crop itself (e.g., introduction
of semidwarf varieties of wheat).
Measurements of wheat and its confusion crops
should be continued over additional years if the full
potential of the current effort is to be achieved.

As one looks ahead to the development of a global
food and fiber information system using remote-sensing
techniques, it is critical to begin to make the
field measurements required to understand the
spectral characteristics of crops other than wheat,
such as corn, soybeans, rice, cotton, and rangeland.
One of the lessons that should come from the
LACIE Field Measurements Project is the importance
of conducting field research before the results
are needed to design a large-scale effort. Because of
the year-to-year variations in weather, several years
of data are required.

The primary sensors used for LACIE field
measurements were spectrometers capable of producing
high-resolution spectra. In the future, a new
approach to the collection of field measurements
data will be needed because it will not be feasible
simply to multiply the current approach by the increased
number of crops and regions that should be
included in future experiments. Multiband
radiometer systems can economically provide the
necessary spectral measurements. With these instruments,
it will be possible to acquire measurements at
more sites than is possible with the currently available
high-spectral-resolution spectrometer systems.
And, it is more observations of crops and soils under
a wide variety of conditions (not detailed spectral
measurements of a limited number of locations and
crop conditions) that are needed to increase our understanding
of the spectral characteristics of
agricultural scenes. There will be a continuing need
for the high-resolution spectrometer systems to be
used in field research, but less complex systems are
also required that can be used to make large numbers
of measurements at many sites economically and
accurately.

The approach to data acquisition should include
cooperative efforts with USDA, land-grant universities,
and commercial test stations to make detailed
crop, soil, and meteorological measurements in controlled
plots, as well as less intensive observations of
commercial fields in larger test sites.

In  conclusion,  field  research  is  an  essential  com- 
ponent of  the  development  of  agricultural  remote 
sensing.  A sound  field  research  program  can  provide 
the  basis  on  which  larger  scale  satellite  experiments 
and  operational  systems  are  constructed.  The  overall 
objectives  of  future  field  research  should  be  to  obtain 
a quantitative understanding of the radiation characteristics
of agricultural crops and their soil backgrounds
and to assess the capability of current,
planned,  and  future  satellite  sensor  systems  to  cap- 
ture available  useful  spectral  information.  Field 
research  is  a particularly  important  component  of 
developing  remote-sensing  techniques  for  assessing 
crop condition and predicting crop yields.


The USDA Application Test System


FOREWORD 

The  U.S.  Department  of  Agriculture  (USDA)  is 
aware  of  the  potential  for  using  satellite  remote-sens- 
ing techniques  to  support  present  and  future  USDA 
information requirements. The ability of key U.S.
decisionmakers, both in government and in private
industry, to accurately assess the production potential
of major world crops in a timely manner, as well
as to assess world market potential, has been economically
rewarding to U.S. foreign trade. Commodity
experts of the Foreign Agricultural Service
(FAS)  of  the  USDA  have  the  expertise  to  accurately 
analyze  and  evaluate  foreign  crop  data  but  often  do 
not  have  as  timely  and/or  as  complete  data  for  the 
formulation  of  their  crop  production  forecasts.  Inac- 
curate and/or  untimely  crop  information  can  be  cost- 
ly to  U.S.  foreign  trade,  to  the  American  farmer,  and 
to  the  consumer.  Crop  production  potentials  can 
change  very  quickly  because  of  the  vulnerability  of 
crops  to  the  effects  of  weather  and  other  natural 
phenomena,  such  as  disease  and  insect  infestation. 
For  example,  the  impact  of  an  acute  event  such  as  an 
overnight freeze can quickly change the production
potential  of  a given  crop  and  thus  alter  existing  world 
market  conditions.  Understandably,  more  timely 
and  more  accurate  foreign  crop  condition  informa- 
tion can  be  of  great  benefit  to  U.S.  foreign  trade. 

The  significance  of  the  LACIE  is  that  it  demon- 
strates that  current  Earth  resources  and  meteorologi- 
cal satellite  data  offer  commodity  experts  and  deci- 
sionmakers information  that  can  potentially  im- 
prove the  timeliness  and  accuracies  of  foreign  crop 
production  estimates.  The  users  will  determine  the 
cost-effective  applications  of  remote-sensing  tech- 
nology. The  Application  Test  System  (ATS)  of  the 
USDA  was  developed  as  a part  of  the  LACIE  and 
will  be  one  of  the  vehicles  used  to  transfer  remote- 
sensing  technology  in  the  future.  Currently,  the  ATS 
is  testing  and  evaluating  the  latest  satellite  and  com- 
puter processing  and  analysis  techniques  in  terms  of 
their future application potential by the USDA.
USDA  management  will  review  ATS  tests  and 
evaluations of candidate techniques prior to making
a decision on their transfer to operational elements.
The  ATS  as  part  of  the  USDA  will  be  responsive  to 
changing  and  expanding  user  requirements.  This 
year,  for  example,  there  were  further  clarifications  of 
USDA requirements with the issuance of the Secretary
of Agriculture's “Initiative for Aerospace Technology,”
calling for improved information on the
“early  warning  of  changes  affecting  production  and 
quality  of  renewable  resources.”  As  a result  of  the 
initiative,  the  ATS  is  beginning  to  test  and  evaluate 
present  satellite  and  computer  processing  and 
analysis  techniques  as  tools  for  timely  assessment  of 
crop  conditions  in  foreign  countries.  The  ATS  is 
evaluating  candidate  techniques  developed  by 
LACIE  as  well  as  techniques  developed  by  the 
general  research  community  and  by  private  industry. 

The  purpose  of  this  session  is  to  describe  the  ex- 
perience in  technology  transfer  between  the  LACIE 
and the ATS; the technical and functional designs of
the  ATS;  the  ATS  central  data  base  concept  and 
design;  and  the  analysis  component  of  the  ATS.  The 
following  paragraphs  present  a brief  description  of 
each  of  the  six  papers  presented  in  this  session. 

“The  Application  Test  System:  An  Approach  to 
Technology  Transfer”  presents  the  approach,  the 
achievements,  and  the  shortcomings  of  the  ex- 
perience in  technology  transfer  between  the  LACIE 
and  the  ATS. 

“Functional  Definition  and  Design  of  a USDA 
System”  describes  the  design  of  a USDA  prototype 
system  that  has  many  of  the  same  characteristics  as 
the  LACIE  system.  This  prototype  system  has  not 
been  implemented,  but  it  is  available  if  USDA  man- 
agement decides  to  use  it. 

“The  Application  Test  System:  Technical  Ap- 
proach and  System  Design”  describes  the  require- 
ments for  and  eventual  design  of  a computer  system 
for  large-scale  processing  of  Landsat  data.  The  com- 
puter system  is  composed  of  modular  off-the-shelf 
components  of  limited  specialization  and  can  readily 
accommodate  state-of-the-art  changes  in  hardware 
and  software  technology. 

“Data Base Design for a Worldwide Multicrop Information
System” addresses the design of the
central data base supporting the ATS. The data base
will support multicrop and multicountry information
requirements  identified  by  the  end  users,  as  well  as 
the  everyday  functional  and  analytical  needs  of  the 
ATS  crop  and  image  analysts,  management,  and 
system  development  teams. 

“The  Application  Test  System:  Experiences  to 
Date  and  Future  Plans"  details  the  data  analysis 
component  of  the  ATS,  describing  both  short-  and 
long-term  analysis  objectives.  The  ATS  crop  analyst 
uses  a state-of-the-art  interactive  image  processing 
system  for  the  analysis  of  Landsat  multispectral 
scanner  (MSS)  data.  The  analyst  has  available  a 
central  data  base  that  contains  valuable  data  records, 
such  as  historical  and  current  Landsat  MSS, 
meteorological, and crop statistics data, based on a
unique 25- by 25-nautical-mile grid system.

"Resource  Modeling:  A Reality  for  Program  Cost 
Analysis"  describes  a tool  developed  for  manage- 
ment of  the  ATS.  Given  varying  requirements,  the 
cost  model  can  quickly  assess,  allocate,  and  manage 
ATS  resources.  The  model  also  provides  budget  pro- 
jections and  comparisons  and  personnel  staffing  re- 
ports. 

The  ATS  will  continue  to  test  and  evaluate 
satellite  and  computer  processing  and  analysis  tech- 
niques in  terms  of  their  applicability  to  USDA  infor- 
mation needs.  The  ATS  state-of-the-art  modular 
design  can  readily  accommodate  changes  and 
therefore  can  be  easily  modified  or  augmented  to 
support future USDA requirements.


The Application Test System:

An Approach to Technology Transfer

A. C. Aaronson,a K. Biilow,a F. C. David,a R. L. Packard,a and F. W. Ram b


INTRODUCTION 

The  Application  Test  System  (ATS)  of  the  U.S. 
Department  of  Agriculture  (USDA)  was  imple- 
mented to  test  and  evaluate  the  latest  satellite  and 
computer  processing  and  analysis  technologies  in 
terms  of  their  application  feasibility  by  the  USDA. 
Technologies  to  be  evaluated  include  those 
developed,  tested,  and  evaluated  by  the  LACIE,  as 
well  as  candidate  technologies  developed  by  the 
research  community  and  private  industry.  This 
paper  presents  some  background  leading  to  the  im- 
plementation of  the  ATS  and  discusses  the  tech- 
nology transfer  experience  between  the  LACIE  and 
the  ATS,  highlighting  the  approach,  the  achieve- 
ments, and  the  shortcomings. 


CONCEPT  AND  APPROACH 

Technology  transfer  is  a term  most  often  used  in 
the  scientific  community  to  define  the  movement  of 
technical  capabilities  from  a research  and  develop- 
ment (R&D)  environment  to  a user-oriented  group 
for  application  in  an  operational  program.  Although 
the  basic  term  is  simple  in  definition,  the  actual 
transfer  of  technology  is  not  a simple,  straightfor- 
ward process.  A major  problem  area  is  the  lack  of 
effective  interaction  between  the  R&D  community 
and the end user. Those elements of the technology
which  must  be  evaluated  by  the  end  user  and  con- 
sidered by  the  R&D  community  before  the  transfer 
is  consummated  include 

1.  Technology  applicability  to  user  needs 

2.  Technology  cost/benefit  trade-offs 

3. Personnel training in the use of the technology

4. Impact of changes in technology


aU.S. Department of Agriculture, Houston, Texas.


The sections that follow discuss some of the
LACIE  and  ATS  experiences  for  each  of  the  preced- 
ing user  considerations  in  achieving  effective 
transfer  of  technology.  It  is  the  belief  of  USDA  man- 
agement that  the  ATS  can  be  a significant  vehicle  for 
transferring  satellite  remote-sensing  technology  to 
the  USDA. 


Technology  Applicability  to  User  Needs 

The  technologies  developed  by  the  LACIE  were 
designed  to  support  an  experiment  objective  of  pro- 
viding end-of-season  wheat-production  estimates 
that were within ±10 percent of true production 9 of 10
years.  When  this  performance  criterion  was  docu- 
mented, explicit  USDA  aerospace  and  remote- 
sensing  information  requirements  were  unknown. 
The  basic  premise  throughout  LACIE  was  that  a 
country-level  production  estimate  was  an  absolutely 
essential  ingredient. 
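
The performance criterion stated above can be checked against a record of yearly estimates with a few lines of code. The sketch below assumes the criterion means that at least 9 of every 10 yearly estimates fall within 10 percent of true production; the function name and the production figures are hypothetical.

```python
def meets_accuracy_criterion(estimates, truths, tolerance=0.10, required=0.9):
    """Check whether enough yearly estimates fall within the tolerance.

    An estimate counts as a success when its absolute error is no more than
    tolerance times the true value for that year.
    """
    if len(estimates) != len(truths) or not estimates:
        raise ValueError("need equal-length, nonempty sequences")
    within = sum(
        1 for est, true in zip(estimates, truths)
        if abs(est - true) <= tolerance * true
    )
    return within / len(estimates) >= required


# Hypothetical ten-year record with true production fixed at 100 units.
truths = [100.0] * 10
one_miss = [95, 104, 108, 91, 100, 99, 112, 103, 97, 106]
two_misses = [95, 104, 108, 88, 100, 99, 112, 103, 97, 106]

passes = meets_accuracy_criterion(one_miss, truths)
fails = meets_accuracy_criterion(two_misses, truths)
```

With only ten years of record, a single additional miss changes the outcome, which suggests how demanding such a criterion is to verify in practice.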

Prior  to  the  LACIE,  the  USDA  had  been  for- 
mulating departmental  requirements  which  could  be 
satisfied  by  using  remotely  sensed  data.  The  work  of 
the  USDA  Remote  Sensing  User  Requirements  Task 
Force (RSURTF) solidified a basic list of departmental
remote-sensing requirements in December
1975  after  the  start  of  the  LACIE.  Without  a set  of 
specific  user  requirements,  the  LACIE  established 
the  requirement  to  inventory  wheat  production  in  a 
number  of  LACIE-selected  countries.  The  LACIE 
goals  were  later  modified  to  accommodate  a number 
of  user  needs  published  in  the  list  of  RSURTF  re- 
quirements. The  timeliness  and  accuracy  criteria  for 
the  LACIE  wheat  production  reports  were  impacted 
by  this  list  of  user  requirements. 

Ideally,  a well-defined  set  of  user  requirements  to 
guide  the  establishment  of  project  objectives  should 
have existed before LACIE was begun (see the plenary
paper by Murphy et al. entitled “Technology
Transfer: Concepts, User Requirements, and a Practical
Application” for more detail). The LACIE
could  then  have  been  in  a position  to  test  and  evalu- 
ate system  components  that  were  more  directly 
responsive  to  user  requirements.  Nevertheless,  it  was 
possible for LACIE to essentially respond to a number
of user requirements during the 1975-77 time
frame.

Recently, the USDA has modified and enlarged
its remote-sensing requirements. This is reflected
by the Secretary of Agriculture's “Initiative for Aerospace
Technology” released in April 1978. The Secretary’s
Initiative, based on close cooperation with
NASA, the Agency for International Development,
and the Departments of Interior and Commerce, includes
a priority listing of USDA’s information requirements
that remote-sensing technology could
support. The seven information requirements are

1.  Early  warning  of  changes  affecting  production 
and  quality  of  renewable  resources 

2.  Commodity  production  forecasts 

3.  Renewable  resources  inventory  and  assess- 
ment 

4.  Land  use  classification  and  measurement 

5.  Land  productivity  estimates 

6.  Conservation  practices  assessment 

7.  Pollution  detection  and  impact  evaluation 

These new priorities documented by the user will
serve as primary guidelines to future programs using
remotely  sensed  data.  In  the  near  future,  LACIE 
technology  will  be  expanded  from  a single-crop  ap- 
plication to  a multicrop  application.  The  LACIE 
techniques  and  planned  follow-on  applications  es- 
sentially address  the  Secretary  of  Agriculture's  infor- 
mation requirements  for  commodity  production 
forecasts. In response to the Secretary's Initiative,
the ATS is expanding its original design and implementation
plans, which called for LACIE-style inventorying of wheat
production, to include testing of a crop condition assessment
system that will detect and assess the impact of abnormal
events on agricultural production (see the
paper by May et al. entitled “The Application Test
System: Experiences to Date and Future Plans” for
more detail). The ATS will also continue to have the
technical  and  analytical  components  required  to  im- 
plement a commodity  production  inventorying 
system  developed  and  successfully  demonstrated  in 
the  LACIE  (see  the  paper  by  Evans  et  al.  entitled 
“Functional  Definition  and  Design  of  a USDA 
System"  for  more  detail).  This  capability  will  be 
tested  and  evaluated  in  the  ATS  during  the  next  two 
crop  seasons. 


Technology Cost/Benefit Trade-Offs

The  ATS,  together  with  USDA  management  in 
Washington,  D.C.,  will  assess  the  immediate  and 
future  benefits  of  the  information  produced  by 
remote-sensing  technology.  Of  equal  importance  is 
an  assessment  of  the  cost-effectiveness  of  the  tech- 
nology (see  the  paper  by  Fouts  and  Hurst  entitled 
“Resource Modeling: A Reality for Program Cost
Analysis” for more detail).

Presently, ATS emphasis is on testing and evaluation
of information produced by the crop condition
assessment system. The ATS will perform tests of
the system over selected agricultural areas of the
world in order to evaluate ATS output products and
system cost. The ultimate decision to transfer selected
techniques used by the crop condition assessment
system (such as Green Index Number interpretations
and yield model estimates) to the end
user will be made by USDA management in Washington,
D.C. The USDA management will review
ATS technical tests and evaluations, as well as cost
evaluations, to assist in their decision on the transfer
of the selected techniques to operational elements.

The  flexibility  of  the  ATS  computer  hardware 
design  and  central  data  base  system  makes  the  ATS 
cost-effective  as  well  as  easily  adaptable  to  applica- 
tions testing.  The  modular  hardware  design  is  com- 
posed of  minicomputers,  high-density  disk  drives, 
graphics  terminals,  interactive  image  analysis  sta- 
tions, and  other  supportive  equipment  (see  the  paper 
by  Benson  et  al.  entitled  “The  Application  Test 
System:  Technical  Approach  and  System  Design" 
for  more  detail).  The  central  data  base  is 
geographically oriented and will store historical and
current Landsat, meteorological, and collateral data
(see  the  paper  by  Driggers  et  al.  entitled  “Data  Base 
Design  for  a Worldwide  Multicrop  Information 
System"  for  more  detail).  The  data  base  will  support 
a wide  spectrum  of  application  needs  identified  by 
the  image  and  crop  analysts  as  well  as  the  everyday 
functional  needs  of  ATS  management  and  system 
development  teams. 


Personnel Training in the Use of the
Technology 

The  training  of  human  resources  in  the  use  and 
application  of  a given  technology  is  as  much  a part  of 
the  technology  transfer  process  as  the  transfer  of 
concepts, algorithms, and procedures. Comprehensive
tests and evaluations of candidate technologies
could not be conducted without the technical
knowledge needed to understand and implement a
given technology. Ultimately, personnel have to put
the new procedures and techniques to use. All users
of the technology must have the opportunity to
become thoroughly acquainted with the new technology
before its adoption.

Because  of  this  USDA  management  philosophy, 
USDA  personnel  were  assigned  to  operational  pro- 
ject elements  during  the  LACIE  to  gain  a better  un- 
derstanding of  remote-sensing  technology.  Personnel 
assigned  to  the  Crop  Assessment  Subsystem  (CAS) 
of  the  LACIE  prepared  monthly,  unscheduled,  and 
end-of-year  crop  production  reports  for  each  of  the 
active  LACIE  countries.  In  addition,  USDA  person- 
nel gained  experience  in  the  acquisition,  storage,  and 
retrieval of Landsat multispectral scanner (MSS)
data as well as in the interpretation and analysis of
the  spectral  data.  A group  of  USDA  analysts  partici- 
pated in  the  testing  and  evaluation  of  an  interactive 
image  processing  system  for  sample  segment  wheat 
area  determination.  These  analysts  became  familiar 
with  the  latest  techniques  in  MSS  data  analysis. 

The  USDA  personnel  were  also  actively  involved 
in  testing  and  evaluating  the  development  of  crop 
yield  models,  crop  calendars,  and  other  crop- 
condition-related  programs.  Later  in  the  project, 
USDA  crop  analysts  helped  to  initiate  a LACIE  pro- 
gram that  used  vegetative  indexes  (transformed 
four-channel  MSS  data)  to  monitor  moisture  stress 
and  crop  condition. 

All  these  experiences  in  the  management, 
analysis,  and  reporting  components  of  the  LACIE 
during  the  period  1974-78  familiarized  a core  of 
USDA  personnel  with  the  latest  satellite  and  com- 
puter processing  and  analysis  technologies. 


Impact of Changes in Technology

The ATS will be minimally affected by expected
changes in remote-sensing technology because it is
implemented on a state-of-the-art computer
system that is designed to readily accommodate
change. The system relies on off-the-shelf components
of limited specialization and is capable of responding
to state-of-the-art developments in hardware
and software technology. The ATS, when
directed by USDA management, can augment the
present system configuration with additional
minicomputers or, if technological advancements or
additional needs warrant, it can replace present image
analysis hardware with new and improved equipment
that may be developed in the future.


IMPLEMENTATION  APPROACH 


User Systems Planning and
Applications Test Group

The  first  organized  USDA  effort  to  effect  an  appli- 
cation test  system  was  in  1976  when  the  User 
Systems  Planning  and  Applications  Test  Group 
(USPATG)  was  organized.  In  this  first  step,  10  auto- 
matic data  processing  (ADP)  experts  under  the  man- 
agement of  a USDA  user  were  dedicated  to  develop- 
ing a system  to  meet  USDA  remote-sensing  goals. 
They  were  given  the  responsibility  of  designing  and 
implementing  a system  capable  of  testing  and 
evaluating  LACIE  technology  with  respect  to  USDA 
requirements.  Initially,  the  USPATG  was  primarily 
composed  of  ADP  personnel.  Later,  USDA  crop 
analysts  formally  trained  in  the  use  of  interactive  im- 
age processing  systems  in  the  LACIE  were  added  to 
the  USPATG.  NASA  personnel  have  also  been 
assigned  to  the  USPATG  to  help  facilitate  the 
transfer  of  LACIE  technology. 

Initially,  the  ATS  implementation  approach  called 
for  (1)  ATS  personnel  to  establish  ATS  functional 
specifications  and  to  be  responsible  for  the  practical 
assessment  of  technologies  and  (2)  ATS  personnel, 
augmented  by  contractor  support,  to  be  responsible 
for  the  detailed  design  and  technical  implementation 
of  transferred  technologies;  ATS  personnel  will  con- 
tinue to  perform  these  functions,  augmented  by  con- 
tracts when  required.  In  all  cases,  the  ATS  is  respon- 
sible for  system  operation  and  the  preliminary 
evaluation  of  output  products.  Final  evaluation  of 
output  products  will  be  made  by  USDA  management 
in  Washington,  D.C. 


Mechanisms for Technology Transfer

The  primary  mechanisms  for  the  transfer  of 
LACIE  technologies  were  the  ATS  written  requests 
for  proposals  (RFP’s),  the  preliminary  and  critical 
design  reviews  (PDR  and  CDR),  the  ATS  Design 
Review,  the  Classification  Procedures  Advisory 
Team  (CPAT),  and  the  USDA  LACIE  personnel. 
During the ATS development period (fig. 1), RFP’s
were written to establish ATS design specifications
and the software/hardware composition of the ATS.
The  specifications  included  in  the  RFP's  were  an 
outgrowth  of  the  technologies  developed  by  the 
LACIE,  and  technical  project  personnel  reviewed 
the  RFP’s  for  consistency  with  state-of-the-art  tech- 
nology. Open  CDR’s  were  held  by  the  ATS  contrac- 
tors to  present  and  review  ATS/contractor  design 
specifications.  A mechanism  was  established  within 
the  CDR  that  allowed  non-ATS  personnel  (other 
LACIE  personnel)  to  submit  discrepancy  reports  on 
a particular  ATS  design  aspect. 

In  October  1977,  all  aspects  of  the  ATS  were 
reviewed  by  LACIE  personnel.  The  ATS  design 
review  was  held  over  a 2-day  period.  During  the  first 
day,  ATS  personnel  presented  details  of  present  and 
planned  system  capabilities.  During  the  second  day, 
review  participants  divided  into  small  groups  to 
review  particular  technical  and  analytical  procedures 
to  be  implemented  by  the  ATS,  including  analyst 
procedures,  yield  models,  accuracy  assessment  pro- 
cedures, data  base  design,  crop  condition  assessment 
approaches,  early-season  estimation  approaches,  and 
the LACIE/ATS interface. The resultant review reports
from each of the groups were used to redesign
many aspects of the ATS and are now part of the current
implementation plan.


The  CPAT  developed  early  in  LACIE  was  instru- 
mental in  facilitating  the  transfer  of  LACIE  sample 
segment  classification  technology  to  the  ATS.  The 
CPAT  was  composed  of  personnel  from  NASA, 
USDA, and the Lockheed Electronics Company
(LEC). 

Meetings  were  held  between  CPAT  and  ATS  per- 
sonnel starting  in  March  1977  to  review  the  LACIE 
classification  procedures  and  to  determine  the  ap- 
propriate design  specifications.  The  knowledge 
gained  by  the  ATS  staff  was  later  used  to  draft  an 
RFP  to  acquire  these  same  capabilities.  Personnel 
representing  the  ATS  and  the  Ford  Aerospace  & 
Communications  Corporation  (FACC)  jointly 
designed  and  implemented  a computer  system  capa- 
ble of  supporting  the  testing  and  evaluation  of 
LACIE  classification  algorithms,  as  well  as  other 
analytical techniques and procedures. The LEC has
been  contracted  to  augment  the  ATS  classification 
technology  initially  delivered  by  FACC  to  include 
Procedure  1 (P-1),  a key  procedure  developed  in 
LACIE. 

The  USDA  LACIE  personnel  were  instrumental 
in  transferring  the  knowledge  needed  to  implement 
many  of  the  LACIE  techniques.  As  stated  pre- 
viously, USDA  personnel  were  an  everyday  working 
part of the LACIE and, through their exposure to
daily operations, became familiar with many LACIE
techniques.  The  successful  implementation  of  many 
of  the  LACIE  techniques  by  the  ATS  can  be  credited 
to  the  ATS  technical  staff  trained  in  the  LACIE. 


TECHNOLOGY  TRANSFERRED 

The  ATS  has  transferred  an  assortment  of  techni- 
cal and  analytical  capabilities  to  support  the  imple- 
mentation, testing,  and  evaluation  of  a crop  condi- 
tion assessment  system  and  a production  inventory- 
ing system.  Limited  ATS  developmental  work  has 
also  been  done.  Those  technologies  transferred  from 
the LACIE and presently undergoing further large-area
testing and evaluation are itemized and summarized
in the following paragraphs.


Sample  Segment  Classification  Algorithms 

During the LACIE, a number of sample segment
classification procedures designed to produce a sample
segment wheat area estimate were developed,
tested, and evaluated over varying agricultural areas
and conditions. The ATS has adopted the following
classification  procedures  from  the  LACIE:  (1)  the 
Analyst-Selected  Training  Fields  procedure,  (2)  the 
Designated  Crop  procedure,  and  (3)  Procedure  1 or 
the  Preselected  Training  Fields  procedure.  The  ATS 
philosophy  is  to  utilize  all  three  procedures  as  analyst 
options  for  classifying  a sample  segment.  Each  pro- 
cedure has  advantages  over  the  others  under  certain 
sample segment conditions. For example, P-1
worked fairly well in areas having small, randomly
distributed fields and heterogeneous signatures, but
it was less efficient (more time-consuming) in
agricultural areas having relatively large fields and
homogeneous spectral signatures. In the latter case,
the Designated Crop procedure would have been a
better analyst procedure to implement.

The  ATS  acquired  the  Integrated  Multivariate 
Data  Analysis  and  Classification  System  (IMDACS) 
from  the  FACC  in  part  to  implement  the  Designated 
Crop  and  the  Analyst-Selected  Training  Fields  pro- 
cedures. Although  IMDACS  was  not  designed  for 
the  ATS  but  rather  is  off-the-shelf  software,  ATS 
analysts  have  implemented  the  previously  men- 
tioned procedures  from  the  various  IMDACS 
capabilities.  The  LEC,  under  a separate  contract,  has 
expanded  processing  options  by  augmenting 
IMDACS  with  the  P-1. 


Sampling  Strategy  and  Production 
Aggregation  Software 

The LACIE implemented a sampling strategy
designed to provide end-of-season wheat production
estimates that were within 10 percent of true production
in 9 of 10 years (the 90/90 accuracy goal). The LACIE
sampling strategy evolved from one that allocated sample
segments by political subdivision to the latest strategy,
which allocates sample segments to relatively
homogeneous agricultural areas called agrophysical
units or APU's. The latest sampling strategy design
was implemented to gain sampling efficiency; i.e., to
reduce the number of segments required to achieve
the end-of-season 90/90 accuracy goal.

The  ATS  will  perform  a large-area  test  of  the 
LACIE-developed  sampling  strategies  and  produc- 
tion aggregation  algorithms.  The  ATS  will  test  and 
evaluate  the  APU  and  political  subdivision  ap- 
proaches for  sample  segment  allocation  and  produc- 
tion aggregation  in  large  areas  of  the  United  States 
and  the  U.S.S.R.  during  1978-79.  Although  the  ATS 
has  not  installed  the  LACIE-developed  production 
aggregation  software,  the  ATS  will  utilize  the 
algorithms available at the LACIE. Future installation
of these algorithms on ATS equipment is presently
being considered.

Currently,  the  ATS  is  evaluating  the  sample 
design  and  resulting  allocation  to  support  a crop  con- 
dition assessment  system.  The  sample  segments  will 
be  assessed  for  crop  condition  by  USDA  analysts 
using  various  MSS  data  transforms. 


Yield Models

During  the  LACIE,  operational  yield  estimates 
were  derived  from  the  Center  for  Climatic  and  En- 
vironmental Assessment  (CCEA)  regional  yield 
models.  The  LACIE  Research,  Test,  and  Evaluation 
Group  also  evaluated  the  Kansas  State  University 
(KSU)  yield  model  for  future  applications.  The  ATS 
will  further  test  and  evaluate  these  models  over  large 
areas  in  the  United  States  and  the  U.S.S.R. 


Crop  Calendar  Models 

The  LACIE  utilized  the  CCEA  Crop  Calendar 
Model  to  determine  the  timing  of  wheat  growth 
stages.  The  crop  calendar  information  was  used  by 
LACIE crop analysts in the spectral analysis of the
MSS data. Understanding the relationship between
the MSS data and the wheat crop calendar made the
analyst job of MSS picture element (pixel) labeling
easier. The ATS obtained crop calendar results from
the LACIE for sample segments in the United States
and the U.S.S.R. during the 1978 crop year.

Additionally,  the  ATS  will  test  and  evaluate  the 
KSU  crop  calendar  model,  a subroutine  of  the  KSU 
yield  model.  The  KSU  and  CCEA  crop  calendar 
results  will  be  jointly  tested  and  evaluated  against 
ground-truthed  crop  calendar  information  obtained 
from  sample  sites  in  the  United  States. 


Vegetative Index Approach
to Crop Condition Assessment

The  ATS  will  further  test  and  evaluate  the  use  of 
vegetative  indexes,  such  as  the  Kauth-Thomas  green 
number,  for  crop  condition  assessment.  A vegetative 
index  number  is  transformed  from  the  raw  MSS 
digital  data  and  is  used  for  crop  vigor  assessment. 
The LACIE used the vegetative index approach to
monitor crop condition and soil moisture conditions
in a large test region in the United States. These
vegetative  index  numbers  and  their  interpretations 
will  support  the  operation  and  evaluation  of  the  ATS 
crop  condition  assessment  system. 
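
To illustrate how a vegetative index number can be derived from the four MSS channels, the following sketch applies a linear transform of the Kauth-Thomas type to a single pixel. The coefficients are the values commonly cited for the Landsat-1 MSS greenness and brightness components; this paper does not state the coefficients actually used, so they should be read as an assumption, as should the function names and the sample digital counts.

```python
# Sketch of a Kauth-Thomas style linear transform for one MSS pixel.
# ASSUMPTION: these are commonly cited Landsat-1 MSS coefficients; the
# paper does not give the values used by the LACIE or the ATS.
GREENNESS = (-0.290, -0.562, 0.600, 0.491)   # weights for MSS bands 4, 5, 6, 7
BRIGHTNESS = (0.433, 0.632, 0.586, 0.264)    # weights for MSS bands 4, 5, 6, 7


def linear_index(pixel, weights):
    """Return the weighted sum of the four MSS channel values."""
    if len(pixel) != 4:
        raise ValueError("expected four MSS channel values (bands 4-7)")
    return sum(w * v for w, v in zip(weights, pixel))


def green_number(pixel):
    """Greenness component: high for vigorous vegetation."""
    return linear_index(pixel, GREENNESS)


def brightness(pixel):
    """Brightness component: dominated by overall reflectance."""
    return linear_index(pixel, BRIGHTNESS)


# Hypothetical digital counts (bands 4, 5, 6, 7). Vegetation is dark in the
# visible bands and bright in the near-infrared bands; bare soil is flatter.
vegetated = (20, 15, 60, 30)
bare_soil = (35, 40, 45, 20)

if __name__ == "__main__":
    print("green number, vegetated:", round(green_number(vegetated), 2))
    print("green number, bare soil:", round(green_number(bare_soil), 2))
```

Under these assumed coefficients the vegetated pixel receives a much higher green number than the bare-soil pixel, which is the property an analyst relies on when interpreting the index for crop vigor.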

The  paper  by  May  et  al.  entitled  “The  Application 
Test  System:  Experiences  to  Date  and  Future  Plans” 
discusses  the  implementation  aspects  of  each  of  the 
LACIE-developed  techniques  transferred  to  the 
ATS. 


EVALUATION  OF  THE  TECHNOLOGY 
TRANSFER EXPERIENCE

A surface-level assessment of the technology
transfer process between the LACIE and the ATS indicates
a successful transfer. However, a closer
assessment indicates that the process was extremely difficult,
primarily because the LACIE had no established
mechanism to facilitate the
transfer of the technology short of a “turnkey” approach.
A turnkey approach does not conform to the
changing needs of the user. The ATS approach to
technology transfer is to test and evaluate those technologies
that the ATS assessed as supporting USDA information
requirements when these requirements
became known.


Initially,  the  ATS  design  effort  was  to  support  the 
further  large-area  testing  of  a wheat  production  in- 
ventorying system  similar  to  the  LACIE.  With 
broadening  direction  from  USDA  management  in 
support  of  the  Secretary  of  Agriculture's  Initiative 
for  Aerospace  Technology,  the  system  has  to  be 
capable  of  responding  to  a number  of  information  re- 
quirements, including  the  early  warning  of  changes 
affecting  production  and  quality  of  renewable 
resources.  Since  the  ATS  approach  called  for  a 
system  design  that  was  flexible,  it  has  been  relatively 
easy  to  adapt  to  changing  USDA  information 
requirements. 

The LACIE/ATS technology transfer experience
clearly identified the need to define specific end-user
requirements before the design, implementation, and
testing of new techniques and analysis capabilities
and/or to provide a mechanism within a project to
incorporate changing or modified user requirements
(i.e., technology development must be responsive to
end-user requirements).


CONCLUSIONS 

The LACIE/ATS technology transfer experience
is not an example of optimal technology transfer
design. Nevertheless, certain aspects of this experience
were extremely beneficial. First, the ATS designed,
implemented, and tested a computer system capable of
supporting  the  testing  and  evaluation  of  LACIE 
technologies  as  well  as  technologies  transferred  from 
the  research  community  and  private  industry.  Sec- 
ond, the  ATS  incorporated  many  of  the  LACIE  tech- 
niques and  analytical  procedures  into  its  operations, 
such  as  classification  algorithms,  sampling  strategy, 
yield  models,  and  crop  calendar  models.  Third,  the 
ATS  is  staffed  by  personnel  trained  in  the  use  of 
LACIE  techniques  and  procedures. 

The  unavailability  of  specific  user  requirements 
before  the  start  of  the  LACIE  complicated  the  tech- 
nology transfer  experience  between  the  LACIE  and 
the  ATS.  As  a result,  the  ATS  is  testing  and  evaluat- 
ing those  LACIE  and  non-LACIE  techniques  and 
procedures  that  could  support  USDA  information  re- 
quirements. In  this  regard,  the  ATS  implementation 
of  a flexible  system  design  adaptable  to  changing 
user  requirements  is  proving  to  be  a cost-effective 
decision.


Functional Definition and Design of a USDA System

S. M. Evans, E. R. Dario, and G. L. Dickinson


INTRODUCTION 

During  the  initial  phases  of  LACIE  development, 
it  was  the  general  intent  of  the  U.S.  Department  of 
Agriculture  (USDA)  to  exploit  the  knowledge 
derived  from  the  LACIE  and  to  incorporate  the 
verified  technology  into  an  operational  system. 
Thus,  as  the  LACIE  evolved,  the  concept  of  testing 
the  technology  in  a near-operational  environment  for 
transfer  to  a user  system  also  evolved. 

This  paper  discusses  the  functional  definition  and 
design  of  a USDA  system  utilizing  the  LACIE  tech- 
nology available  as  of  June  1976.  The  organization 
and  methods  described  herein  are  focused  on  LACIE 
technology  in  terms  of  its  transfer  for  user  applica- 
tions. They  are  conceptual  only  and  do  not 
necessarily  reflect  the  system  that  is  being  imple- 
mented on  behalf  of  the  USDA. 


Constraints

In  the  design  and  definition,  it  is  intended  that  the 
system  be  responsive  to  USDA  user  requirements 
and  utilize  the  most  cost-effective  technology 
developed  and  tested  during  the  LACIE.  This 
guideline,  as  stated  in  the  Management  Plan  for  the 
User  Advanced  System  Design  (ref.  1),  necessitated 
that  constraints  be  placed  on  the  formulation  of  a 
design. 

The  available  manpower  to  operate  a USDA 
system  was  determined  to  be  approximately  60  per- 
sons. To  effectuate  a system  utilizing  this  number 
and  limited  equipment  resources,  a 5-day,  2-shift 
operation  was  provided  in  the  design.  The  varied 
agricultural  disciplines  involved  and  the  probable 
lack  of  analysts'  familiarity  with  automatic  data  proc- 
essing (ADP)  techniques  were  factors  in  the  decision 
to  use  menu-driven  software  in  the  user  system, 
where feasible.

Evolving technology in processing remotely
sensed  data,  the  need  to  upgrade  equipment,  and 
changes  in  data  sources  and  user  requirements  re- 
quired that  the  system  be  flexible.  Security  precau- 
tions for  safeguarding  crop  estimate  data  were  car- 
ried through  each  section  of  the  design.  The  USDA 
guidelines  for  generating,  storing,  and  transporting 
sensitive  data  were  used. 

A 7-day  processing  time  from  receipt  of 
multispectral  scanner  (MSS)  data  through  report 
generation  was  levied  as  a requirement  for  system 
throughput.  This  necessitated  that  the  design  provide 
the capability, on the average, to process a 117- by
196-pixel segment in 1.5 hours.
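
The throughput requirement can be put in perspective with a little arithmetic. The sketch below counts the pixels in one segment and estimates how many segments a single analysis station could process per week under the 5-day, 2-shift operation described above. The 8-hour shift length and the function name are assumptions; the paper states neither the shift length nor the number of stations.

```python
# Arithmetic sketch for the stated throughput requirement.
SEGMENT_ROWS = 117          # from the design constraint
SEGMENT_COLUMNS = 196       # from the design constraint
HOURS_PER_SEGMENT = 1.5     # average processing time required by the design

PIXELS_PER_SEGMENT = SEGMENT_ROWS * SEGMENT_COLUMNS


def segments_per_week(days=5, shifts_per_day=2, hours_per_shift=8.0):
    """Whole segments one station can finish in a week.

    ASSUMPTION: hours_per_shift=8.0 is illustrative; the paper gives
    only the 5-day, 2-shift schedule.
    """
    available_hours = days * shifts_per_day * hours_per_shift
    return int(available_hours // HOURS_PER_SEGMENT)


if __name__ == "__main__":
    print("pixels per segment:", PIXELS_PER_SEGMENT)
    print("segments per station per week:", segments_per_week())
```

Under the assumed 8-hour shift, one station has 80 working hours per week, which is enough for 53 complete segments at the required average rate.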

An  additional  design  constraint  provided  that 
Landsat  MSS  data  be  utilized  with  flexibility  for 
future  sensor  systems.  In  accordance  with  the  design, 
nonspectral  data  would  be  formatted  into  a gridded 
system  using  a 25-nautical-mile  grid  with  each  grid 
divided  into  quadrants. 
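
A gridded reference of this kind can be sketched as a simple lookup from a location to a grid cell and quadrant. The example below works in planar coordinates measured in nautical miles from a grid origin; the origin, the axis orientation, and the quadrant labels are all hypothetical, since the paper specifies only the 25-nautical-mile spacing and the division into quadrants.

```python
import math

CELL_NMI = 25.0  # grid spacing stated in the design constraint


def grid_cell(x_nmi, y_nmi):
    """Map a point to (column, row, quadrant).

    x_nmi and y_nmi are distances east and north of a HYPOTHETICAL grid
    origin, in nautical miles. The quadrant labels (SW, SE, NW, NE) are
    an illustrative convention, not taken from the paper.
    """
    col = math.floor(x_nmi / CELL_NMI)
    row = math.floor(y_nmi / CELL_NMI)
    # Offsets of the point inside its cell, each in the range [0, 25).
    dx = x_nmi - col * CELL_NMI
    dy = y_nmi - row * CELL_NMI
    half = CELL_NMI / 2.0
    quadrant = ("N" if dy >= half else "S") + ("E" if dx >= half else "W")
    return col, row, quadrant


if __name__ == "__main__":
    print(grid_cell(30.0, 10.0))
    print(grid_cell(20.0, 20.0))
```

Keying nonspectral data (for example, meteorological observations) by cell and quadrant in this way would let records from different sources be joined on a common location index.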

The  basic  concept  for  a flexible  system  was  that 
the total system be operated through the data base.
The  system  also  would  be  kept  modular  so  that 
changes  in  algorithms,  new  data  sources  and  addi- 
tional hardware  and  software  might  be  readily  inte- 
grated to  enhance  or  replace  established  components. 
Also,  the  use  of  standard  off-the-shelf  hardware 
would  provide  for  relatively  easy  upgrading. 

All  system  software  developed  locally  would  be 
written  in  the  COBOL  and/or  FORTRAN  languages, 
in  accordance  with  USDA  standards.  Exceptions  to 
this  rule  would  be  on  a case-by-case  basis  with  suffi- 
cient justification  to  support  them. 

An  analyst  “team"  approach  was  mandatory  in 
order  to  ensure  availability  of  regional  agricultural, 
ADP,  and  meteorological  expertise  for  the  process- 
ing of  Landsat  data.  The  team  would  be  constructed 
with  each  of  the  above-mentioned  disciplines  repre- 
sented, according  to  their  approximate  respective 
proportions  of  usage. 

All software for analyst use would be tutorial, with
emphasis on relieving the analyst of the responsibility
of knowing system commands. This required
the use of menus for appropriate process selection
and supplemented error condition information, with
probable solution steps displayed interactively.
These  constraints  are  described  in  greater  detail  in 
terms  of  functional  requirements  in  a study  (ref.  2) 
done  by  the  MITRE  Corporation  prior  to  the  USDA 
design  effort. 
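
The menu-driven, tutorial style described above can be illustrated with a small sketch: the analyst picks a numbered process instead of typing a system command, and an invalid entry produces an error message together with a probable solution step. The menu entries, function names, and message wording are hypothetical and are not taken from the USDA design.

```python
# HYPOTHETICAL menu entries, for illustration only.
MENU = {
    "1": "Display segment imagery",
    "2": "Classify segment",
    "3": "Generate report",
}


def render_menu(options):
    """Return the menu as text, one numbered process per line."""
    lines = ["Select a process:"]
    lines += [f"  {key}. {label}" for key, label in options.items()]
    return "\n".join(lines)


def choose(options, entry):
    """Resolve an analyst's entry.

    Returns (label, None) for a valid choice, or (None, help_text) where
    help_text describes the error and a probable solution step.
    """
    key = entry.strip()
    if key in options:
        return options[key], None
    valid = ", ".join(options)
    help_text = (
        f"'{entry}' is not a menu choice.\n"
        f"Probable solution: enter one of {valid}."
    )
    return None, help_text


if __name__ == "__main__":
    print(render_menu(MENU))
    print(choose(MENU, "2"))
    print(choose(MENU, "x")[1])
```

The design point is that the analyst needs to know only the menu, not the underlying command syntax, and every error carries its own suggested remedy.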


Objectives and Goals

The USDA goals that were stated in the LACIE
Memorandum of Understanding dated September 1,
1974, and the USDA decision to commit resources to
this  agreement  were  evolutionary  in  nature.  Simply 
stated,  these  goals  were 

1.  To  participate  in  the  development  of  a wheat 
estimation  system  through  the  exploitation  of  data 
collected  by  Earth-orbital  satellite  systems  or  by 
other systems operated by NASA and the National
Oceanic and Atmospheric Administration (NOAA)
of the U.S. Department of Commerce

2.  To  validate  and  assist  in  optimizing  the  LACIE 
technology  developed  by  NASA  and  NOAA 

3.  To  train  multidisciplined  USDA  analysts  in 
LACIE  techniques  and  related  technology 

4. To transfer elements of the optimized technology
to a USDA operational environment based on
proven cost effectiveness

5.  To  apply  experience  gained  in  the  LACIE  to 
assess  the  potential  of  other  feasible  projects  iden- 
tified by  the  USDA  Remote  Sensing  User's  Require- 
ment Task  Force 

While  the  User  Advanced  System  Design  met  all 
the  objectives  stated  above,  a set  of  more  detailed 
goals  was  established  to  guide  the  technical  defini- 
tion and  design.  These  objectives  provided  direction 
in  terms  of  system  configuration,  system  reliability, 
accuracy  of  results,  and  methodology  employed. 
They are, in descending order of importance, timeliness,
accuracy, objectivity, and continuity.

Microaccuracy  was  not  as  important  as  timeliness 
of  information.  This  was  based  upon  USDA  manage- 
ment's decision  that  a crop  estimation  system  must 
be  able  to  deliver  a wheat  production  estimate  by  late 
March or early April (ref. 3, sec. 3.0).

Accuracy was not dismissed as unimportant but
was, however, treated in a practical manner. At that
point in time, the LACIE accuracy criterion was
90/90 (that the LACIE U.S. Great Plains at-harvest
estimate be within 10 percent of the true value, with
a probability of at least 0.9). It was recognized that
drastic system design modification and reconfiguration
might not be predicated on an 89-percent accuracy
level for a given crop year.
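
The 90/90 criterion can be made concrete with a short calculation. The sketch below computes the probability that an estimate falls within 10 percent of the true value, given a relative bias and a relative standard error, and compares it with 0.9. The assumption that estimation error is normally distributed, and the function names, are illustrative and are not taken from the LACIE accuracy assessment procedures.

```python
from statistics import NormalDist


def prob_within_tolerance(relative_bias, relative_std_error, tolerance=0.10):
    """Probability that the relative error lies within +/- tolerance.

    ASSUMPTION: the relative error of the estimate is normally
    distributed with the given bias (mean) and standard error.
    """
    if relative_std_error <= 0:
        raise ValueError("relative_std_error must be positive")
    error = NormalDist(mu=relative_bias, sigma=relative_std_error)
    return error.cdf(tolerance) - error.cdf(-tolerance)


def meets_90_90(relative_bias, relative_std_error):
    """True if the estimate is within 10 percent with probability >= 0.9."""
    return prob_within_tolerance(relative_bias, relative_std_error) >= 0.90


if __name__ == "__main__":
    for bias, se in [(0.0, 0.06), (0.0, 0.07), (0.05, 0.05)]:
        p = prob_within_tolerance(bias, se)
        print(f"bias={bias:.2f} se={se:.2f} -> P={p:.3f} meets 90/90: {meets_90_90(bias, se)}")
```

Under the normality assumption, an unbiased estimate needs a relative standard error of roughly 6 percent or less to satisfy the criterion, and any bias tightens that limit further.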

It was recognized, also, that the design and
methodology utilized should not negate the
agricultural, economic, geographic, statistical, and
other expertise contributed by a USDA analyst within
the system. On the other hand, procedures were
intended to ensure some continuity in the way data
were processed so that subjective input would not
tend to distort the end results.


ORGANIZATION 

The  following  sections  briefly  describe  each  of  the 
organizational elements in figure 1. Definitions are
primarily  concerned  with  basic  functions  and  types 
of  personnel  within  each  element. 


Project  Management 

Project  management  is  a policymaking  adminis- 
trative role  with  ultimate  responsibility  for  the  entire 
system.  The  system  is  complex  and  sophisticated, 
requiring  specialized  and  experienced  personnel  to 
perform  the  daily  tasks.  Top-level  management  in- 
terfaces with  the  system  through  departmental  ex- 
ecutive management. 


Technical  Staff 

The  technical  staff  is  administratively  controlled 
by  project  management  and  provides  management  a 
pool  of  resources  to  be  utilized  as  required.  The  tech- 
nical staff  represents  a specific  skill  mix;  i.e.,  com- 
puter specialists,  systems  engineers,  budget  analysts, 
economists,  systems  analysts,  soil  scientists, 
agronomists,  meteorologists,  and  remote-sensing 
scientists.

Members of this group are responsible for
developing and testing applications and special-purpose
software and performing systems maintenance.
Support to perform these functions is required
by the data base management group.

Specialists on the technical staff are responsible
for analysis activity. A regionally oriented team concept
is planned. Since circumstances and politics
could cause sudden and major emphasis shifts, the
organization is loosely structured to permit
specialists in the various disciplines to contribute as
priorities demand.

FIGURE 1.—USDA organizational elements.

The technical staff supports the research, operations,
and systems management components as
resources and priorities allow.


Research and Development (R&D) Staff

Senior scientists from all disciplines are required
to perform R&D tasks and functions.

Requests for research in a given area may be initiated
by any User Advanced System (UAS) staff
member, but project management would be the approving
authority for the research. Additionally, the
R&D staff supports production operations by special
studies of episodic events or day-to-day processing, if
required. The R&D staff is controlled administratively
by project management.


Administrative  Staff 

The  two  primary  functions  of  the  administrative 
staff are personnel-related services and management
assistance.  The  personnel  functions  include  such 
responsibilities  as  payroll,  insurance,  and  general 
recordkeeping.  The  management  assistance  func- 
tions include  such  responsibilities  as  budget  prepara- 
tion, contract  services,  facilities  maintenance,  and 
purchasing.  The  administrative  staff  reports  directly 
to  the  Project  Manager. 


Systems  Management  Component 

The  primary  function  of  the  systems  management 
component  is  to  serve  as  a coordinating  and  integra- 
ting unit  for  advanced  system  responses  to  require- 
ments. Data  requirements  are  translated  into  specific 
acquisition, analysis, and processing activities.
Responsibilities  of  the  systems  management  compo- 
nent include  the  following. 

1.  Tracking  the  status  of  responses  to  specific  re- 
quirements through  the  use  of  a change  control 
board  or  panel. 

2.  Assessing  the  impact  and  potential  value  of  im- 
plementing new  requirements  or  techniques  into  the 
operational  system.  Through  discussions  and  studies, 
it  determines  the  additional  hardware  required;  com- 
puter software  support  available  or  required  to  be 
written;  data  available  to  meet  processing  require- 
ments; and  impacts  on  daily  operations,  on  analysis 
activities,  and  on  the  data  base.  Based  on  these  deter- 
minations, a management  decision  is  made  to  imple- 
ment or  not  to  implement  the  proposed  requirement 
or  technique.  If  approved,  the  systems  management 
component coordinates the implementation and ensures
the integration of a thoroughly tested module.

3.  Serving as the communications link to the environment
with which the operational system must
interface. Any changes in legislative policy effecting
changes in USDA operations are coordinated by this
group.

4.  Providing  special  studies  or  reports  requested 
by  organizations  other  than  those  considered  part  of 
the  production  system  and  the  USDA.  For  example, 
a request by a member of Congress or a commercial
organization  could  be  coordinated  with  the  USDA's 
Congressional  Liaison  Office  and  appropriate 
marketing  and  public  relations  offices  as  required. 
This  request  would  then  be  assessed  for  impact  and 
tracked through the system until completion of
implementation or the decision not to implement.


Operations Section

The  operations  section  is  responsible  for  the 
following. 

1.  Scheduling and controlling all day-to-day activities
of the production system requiring the use of
the analysis stations and their associated general-purpose
computers (or special-purpose hardware), or
other related services such as keypunching.

2.  Hardware maintenance, computer system
operations and monitoring, and tape and disk pack
library establishment and maintenance.

The operations staff is administratively controlled
by project management. The section has a shift
leader and an aid on each shift to review system
performance data and ensure that corrective actions are
taken.


Data Base Administrator

The  responsibilities  of  the  data  base  administrator 
include  the  following. 

1.  Day-to-day technical control of the production
system data base, a large, integrated, complex structure
serving users with a wide variety of data and
data processing requirements.

2.  Control of the logical and physical data base
structuring, assuring the security and integrity of the
data base (including recovery mechanisms), and
granting access to the users.

3.  Control of the purging and subsequent releasing
of space for any data item to be removed from
the system.

4.  Assessing the impact of user requests on the
entire community and making decisions as to which
capabilities are most practical or critical to be implemented,
based on management-assigned priorities,
implementation costs, resource availability, and
other considerations.

The data base administrator reports administratively
to project management.


COMPONENTS 

The  USDA  requires  a closed-loop  information 
system  (fig.  2).  This  loop  indicates  the  use  of  MSS 
and meteorological data to perform the identification
and mensuration of crop type and condition. Generated
reports are then transmitted, along with other
system products, to USDA evaluators who refine the
information  for  a final  product  to  be  released  to  the 
public. 

The  closing  of  the  loop  allows  the  evaluators  to 
issue  requirements,  because  of  product  deficiencies 
or changing missions, to systems management.
Management identifies impacts and develops the
changes  which  are  justified.  Requirements  from 
public  policy  could  also  be  input  to  systems  manage- 
ment. 

The  following  paragraphs  describe  each  of  the 
components  in  the  diagram  with  two  exceptions. 

1.  The  systems  management  component  has  been 
defined  in  the  preceding  section. 

2.  The data base component, though not indicated
on the chart, will be defined since it provides much
of the interface between system components.

FIGURE 2. — USDA system product and information flow.

The  components  in  figure  2 are  composed  of  hard- 
ware, software,  and  procedures,  whereas  the 
organizational elements in the preceding section
define  personnel  and  policy  areas.  The  interaction 
between  these  two  structures  is  evident  in  figure  1, 
with  the  data  base  component  separating  the  two. 


Data  Acquisition  Component 

The functions of the data acquisition component
are (1) to serve as the focal point for transmitting
data requests to, and receiving data from, sources
external to the system; and (2) to perform requirements
processing, preprocessing of image,
meteorological, and ancillary data, and communications
processing. With respect to requests for specific
full-frame acquisitions and the transmission of data
from the NASA Goddard Space Flight Center
(GSFC), the data acquisition component is required
to perform the following tasks.

1.  Store  data  on  large-capacity  random-access 
devices  as  they  are  received. 

2.  Perform  cloud-cover  and  quality  checks  on  the 
segments,  extract  those  which  passed  the  editing, 
and  place  them  on  the  analysis  station  data  bases. 

3.  Send  reports  describing  the  number  and  quality 
of  segments  to  the  analysis  component. 

4.  Prepare a data packet containing digital image
and  ancillary  data  for  the  analysis  activity. 

5.  Obtain  and  store  NOAA  and/or  U.S.  Air  Force 
(USAF)  agricultural/meteorological  (AGROMET) 
and  meteorological  data.  The  transmission  of  data  is 
via  communication  link  on  a periodic  basis;  the  data 
are  temporarily  stored  on  a random-access  device. 
Extraction  of  the  desired  data  and  its  required 
manipulation  and  placement  in  the  data  base  is  done 
on a non-real-time basis.

6.  Handle  and  control  other  types  of  data  such  as 
historical  data,  recent  attache  reports,  and  research 
results  available  from  various  agencies. 

7.  Reformat  hardcopy  items  in  digital  form,  if  re- 
quired, and  place  them  in  the  data  base,  or  place 
hardcopy  material  in  the  system’s  reference  library. 
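
The screening step in task 2 and the report in task 3 can be illustrated with a minimal sketch. The `Segment` fields, the 0.3 cloud-cover threshold, and the function names are hypothetical; the source does not specify the editing criteria.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical record for one sample segment acquisition."""
    segment_id: str
    cloud_fraction: float  # 0.0 (clear) to 1.0 (fully clouded)
    quality_ok: bool       # result of an assumed data-quality check


def screen_segments(segments, max_cloud=0.3):
    """Split segments into those that pass the edit and those rejected.

    The threshold is an illustrative assumption, not a LACIE value.
    """
    passed, rejected = [], []
    for seg in segments:
        ok = seg.quality_ok and seg.cloud_fraction <= max_cloud
        (passed if ok else rejected).append(seg)
    return passed, rejected


def summary_report(passed, rejected):
    """Count segments for the report sent to the analysis component."""
    return {
        "received": len(passed) + len(rejected),
        "passed": len(passed),
        "rejected": len(rejected),
    }
```

Only the segments returned in `passed` would be placed on the analysis station data bases.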


Evaluation  Component 

The  evaluation  component  provides  the  advanced 
system  with  the  user  agency  interface,  product 
evaluation  support,  and  the  means  for  initiating  new 
or  changed  requirements  based  on  product  evalua- 
tion. 

Standard  reports  produced  by  the  reporting  com- 
ponent are  available  to  the  user  agency  facility  im- 
mediately following  validation.  The  schedule  for 
generating  these  reports  is  consistent  with  the  user 
agency  schedule  for  release  of  official  crop  estimates. 
Generally,  the  transmittal  reports  estimate  crop  area, 
yield,  and  production  to  the  country  level,  but  the 
capability  to  provide  estimates  to  the  lowest  level  re- 
quired is  available.  Historical  data  reports  are  also 
available. 

The  system  provides  for  generation  of  nonstand- 
ard data  requests  to  meet  specific  user  agency  needs. 
Designated  user  agency  personnel  identify  specific 
data  needed,  such  as  crop  effects  from  episodic 
events  or  correlation  of  meteorological  data  with  esti- 
mates of  area,  yield,  and  production.  Users  may 
access  the  data  via  interactive  displays  or  hardcopy 
reports. 

User agency personnel evaluate results for accuracy,
utility, and timeliness with respect to
preestablished schedules. System results are compared
with data from other USDA sources for accuracy.
Utility evaluation considers the completeness,
accessibility, and usability of system
results. In addition, production system support personnel
perform an analytical evaluation of results
to improve sampling strategy and processing
techniques.

For each crop year, a Product Evaluation Plan is
prepared covering (1) assessment of system results
relative to data from other USDA sources, such as
attache reports or foreign publications; (2) assessment
of system results using ground-truth (or analogous)
data; and (3) simulation studies.


Analysis  Component 

Responsibilities  of  the  analysis  component  in- 
clude generating  estimates  of  crop  acreage,  yield,  and 
production  at  all  specified  geographical  hierarchical 
levels  within  the  seven  foreign  countries.  Standard 
statistics  at  these  levels  are  computed  for  acreage, 
yield,  and  production  and  combined  with  historical 
statistics  to  provide  estimates  of  the  analysis  compo- 
nent performance  accuracy.  Computed  estimates 
and  statistics  are  stored  in  the  data  base.  The  analysis 
component  also  generates  estimates  for  specified 
geographical  areas  associated  with  episodic  events. 

Operational requirements include specifications
of (1) geographical areas for which periodic and
unscheduled reports are requested, (2) sampling
strategy, hierarchical definition, and sample unit
allocation plan, (3) sizing parameters to control
length of tables and memory allocation, and (4) data
collection requirements for the data acquisition component.
The analysis component consists of three functional elements:

1.  Classification  to  estimate  wheat  acreage  for 
sample  segments. 

2.  Yield  to  estimate  wheat  yield  for  the  yield 
strata. 

3.  Crop  aggregation  to  combine  results  from  the 
classification  and  yield  elements  and  to  compute  esti- 
mates of  wheat  acreage,  yield,  production,  and  stand- 
ard statistics  at  specified  hierarchical  levels.  Software 
is  used  to  make  reasonableness  checks  to  assist  in 
producing  a valid  product. 
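
As an illustration of the crop aggregation element, the sketch below combines per-stratum acreage and yield into regional totals. It assumes production is computed as acreage times yield and that the regional yield is production divided by acreage; the record layout and names are hypothetical, and the statistical estimators actually used are not described here.

```python
from collections import defaultdict


def aggregate_production(strata):
    """Roll per-stratum estimates up to the region level.

    `strata` is an iterable of dicts with the hypothetical keys
    'region', 'acreage', and 'yield_per_acre'.
    """
    totals = defaultdict(lambda: {"acreage": 0.0, "production": 0.0})
    for s in strata:
        t = totals[s["region"]]
        t["acreage"] += s["acreage"]
        # Assumed relation: production = acreage * yield.
        t["production"] += s["acreage"] * s["yield_per_acre"]

    result = {}
    for region, t in totals.items():
        # Acreage-weighted average yield for the region.
        avg_yield = t["production"] / t["acreage"] if t["acreage"] else 0.0
        result[region] = {
            "acreage": t["acreage"],
            "production": t["production"],
            "yield_per_acre": avg_yield,
        }
    return result
```

The same roll-up could be repeated at each hierarchical level by changing the grouping key.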

The  classification,  yield,  and  crop  aggregation 
elements  are  designed  to  use  maximum  analyst  in- 
teraction during  initial  operations  and  incorporate 
techniques  requiring  less  interactive  control  as  such 
techniques  are  verified.  The  goal  of  the  UAS  is  to 
make  the  transition  operationally  to  a system  per- 
forming the  major  amount  of  analysis  with 
minimum  analyst  interaction.  The  ratio  of  minimum 
to  maximum  interactive  data  processing  loads  is  in- 
fluenced primarily  by  performance  tolerances 
specified  by  the  analyst  or  the  systems  management 
component. 


Data Base Component

The data base component provides the data interface
between the system components. Each component
receives data from, and places its results in, the
data base for access by other components.

High-volume  data  sets  such  as  sample  segment 
and  full-frame  imagery  for  use  as  an  entity  are  stored 
in  a sequentially  organized  data  base.  Storage  for 
other  data  sets  is  provided  in  the  data  base  manage- 
ment/query data  base,  which  permits  the  storage  and 
retrieval  of  data  in  a hierarchical,  network,  and/or 
chained  manner. 

The  integrated  data  base  reduces  the  storage  of  re- 
dundant items  and,  through  its  logical  structure,  pro- 
vides rapid  storage  and  retrieval  of  data  as  required 
by  the  various  system  components.  This  integrated 
structure  also  introduces  a common  thread  to  the 
majority of data in the data base; i.e., a gridded,
geographically  referenced  partition. 

The  Data  Base  Management  System  (DBMS) 
logically  and  physically  defines  the  data  base  and 
provides  storage  and  retrieval  mechanisms.  Since  the 
system  is  oriented  toward  interactive  analysis,  rapid 
data  base  access  is  crucial.  The  hierarchy  provided  by 
the  partitioned  logical  structure  contributes  toward 
meeting  this  goal  because  the  user  is  able  to  reference 
various  data  types  with  a common  attribute  and 
reduce  data  base  accesses  and  terminal  entries.  Con- 
current access  by  interactive  and  batch  users  pro- 
vides additional  flexibility  and  increases  system 
throughput.  The  data  base  is  the  responsibility  of  the 
data  base  administrator  with  software  support  from 
the  technical  staff. 


Reporting  Component 

All  scheduled  and  unscheduled  reports  are  pro- 
duced by  the  reporting  component.  These  reports, 
placed in the data base, are made available to the
USDA  users  via  communications  link.  The  auto- 
mated reporting  process  has  a minimum  capability  to 
store  report  formats,  provide  reports  at  varying 
levels  of  hierarchy,  and  provide  proper  security  con- 
trol for  sensitive  data. 

The  reporting  component  supports  predefined 
formatted  and  unformatted  queries  initiated  by 
members  of  the  production  staff  or  by  the  USDA 
user.  The  query  results  are  presented  to  the  USDA 
user  in  the  same  manner  as  the  scheduled  reports  or 
to the production system user in hardcopy or display
form.

Both  software  and  procedural  checks  are  applied 
to  the  reports  prior  to  release  to  the  evaluation  com- 
ponent, with  checks  on  format  and  completeness 
being  performed  manually. 

All  software  which  interfaces  with  the  analyst  is 
tutorial  in  nature,  with  a menu  presentation  used  as 
often  as  feasible.  The  query  language  is  such  that 
non-computer-oriented professional personnel could
use  it  efficiently.  Appropriate  error  messages  and  re- 
quired corrective  measures  are  designed  for  clarity  to 
the  analyst. 

The  reporting  component  is  under  administrative 
control  by  the  project  management  and  supported  by 
the  technical  staff,  including  systems  analysts  and 
computer  specialists.  Analysts  also  have  access  to  a 
status  and  tracking  data  file  and  a production  system 
library,  which  are  described  in  the  following  sections. 


Status  and  Tracking 

The system design provides for a status and tracking
data file to be available for the various management
and technical groups to use in obtaining information
required to efficiently manage and control
the production system.

The  status  and  tracking  information  is  provided 
by  the  various  components  of  the  production 
system.  Required  data  could  be  placed  in  the  status 
and  tracking  file  by  an  analyst  from  an  interactive 
terminal  or  by  a software  module  which  is  part  of  a 
process.  For  example,  the  software  module  which 
performs  clustering  writes  a record  to  the  status  and 
tracking  file  after  each  clustering  task.  The  record 
contains  information  identifying  the  segment,  date, 
and  time  of  processing. 
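
The clustering example can be sketched as follows. This is a minimal in-memory illustration; the class names, fields, and storage are assumptions, since the source states only that the record identifies the segment, date, and time of processing.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class StatusRecord:
    """One entry in the status and tracking file (hypothetical layout)."""
    segment_id: str
    task: str
    processed_at: datetime


class StatusFile:
    """In-memory stand-in for the status and tracking data file."""

    def __init__(self):
        self._records = []

    def write(self, segment_id, task, when=None):
        record = StatusRecord(segment_id, task, when or datetime.now())
        self._records.append(record)
        return record

    def for_segment(self, segment_id):
        return [r for r in self._records if r.segment_id == segment_id]


def cluster_segment(segment_id, status_file, when=None):
    """Run clustering (omitted here) and log completion of the task."""
    # ... the clustering computation itself would go here ...
    return status_file.write(segment_id, "clustering", when)
```

An analyst at an interactive terminal could call `write` directly, which matches the two entry paths described above.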

The  status  and  tracking  file  is  designed  to  provide 
data  concerning  all  aspects  of  the  system,  including 
system  throughput  and  processing  statistics  for  man- 
agement. Data  base  activity  is  reported  to  the  data 
base  administrator,  and  reports  on  nearly  all  phases 
of  the  production  are  provided  to  the  systems  man- 
agement component. 


Production  System  Library 

The  production  system  library  is  an  automated  in- 
dex of  all  documents,  film  products,  various  maps, 
and  other  hardcopy  ancillary  data  sources  required 
by the production system. An analyst could enter a
query from a terminal and learn whether or not a
given reference is currently available. If available, the
item would be logged out to the requesting analyst by
modifying the index record on the data base.
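
The query and log-out behavior can be sketched as below. The class and method names are hypothetical, and a dictionary stands in for the index records held on the data base.

```python
class LibraryIndex:
    """Illustrative index of hardcopy items and who holds each one."""

    def __init__(self):
        # reference id -> analyst holding the item, or None if on the shelf
        self._holder = {}

    def add(self, ref_id):
        self._holder[ref_id] = None

    def is_available(self, ref_id):
        """True only if the item is indexed and not logged out."""
        return ref_id in self._holder and self._holder[ref_id] is None

    def log_out(self, ref_id, analyst):
        """Log the item out by modifying its index record."""
        if not self.is_available(ref_id):
            return False
        self._holder[ref_id] = analyst
        return True

    def log_in(self, ref_id):
        """Return the item to the shelf."""
        if ref_id in self._holder:
            self._holder[ref_id] = None
```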


DATA  PROCESSING  SYSTEM 

The USDA system design provides for a complex
Data  Processing  System  (DPS)  for  support.  The  DPS 
provides  the  data  base,  the  processing  capability,  the 
crop  analysis  displays  and  processing,  the  report 
generation,  and  the  mode  of  interface  with  the  user. 
The  DPS  would  be  modular  to  allow  phased  imple- 
mentation and  growth  to  accept  expanded  support 
requirements.  The  modular  concept  would  result  in 
the  expected  use  of  multiple  small-  to  medium-class 
computers,  with  subsystems  to  function  independ- 
ently. The  data  base  subsystem  would  control  data 
flow. 

A set  of  small  to  medium  computers,  related  pe- 
ripherals, and  operating  system  software  provides 
support  to  the  DPS.  Figure  3 presents  a feasible 
equipment  configuration  with  assigned  processing 
functions. 


Computers

All computers are standard products with required
interface devices, including necessary timing,
logic, and buffering to facilitate the computer-to-computer
interface. The computer-to-computer interface
provides the capability to pass the up-to-date
status and tracking tables between processors. The
controlling Central Processing Unit (CPU) flags data
requiring analysis or data base support in the status
table. All computers monitor the status and tracking
tables to determine when processing or a change in
resource allocation is required.

The  computers  have  an  interrupt  structure  within 
a CPU  to  allow  control  transfer  to  a new  process. 
Changes  in  process  control  occur  as  a result  of  exter- 
nal or  internal  signals  with  interrupt  logic  able  to  re- 
spond to  either  response  requirement.  The  computer 
systems  have  self-diagnostics  under  operator  or  tech- 
nical engineer  control  and  are  available  to  support 
processing  at  least  85  percent  of  the  nominal  16-hour 
day. 


FIGURE  3.— Automatic  data  processing  support  equipment. 


Operating Systems and Support Software

The design provides for operating systems that are
vendor standard products. No uniquely developed
code would be implemented into the system, except
for the control of nonstandard interface devices.
Methods of interfacing special devices or different
vendor products would be added using standard
"hooks" to the operating system. The design includes
provisions for incorporating future revisions
of the operating system and user-system-unique
operating-system-level software using a standard
system build.

Each  operating  system  supports  up  to  12  concur- 
rent jobs  in  a multiprograming  environment.  The 
jobs  are  scheduled  on  a priority  basis  with  the 
capability  to  change  job  priorities  dynamically. 
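
Priority scheduling with dynamic priority changes can be sketched with a heap. This is an illustrative model only, not the vendor operating system's scheduler; the class name is hypothetical, and the convention that a lower number means higher priority is an assumption. The limit of 12 jobs comes from the text above.

```python
import heapq
import itertools


class JobScheduler:
    """Priority queue of jobs with a cap on concurrent jobs."""

    MAX_JOBS = 12  # concurrent-job limit stated in the design

    def __init__(self):
        self._heap = []
        self._entries = {}  # job name -> live heap entry
        self._counter = itertools.count()  # tie-breaker: submission order

    def _push(self, name, priority):
        # Entry layout: [priority, sequence, name, still_valid]
        entry = [priority, next(self._counter), name, True]
        self._entries[name] = entry
        heapq.heappush(self._heap, entry)

    def submit(self, name, priority):
        if name in self._entries:
            raise ValueError(f"job {name!r} already scheduled")
        if len(self._entries) >= self.MAX_JOBS:
            raise RuntimeError("job limit reached")
        self._push(name, priority)

    def reprioritize(self, name, priority):
        """Change a job's priority by invalidating its old heap entry."""
        self._entries[name][3] = False
        self._push(name, priority)

    def next_job(self):
        """Pop the highest-priority live job, or None if the queue is empty."""
        while self._heap:
            _, _, name, valid = heapq.heappop(self._heap)
            if valid:
                del self._entries[name]
                return name
        return None
```

Invalidating the old entry rather than searching the heap keeps a priority change at logarithmic cost.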

The  system  support  software  provides  for 
user/computer  interaction  at  the  appropriate  CPU's. 
An  easy-to-use  conversational  language  is  provided 
to  the  user  at  interactive  terminals.  All  applications 
software  modules  rely  on  system  software  to 
schedule,  control,  and  translate  messages  to  or  from 
the  interactive  terminals. 

Interactive  transaction-oriented  processing  is  pro- 
vided by  the  system  support  software.  The  transac- 
tion processor  maintains  logs  of  the  transactions  con- 
ducted at  the  terminals.  These  are  retained  until  the 
data  base  is  updated  and  reflect  activated  software 
processes  and  the  operational  revision  level  of  each. 

Each  operating  system  monitors  and  controls  the 
devices  assigned  to  the  respective  CPU's.  Devices 
within  a subsystem  may  be  reassigned  by  the  in- 
dividual computer  operators.  All  devices  can  be 
reassigned  from  the  operations  manager  console  via 
the  subsystem  controlling  CPU's.  Status  displays  of 
all  DPS  resources  are  maintained  for  display  at  the 
operations  manager  console  on  a scheduled  or  a de- 
mand basis. 


SIMULATION 

Simulation  was  initiated  in  June  of  1976  to  track 
and  verify  a design  (ref.  4)  for  the  USDA  system.  As 
new  hardware  or  design  approaches  were  identified 
and  quantified,  they  were  then  simulated  to  verify 
the  adequacy  of  their  approach.  This  allowed  USDA 
to  assess  computer  performance  prior  to  making  any 
capital investments. Since time was of the essence, a
simulation model had to be found that was available
locally at little or no cost to the Government. A
thorough search uncovered an IBM proprietary
model  already  installed  at  JSC  that  would  be  made 
available  to  USDA.  This  approach  also  had  the  ad- 
vantage of  providing  USDA  with  local  IBM  person- 
nel who  were  intimately  familiar  with  the  model, 
thus  eliminating  the  learning-curve  time  require- 
ments. 

Performance  prediction  and  design  optimization 
of  the  user  system  required  the  support  of  simulation 
modeling.  Simulation  was  required  also,  according  to 
the Management Plan (ref. 1), for economic analysis.
Initial  tasks  of  simulation  were  identifying  pro- 
cedural and  feedback  relationships  among  functions, 
identifying major modules and algorithms within
each  function,  and  identifying  module  flows  and 
resource  requirements  to  include  frequency  of  execu- 
tion. 

The  parameters  required  to  validate  a candidate 
configuration  were  hardware  configuration,  with  the 
relevant  performance  characteristics;  software  func- 
tions, with  their  relevant  resource  demand  charac- 
teristics; data  base  designs;  and  information  process- 
ing system  workloads  in  terms  consistent  with  the 
use  of  the  model.  The  task  of  simulation  modeling 
then  proceeded,  and  the  system  design  was  con- 
verted to  an  input  form  to  begin  simulation  of  both 
system  performance  and  system  throughput. 

System  Performance 

The  objectives  of  system  performance  modeling 
were  to  determine  the  critical  parameters  affecting 
elapsed  time  and  resource  utilization  for  each  proc- 
ess. This  included  determination  of  input/output  ac- 
tivity against  data  files  and  “bottlenecks"  impeding 
system  performance.  Another  goal  of  system  per- 
formance modeling  was  to  evaluate  special-purpose 
equipment,  such  as  classifiers  and  mass  storage 
devices. 

In  order  to  achieve  these  goals,  it  was  necessary  to 
define  the  proposed  hardware  characteristics,  system 
configuration,  software  design,  operating  system 
characteristics  with  specific  services,  and  data  base 
management functions. Included in the definitions
were  the  size  and  rate  of  data  transfers  and  inter- 
module communication. 

System  Throughput 

The  objectives  of  throughput  modeling  were  to 
determine  the  time  required  to  process  a given  data 
cycle and generate a specified report. This would in
turn determine the performance of critical subsystems
necessary to meet throughput criteria. The
determination  of  the  time  required  to  process  a 
priority  episodic  event  in  a fully  loaded  system  was 
also  an  objective.  These  objectives  would  allow  the 
evaluation  of  the  processing  control  algorithm. 

The  requirements  for  meeting  these  goals  were  to 
define  the  processing  cycle  in  terms  of  each  process- 
ing step,  review  and  rework  cycles,  and  reporting 
periods.  In  support  of  the  processing  steps,  it  was 
necessary  to  define  data  collection  cycles,  quantities 
of  equipment  and  personnel,  and  work  schedules. 


Results 

The  simulation  of  a feasible  system  design  pro- 
vided timely  answers  to  system  design  questions, 
such  as  the  ability  of  a minicomputer  to  handle  the 
proposed  geometric  correction  of  MSS  data.  It  was 
determined  that  this  processing  function  could  con- 
strain the  types  of  computers  which  might  be  ap- 
propriate for  the  system.  However,  the  use  of  an  ex- 
ternal array  processor  reduced  CPU  requirements 
significantly  and  permitted  large  arrays  of  data  to  be 
maintained  in  memory  without  relying  on  page  or 
mapping  registers. 

Because of the time overlap of design and simulation,
it was possible to elaborate on simulation details
as the design proceeded and to modify the design
based on simulation results. One major verification
of the feasible system design was that an average
sample segment processing time was approximately
1.8 hours, which supported the required system
throughput and associated constraints described in
the Introduction (ref. 5).


BACKGROUND

Data  base  design  for  the  U.S.  Department  of 
Agriculture (USDA) Application Test System (ATS)
was  based  on  a combination  of  data  requirements  to 
meet  the  needs  of  end  users,  remote-sensing  analysts 
working  with  remote-sensing  or  crop-reporting  pro- 
cedures, management,  and  the  system  development 
team.  These  different  categories  of  planned  ATS 
users  sometimes  view  the  same  data  items 
differently  and  use  them  differently.  They  also  have 
differing  needs  for  access  to  the  data  for  processing 
or  informational  purposes.  Furthermore,  their  needs 
tend  to  change  at  times.  One  of  the  primary  concepts 
of  the  ATS  has  been  to  provide  a central 
geographically  oriented  data  base  to  serve  varied  ap- 
plication modes,  as  shown  in  figure  1 (also  see 
reference  1). 

LACIE  experience  with  data  needed  to  support 
the  crop  estimation  process  was  of  significant  value 
in  establishing  ATS  data  base  requirements.  Process- 
ing procedures  using  the  LACIE  Earth  Resources 
Interactive  Processing  System  (LACIE/ERIPS) 
provide  ready  access  to  digital  imagery,  fields  defini- 
tions, and  other  data  required  for  statistical  separa- 
tion of  spectral  classes;  these  data  are  managed  effi- 
ciently by  the  Information  Management  System 
(IMS),  a data  base  management  system  available  on 
the  IBM  360/370  series  computers  (refs.  2 and  3). 
Meteorological  data  used  in  estimating  crop  yields 
and  crop  calendar  adjustments  are  extracted  and 
processed at sites remote from other LACIE activities.
Processing required for estimating production,
aggregating results, and reporting results uses
still another set of computer hardware and software.
Interfaces among these LACIE components (and
other data sources, both manual and automated)
have at times been awkward, time consuming, and
difficult to control.

Analysis  of  the  need  for  improved  data  logistics 
indicated  that  requirements  could  be  met  best 
through  implementation  of  a central  data  base,  con- 
trolled by  a generalized  data  base  management 
system.  This  approach  would  make  the  data  accessi- 
ble both  to  application  software  and  to  direct  query 
by  the  various  users.  Use  of  a data  base  management 
system  offers  the  potential  for  providing  greater  flex- 
ibility to  meet  changing  requirements.  Proper  design 
for  a central  data  base  also  provides  an  optimum  bal- 
ance among  data  consistency,  redundancy,  access, 
and  responsiveness  (ref.  4). 

The  purpose  of  this  paper  is  to  describe  the  ATS 
data  base  design  approach  and  resources.  Following  a 
summary  of  requirements  for  data  and  information, 
the  data  will  be  described  in  more  detail  by  category, 
with  emphasis  on  those  characteristics  which  in- 
fluenced the  design  most.  Then  the  remaining  steps 
of  the  design  process  will  be  discussed  briefly. 


User Requirements

Current  USDA  priorities  for  the  use  of  remote 
sensing  have  been  defined  as  follows. 

1.  Early  warning  of  changes  affecting  production 
and  quality  of  commodities  and  renewable  resources 

2.  Commodity  production  forecasts 

3.  Land  use  classification  and  measurement 

4.  Renewable  resources  inventory  and  assess- 
ment 

5.  Land  productivity  estimates 

6.  Conservation  practices  assessment 

7.  Pollution detection and impact evaluation

ATS data base design must provide support for report
preparation and information gathering in support
of these priorities. For example, early warning
analysis of changes affecting production and quality
for a specific crop in a specific geographic area might
require periodic reports as to climatic alarms in that
area; the analyst assigned the early warning analysis
task might also need to query selected weather data
parameters for that area over some period of time.
These needs should be supported by ready access to
consistent data.

FIGURE 1. — Application/data base interface.

Depending  on  the  application,  the  user  may  need 
access  to  ATS  data  by  geographic  area  or  by  some 
combination  of  geography  and  date  or  geography  and 
crop.  Weather  data,  for  example,  will  be  accessed 
only  by  geographic  location  and  date,  whereas  a yield 
estimate  is  a function  of  both  location  and  crop.  ATS 
data  base  design  must  permit  the  user  ease  of  access 
to  data  according  to  the  most  common  relationships 
in  which  the  user  views  the  data;  that  is,  the  data 
relationships,  as  well  as  the  data  itself,  must  be  part 
of  the  data  base  definition. 
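
The two access patterns named above, geography with date and geography with crop, can be sketched as composite keys. This is a minimal illustration under assumptions: the class, the grid-cell identifiers, and the ISO date strings are hypothetical, and the ATS data base itself is managed by a general-purpose data base management system rather than by dictionaries.

```python
class GeoDataBase:
    """Toy store keyed by the relationships in which users view the data."""

    def __init__(self):
        self._weather = {}  # (cell, date) -> observation dict
        self._yield = {}    # (cell, crop) -> yield estimate

    def put_weather(self, cell, date, observation):
        self._weather[(cell, date)] = observation

    def put_yield(self, cell, crop, estimate):
        self._yield[(cell, crop)] = estimate

    def weather(self, cell, date):
        """Weather is addressed by location and date only."""
        return self._weather.get((cell, date))

    def yield_estimate(self, cell, crop):
        """A yield estimate is addressed by location and crop."""
        return self._yield.get((cell, crop))

    def weather_history(self, cell, start, end):
        """Observations for one cell over an inclusive date range.

        Dates are ISO 'YYYY-MM-DD' strings, so string order is date order.
        """
        return {
            date: obs
            for (c, date), obs in sorted(self._weather.items())
            if c == cell and start <= date <= end
        }
```

The `weather_history` query corresponds to the early warning analyst's need to review weather parameters for an area over a period of time.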


Analyst  Requirements 

The  term  “analyst”  is  used  here  to  refer  to  an  in- 
dividual who  uses  remote-sensing  data  to  produce 
crop  estimates  and  crop  assessments.  Several 
different  processes  must  be  supported  to  assist 
analysts  with  one  or  more  of  the  following  tasks. 

1.  Imagery  classification.  In  addition  to  storing 
and  accessing  the  digital  imagery  and  classification 
data,  the  ATS  data  base  must  support  analyst  queries 
of  other  data  types,  such  as  meteorological  data  or 
soils  data  in  the  vicinity  of  the  segment  being 
analyzed. 

2.  Area  estimation. 

3.  Yield  estimation. 

4.  Crop  calendar  adjustment. 

5.  Climatic  alarm  detection. 
6. Production estimation.

7.  Sample  allocation. 

8.  Report  generation. 

9.  Results  evaluation. 

Similar  types  of  data  base  support  are  required  for 
these  different  tasks.  Data  relationships  are  also  im- 
portant to  the  analyst,  just  as  for  the  user,  although 
the  two  are  not  necessarily  interested  in  the  same 
relationships.  Users  may  need  to  access  data  on  the 
basis  of  administrative  boundaries,  for  example, 
while  analysts  may  need  access  to  the  same  data 
types  on  the  basis  of  proximity  to  a specific  sample 
segment. 

Management  Requirements 

Management  requirements  of  the  ATS  data  base 
cover  those  data  categories  needed  to  assess  the  cur- 
rent status  of  remote-sensing  processing  activities 
and  to  plan  future  activities.  Data  describing  process- 
ing backlog,  computer  system  status,  and  current  ac- 
tivity related  to  specific  crops  and  geographic  areas 
are  needed.  Status  data  must  cover  processing  of  im- 
agery data  and  meteorological  data  and  generation  of 
reports. 

General  Requirements 

Large  volumes  of  data  are  required  for  the 
analysis  of  remote-sensing  data.  The  data  category 
with  the  greatest  volume  is  the  digital  Landsat  imag- 
ery. Each  sample  segment  currently  used  in  crop 
assessment  and  estimation  comprises  91  728 
spectral-intensity values, defining 22 932 pixels;
header  data  bring  the  total  data  to  over  92  000  bytes 
(one  byte  for  each  intensity  value)  for  each  image. 
During  the  1978  and  1979  crop  years,  an  estimated 
2500  to  6000  images  will  be  required.  It  is  also  impor- 
tant to  provide  fast  display  on  the  color  cathode-ray- 
tube  (CRT)  screen  for  images,  class  maps,  and 
masks. 
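
The per-image figures quoted above follow from the segment dimensions given later in this paper (four bands, 117 lines, 196 pixels per line). The short Python sketch below reproduces that arithmetic; the 92 000-byte figure is the rounded per-image size from the text, used here as a lower bound because the exact header length is not stated.

```python
LINES = 117            # lines per sample segment image
PIXELS_PER_LINE = 196  # pixels per line
BANDS = 4              # multispectral bands

pixels = LINES * PIXELS_PER_LINE   # 22 932 pixels
intensity_bytes = pixels * BANDS   # 91 728 one-byte intensity values


def total_image_bytes(n_images, bytes_per_image=92_000):
    """Storage for n_images, using the rounded per-image size with header."""
    return n_images * bytes_per_image


low = total_image_bytes(2500)    # 230 000 000 bytes
high = total_image_bytes(6000)   # 552 000 000 bytes
```

On these assumptions, the estimated 2500 to 6000 images correspond to roughly 230 to 550 megabytes of imagery.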

Significant  volumes  of  meteorological  data  are 
also  required.  In  order  to  assist  analysts  in  yield 
estimation,  crop  calendar  adjustment,  and  image 
analysis,  meteorological  data  parameters  (for  large 
numbers of meteorological stations and grid cells)
should  be  retained  on-line  as  long  as  practical.  At 
least  90  to  120  days  of  current  meteorological  data 
and  10  years  of  historical  meteorological  data  are  re- 
quired. Historical  agricultural  data  and  crop  estimate 
reports  also  require  significant  data  volumes. 


Other general requirements for the data base include
minimum redundancy, data consistency, ease of query and
maintenance, and flexibility. Data redundancy increases
storage requirements and processing time. Redundancy also
increases the risk of data inconsistency; that is, when a
data element exists in more than one location, the risk of
updating one and not the others is higher. Inconsistency,
in turn, reduces the usefulness and reliability of the
data. Flexibility is needed to accommodate anticipated
changes in other data requirements.


DATA  CATEGORIES 


Geographic Entities

Geographic  entities  used  in  the  ATS  data  base 
structure  include  the  LACIE  geographic  hierarchy, 
agrophysical  units,  meteorological  stations,  and  grid 
cells.  Definition  of  the  relationships  among  these  en- 
tities is  a key  element  in  the  data  base  structure.  Most 
other  ATS  data  types  are  defined  with  relation  to  one 
or  more  of  these  geographic  entities. 

The  geographic  hierarchical  levels  currently  used 
by  LACIE  are  country,  region,  zone,  stratum,  and 
substratum.  The  specific  number  of  these  levels  and 
their  identification  with  political  or  administrative 
boundaries  vary  from  one  country  to  another.  For 
example,  in  the  United  States,  a zone  corresponds  to 
a state,  a stratum  to  a crop  reporting  district,  and  a 
substratum  to  a county;  in  the  U.S.S.R.,  a stratum 
corresponds  to  an  oblast  and  is  the  lowest  level. 
LACIE  codes  are  used  to  identify  the  hierarchical 
levels  to  the  ATS  data  base.  The  climatic  crop  region 
is  a specific  grouping  of  some  of  the  hierarchical  en- 
tities for  applying  certain  LACIE  yield  models  (ref. 
5). 
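
As a minimal sketch, the hierarchy levels and the two national correspondences given above could be recorded as follows. The dictionary layout and function name are assumptions made for illustration; only the level names and the administrative equivalents come from the text.

```python
# Level names from the text, ordered from highest to lowest.
LEVELS = ["country", "region", "zone", "stratum", "substratum"]

# Administrative equivalents stated in the text; other levels are omitted
# because the text does not give them.
ADMIN_EQUIVALENT = {
    "United States": {
        "zone": "state",
        "stratum": "crop reporting district",
        "substratum": "county",
    },
    "U.S.S.R.": {"stratum": "oblast"},
}


def lowest_level(country):
    """Lowest hierarchy level with a stated administrative equivalent."""
    known = ADMIN_EQUIVALENT[country]
    return max(known, key=LEVELS.index)
```

Keeping the level list separate from the per-country mapping reflects the point that the number of levels in use varies from one country to another.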

Meteorological  stations  are,  for  the  most  part, 
those  World  Meteorological  Organization  (WMO) 
stations  located  in  the  crop  areas  of  interest.  Standard 
WMO  codes  are  used  to  identify  the  stations;  sta- 
tions not  in  the  WMO  network  are  identified  by  call 
codes. 

The  grid  cell  entity  used  for  the  ATS  data  base  is 
taken  from  an  Air  Force  meteorological  data  grid, 
defined  as  a rectangular  mesh  on  a polar- 
stereographic  plane  (ref.  6).  When  projected  onto  the 
Earth's  surface,  the  length  of  a side  of  a grid  cell  is 
approximately  25  nautical  miles  at  middle  latitudes 
(precise  size  and  shape  vary  with  latitude  and 
longitude). Each grid cell is identified by an (I,J)
pair, representing the matrix location with respect to
the two axes on the polar-stereographic plane. The J-
axis is parallel to the meridians which define 100°
east longitude and 80° west longitude. Algorithms are
available for converting from latitude and longitude
to an (I,J) pair and vice versa.
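
The conversion algorithms themselves are not given in this paper. The Python sketch below shows one standard way to map between latitude/longitude and a polar-stereographic grid, under several assumptions that are not from the source: a spherical Earth, a mesh length that is exact at 60° north, hypothetical pole indices of (0, 0), and an arbitrary choice of which direction along the 80° W meridian counts as positive J.

```python
import math

EARTH_RADIUS_KM = 6371.0      # mean Earth radius (assumed spherical Earth)
MESH_KM = 25 * 1.852          # 25 nautical miles, taken as the cell size
TRUE_LAT_DEG = 60.0           # assumed latitude at which MESH_KM is exact
POLE_I, POLE_J = 0.0, 0.0     # hypothetical grid indices of the North Pole
J_AXIS_LON_DEG = -80.0        # the J-axis runs along 80 W / 100 E

# Number of grid lengths from the pole to the equator on the projection plane.
_SCALE = EARTH_RADIUS_KM * (1 + math.sin(math.radians(TRUE_LAT_DEG))) / MESH_KM


def latlon_to_ij(lat_deg, lon_deg):
    """Northern-hemisphere latitude/longitude to fractional (I, J)."""
    r = _SCALE * math.tan(math.radians(45.0 - lat_deg / 2.0))
    theta = math.radians(lon_deg - J_AXIS_LON_DEG)
    return POLE_I + r * math.sin(theta), POLE_J + r * math.cos(theta)


def ij_to_latlon(i, j):
    """Fractional (I, J) back to latitude/longitude in degrees."""
    di, dj = i - POLE_I, j - POLE_J
    r = math.hypot(di, dj)
    lat = 90.0 - 2.0 * math.degrees(math.atan(r / _SCALE))
    lon = math.degrees(math.atan2(di, dj)) + J_AXIS_LON_DEG
    # Normalize longitude to the range [-180, 180).
    lon = (lon + 180.0) % 360.0 - 180.0
    return lat, lon
```

Rounding the fractional result to integers would give the cell containing a point; the constants would need to be replaced by those of the actual Air Force grid definition (ref. 6) before any operational use.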

As described later in this paper, several data types
are recorded for the grid cell mesh. For some of these
data types, each cell can be divided further into
quadrants (identified as A, B, C, and D); a quadrant is
therefore about 12.5 nautical miles on each side at
the middle latitudes. For some purposes of the ATS
data base, each zone-level area (or stratum level,
depending on country) in the geographic hierarchy is
defined as the collection of grid cells contained in
that area; this definition can be extended easily to
higher levels of the hierarchy.

Use  of  grid  cells  provides  (1)  a smaller  geographic 
unit  than  other  geographic  areas  used  in  LACIE  for 
recording  data  and  (2)  a convenient  method  for 
analyzing  results  over  otherwise  undefined 
geographic  areas. 

The  agrophysical  unit  (APU)  provides  definition 
of  the  two  remaining  ATS  geographic  entities.  An 
APU  is  defined  as  an  area  with  similar  soils,  climate, 
topography,  and  other  agronomic  factors  such  as 
land  use  intensity.  The  intersection  of  an  APU  and  a 
zone-level  or  stratum-level  geographic  area  is  iden- 
tified as  a refined  stratum.  In  the  ATS  data  base,  an 
APU  is  further  defined  as  a collection  of  grid  cells. 
One  might  choose  to  consider  the  APU,  as  imple- 
mented for  ATS,  as  an  irregular  polygon  which  can 
be  converted  readily  to  grid  cells. 
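
Because both an APU and a zone-level area are defined as collections of grid cells, a refined stratum can be viewed as a set intersection. The sketch below illustrates this with Python sets; the cell coordinates are hypothetical, and the actual ATS load program may derive refined strata differently.

```python
# Hypothetical (I, J) grid cells; real cell lists would come from the data base.
apu_cells = {(10, 20), (10, 21), (11, 20), (11, 21), (12, 21)}
zone_cells = {(11, 20), (11, 21), (12, 21), (12, 22), (13, 22)}


def refined_stratum(apu, zone):
    """Cells common to an APU and a zone-level (or stratum-level) area."""
    return apu & zone


cells = refined_stratum(apu_cells, zone_cells)
```

The same representation makes it simple to aggregate any gridded quantity over an APU, a zone, or their intersection.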


Crop Sample

The crop sample unit used in the ATS data base is
the LACIE-defined sample segment (ref. 7). As in
the three phases of LACIE, this sample is an area
about 5 by 6 nautical miles (9.26 by 11.11 kilometers)
in size. For a specific crop of interest, sample
segments are assigned to each geographic area for which
the crop is to be analyzed. In LACIE Phases I, II, and
III, geographic areas to which sample segments were
allocated were defined by administrative boundaries.
For the ATS, as in the LACIE Transition Year, sample
segments are allocated by APU's and then apportioned
to the refined strata comprising each of the APU's.


Digital Imagery (Landsat)

Digital imagery data for each sample segment are
extracted from Landsat scenes, each scene being
about 185 kilometers square in area on the Earth's
surface. The basic unit of imagery data is a pixel, or
picture element, referring to one instantaneous field
of view (about 1 acre in size) as recorded by the
multispectral scanner system (ref. 7). For the three
LACIE phases, the digital imagery data for a sample
segment contained the four bands of multispectral
data for 117 lines, each line containing 196 pixels. All
digital images currently in the ATS data base have
these dimensions. The data base design, however,
provides changeable limits for the number of channels,
lines, and pixels per line in an image.


Classification  Data 

In  addition  to  the  digital  imagery,  several  data 
types  which  are  either  used  or  generated  during  the 
classification  process  are  retained  in  the  ATS  data 
base.  These  include  fields  data,  classification  maps, 
dot  definitions,  and  masks.  Fields  data  are  consistent 
with  LACIE  definitions;  a label  is  assigned  to  each 
field,  up  to  10  vertices  are  permitted  for  each  field, 
and  a field  class  is  generated  for  each  field.  A class 
(or  classification)  map  has  the  same  number  of  lines 
and  pixels  as  the  corresponding  image  and  each  pixel 
is  assigned  a class;  three  class  maps  are  permitted  for 
each segment. Dot definitions are consistent with
requirements for Procedure-1 dots (Procedure 1 is a
specific procedure for assigning pixel classifications).
In  the  expectation  that  masks  may  someday  be  re- 
quired for  excluding  pixels  assigned  to  the  two 
classes,  designated  other  (DO)  and  designated  un- 
identifiable (DU),  provisions  have  been  made  for 
storing  these  masks  in  the  data  base. 


Meteorological  Data 

Meteorological data in the ATS data base will include
daily and historical parameters both for WMO
reporting stations and for the (I,J) grid cell mesh.
Daily meteorological parameters now available for
WMO stations include maximum temperature,
minimum temperature, and 24-hour precipitation.
Monthly summaries are prepared for climatic
regions, as required for input to first-generation
LACIE yield models, and will be prepared for
individual stations as historical data for a 1-year period.
Provision is made for future expansion, both in types
of meteorological data collected and in the extent of
global coverage.

Future plans also call for interpolation of station
data to provide the same daily meteorological
parameters for (I,J) grid cells in crop areas of
interest. Historical summaries will also be prepared
and will begin at that point in time when the
capability is first available.
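
The paper does not say which interpolation method is planned. Purely as an illustration, the sketch below uses inverse-distance weighting, a common technique that is swapped in here as an assumption; station positions are expressed in grid units, and the function name and `power` parameter are hypothetical.

```python
import math


def idw_interpolate(cell_ij, stations, power=2.0):
    """Inverse-distance-weighted estimate at a grid cell.

    stations: list of ((i, j), value) pairs with positions in grid units.
    """
    num = 0.0
    den = 0.0
    for (si, sj), value in stations:
        dist = math.hypot(cell_ij[0] - si, cell_ij[1] - sj)
        if dist == 0.0:
            return value  # a station sits exactly on the cell
        weight = 1.0 / dist ** power
        num += weight * value
        den += weight
    if den == 0.0:
        raise ValueError("at least one station is required")
    return num / den
```

A production scheme would likely also limit the search radius and account for terrain, neither of which is attempted here.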

Agronomic  Data 

Agronomic data in the ATS data base define major
crops and their densities in the areas of interest.
Data  on  common  cropping  practices,  irrigation  and 
drainage,  predominant  soil  taxonomy,  and  nominal 
crop  calendars  have  been  estimated  and  recorded 
using  various  information  sources. 

1. Maps. Operational Navigational Charts
(ONC's, scale 1:1 000 000) published by the Defense
Mapping Agency are the basic maps for data derivation.
Soils maps of the same scale, developed at
South Dakota State University under contract to
USDA, are also used (refs. 8 and 9).

2. Overlays. Transparent overlays to the ONC
maps containing agricultural/nonagricultural delineations,
sample segment locations, soils, APU boundaries,
and (I,J) grid cell delineations are used in
recording the data.

3. Imagery. Digital Landsat imagery is used to
assist  in  the  definition  and  recording  of  some  data 
parameters  and  in  the  refinement  of  other  estimates. 

In  addition  to  the  grid-oriented  agronomic  data, 
information  describing  soil  characteristics  has  been 
incorporated  into  the  data  base.  Developed  at  Iowa 
State  University  under  contract  to  USDA,  these  data 
contain  many  encoded  soil  properties  (such  as  parti- 
cle size,  mineralogy,  available  water  capacity,  per- 
meability, salinity,  and  land  use  suitability)  for  each 
soil  series.  From  the  many  encoded  properties,  those 
which  appear  to  be  of  value  in  crop  assessment  were 
extracted  for  the  ATS  data  base.  These  soils  data  are 
queried  by  the  crop  analyst  to  aid  in  classifying  imag- 
ery and  in  crop  assessment. 

Crop Assessment Reports

The crop assessment process, including generation
of reports, can require data from most of the
other categories maintained in the data base.
Depending on the specific analysis being performed,
reports of the following types would be required on
demand.

1. Crop area, yield, and production estimates for
current  crop  year 

2.  Climatic  alarms 

3.  Water  resources 

4.  Land  resources 

Both tabular and graphic forms are required.
Retention periods for data in the various reports
produced by ATS will vary according to security
requirements. Generally, reports generated by the ATS will
be retained on-line in the data base for at least 2 crop
years.

Historical Data

Historical  meteorological  data  and  historical  crop 
estimates  (generated  by  LACIE  and  by  ATS)  have 
been  described  in  previous  paragraphs.  Historical 
crop  estimates  generated  by  the  USDA  Statistical  Re- 
porting Service  (SRS)  and  Foreign  Agricultural  Ser- 
vice (FAS)  are  maintained  for  specified  crops  and 
areas  of  interest. 

Status Data

Processing of data, both digital imagery and
meteorological, is tracked from the time the data
enter the ATS. Processing status summaries provide
information regarding what data are available,
completeness of results, and work backlog.


DATA CHARACTERISTICS


Sources

ATS data sources are in general the same as
LACIE data sources (table I). For example, Landsat
digital imagery and meteorological data by station are
extracted either from LACIE sources or from
LACIE data files, and map overlays to locate sample
segments and to delineate agricultural land use are
the same as those used for LACIE. In many cases,
however, ATS has established its own data sources;
this is particularly true in the case of data recorded
for the (I,J) grid cells, the ATS “gridded data” such
as soils and agronomic data. As shown in table I,
many of the data are also generated by application
software; this applies particularly to classification
data.


Table I.— ATS Data Sources

Record type                    Data items                           Source

Country, crop                  Country, crop, and relationship      Encoded manually

Geographic hierarchy           Hierarchical geographic levels       Encoded manually from ONC maps

Sample segment description     Sample segment identification        Encoded manually from ONC maps,
                                                                    segment overlays

Reference frames               Location, active image               Extracted from LACIE digital
                               acquisitions                         imagery data

                               Soil (predominant)                   Encoded manually from ONC,
                                                                    soils overlays

                               Classification data (climatic        Generated by application software
                               alarms, fields, etc.)                and analyst definition

Sample segment acquisition     Imagery header data                  Extracted from LACIE imagery

                               Crop development stages              Nominal: encoded manually
                                                                    Adjusted: generated by application
                                                                    software and analyst observation

Image line                     Pixel intensities                    Extracted from LACIE imagery

DU mask                        Designated unidentifiable pixels     Generated by application software
                               for acquisition                      and analyst definition

Dots                           Dot information                      Generated by application software
                                                                    and analyst definition

Class maps                     Pixel classification for image       Generated by application software
                                                                    and analyst definition

DO mask                        Designated other pixels for          Generated by application software
                               segment                              and analyst definition

Fields                         Field vertices, classification       Generated by application software
                                                                    and analyst definition

Evaluated segments             Segment results from                 Generated by application software
                               classification                       and analyst definition

Crop estimates                 Area, yield, production by           LACIE: transferred from Crop
                               geographic location                  Assessment Subsystem data base
                                                                    ATS: generated by application
                                                                    software

Meteorological (met) station   Met station location                 Encoded manually from ONC maps,
                                                                    overlays

Daily met—station              Daily meteorological parameters      Loaded periodically from met data
                                                                    provided by National Oceanic and
                                                                    Atmospheric Administration (NOAA)
                                                                    for LACIE

Station crop data              Crop calendar adjustments            Generated by application software

Climatic crop region           Yield models                         Encoded manually

Yield monthly reports          Yield estimates                      Generated by application software

Met summary—monthly            Input to yield models                Generated by application software

APU description                APU location, agricultural area      Encoded manually from ONC maps,
                                                                    overlays for APU and agricultural
                                                                    areas

Refined stratum                Geographic location                  Generated by load program for APU
                                                                    description

Historical data
  Agricultural                 Crop statistics                      Extracted from USDA (SRS, FAS)
                                                                    data files

  Meteorological               Meteorological statistics            Extracted from NOAA data files

  Crop estimates               Estimates generated from             Summarized from aggregated crop
                               remote-sensing data                  estimates generated by LACIE, ATS
                                                                    before purging from on-line files

  Climatic alarms                                                   Stored on analyst instruction

Daily met—grid cell            Daily meteorological parameters      Generated by application software

Agronomic—grid cell            Agronomic factors                    Encoded manually from ONC maps,
Geographic hierarchy                                                color imagery, overlays

Soils—grid cell                Soil taxonomy, features              Encoded manually from ONC maps,
                                                                    color imagery, overlays

Soils—general                  Soil characteristics                 Loaded from soils tape developed
                                                                    under USDA contract


Volume

Volumes of data to be stored in the ATS data base
initially are shown in table II. In order to estimate
data volume, several assumptions were necessary.
First, it was assumed that the area to be analyzed
comprises six refined strata in two states of the U.S.
spring wheat growing area (Montana and North
Dakota) and one APU in the U.S.S.R. spring wheat
growing area. It was also assumed that the digital
imagery format would be the same as for Landsat-2,
regardless of which satellite provides the data.
Another assumption was that both station and
gridded meteorological data would be stored. These
assumptions can be translated roughly into the
following upper limits: 266 sample segments, 1800
acquisitions, 250 meteorological stations, and 3450
grid cells. The estimates shown in table II were based
on these limits.


Table II.— ATS Data Volume Estimates

Parameter                        Data volume, megabytes

Classification data                   245        310
Support data                           35         35
Software, working storage              50         50
Total                                 330        395

Assumed disk load factors:
  60 percent                          550        660
  70 percent                          470        565
  80 percent                          415        495

Assumptions: Landsat-2 format (4 bands, 117 lines by 196 pixels
per sample segment).
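
The disk load factor rows in table II can be read as the disk capacity needed so that the stored data occupy only a given fraction of the disk. A minimal sketch of that calculation follows, using the 330-megabyte total from the table; the function name is hypothetical, and the interpretation of "load factor" as data volume divided by capacity is an assumption consistent with the tabulated values.

```python
def required_capacity_mb(data_mb, load_factor):
    """Disk capacity needed so that data_mb fills only load_factor of it."""
    if not 0.0 < load_factor <= 1.0:
        raise ValueError("load factor must be in (0, 1]")
    return data_mb / load_factor


for factor in (0.6, 0.7, 0.8):
    print(f"{factor:.0%}: {required_capacity_mb(330, factor):.1f} MB")
```

The unrounded results fall within a few megabytes of the tabulated figures, which appear to have been rounded to the nearest 5 megabytes.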


ATS DATA BASE DESIGN


Configuration

The ATS hardware configuration is based on a
Digital Equipment Corporation (DEC) PDP 11-70
mainframe. Hardware features include 256K words
(512K bytes) of main storage, disk packs, and four 9-track
tape drives. The computer operates under the IAS
operating system, a standard DEC software product
described in the paper entitled “The Application Test
System: Technical Approach and System Design” by
Demon et al.

Data base software includes file management software
for large sequential files and a data base management
system (DBMS) for the remaining data.
Digital imagery and other high-volume data are
handled as sequential files by the standard IAS File
Control Services. This approach permits an image to
be displayed on the color consoles with minimal access
time. It also facilitates efficient handling of the
classification process.

The DBMS used for other ATS files is IDMS-11, a
proprietary product of Cullinane Corporation. IDMS
was initially developed for use on the IBM 360/370
series computers and later converted to run on the
PDP 11-70. IDMS-11 supports both hierarchical and
network types of data structures, as specified in the
CODASYL Data Base Task Group Report (ref. 10).

IDMS-11 provides separate language facilities for
data definition (DDL) and for data manipulation
(DML), both of which are language extensions of
COBOL. However, the system can be used easily
with the other languages (FORTRAN and macro
assembler) which support CALL statements and is, in
fact, so used by the ATS development staff. Several
concurrent users can be supported by the system.

The entire collection of record types which comprise
the data base is defined to IDMS-11 in a schema
using the DDL. The schema defines all data elements,
record types, physical data storage mapping,
and set relationships in the data base. The user can
access the data base only through a subschema, a
subset of the schema predefined to include all data
and data relationships needed for a specific data base
application.

In IDMS-11, the set relationship defines most of
the logical relationships among the various record
types. Each set is a named collection of two or more
record types: one “owner” record type and one or
more “member” record types. Any record type can
serve as a member or owner record in any number of
set relationships. Set relationships can be used to
define complex structures among the record types in
the data base (ref. 10).
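
The owner/member idea can be pictured with a small Python sketch. This is not IDMS-11 DDL or DML, whose syntax is not shown in this paper; the class, function, and set names are hypothetical and serve only to show one record acting as a member in one set and as an owner in another.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_type: str
    key: str
    # Maps a set name to the member records this record owns in that set.
    owned: dict = field(default_factory=dict)


def connect(owner, set_name, member):
    """Add member to the occurrence of set_name owned by owner."""
    owner.owned.setdefault(set_name, []).append(member)


def members(owner, set_name):
    """Return the member records of one set occurrence, in insertion order."""
    return list(owner.owned.get(set_name, []))


# Hypothetical set names: a grid cell owns stations, a station owns
# its daily observations, so the station is both member and owner.
cell = Record("FULL-GRID-CELL", "I=120,J=85")
station = Record("MET-STATION", "XXX-YY")
obs = Record("DAILY-MET-STATION", "XXX-YY/30-MAY-77")
connect(cell, "CELL-STATION", station)
connect(station, "STATION-DAILY", obs)
```

Walking from `cell` to `station` to `obs` mirrors the navigational style of access that a network data base provides through its set pointers.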

IDMS-11 also provides journaling of all data base
accesses which result in changes to the data base.
Together with the dump facility, this feature will be
used to provide data base recovery in case of a
system “crash.” Special recovery procedures are also
available for crash occurrences.
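
The general dump-plus-journal recovery idea can be sketched as follows. The journal entry format and function names are assumptions made for illustration; the actual IDMS-11 journal layout and recovery utilities are not described in this paper.

```python
import copy


def apply_change(db, change):
    """Apply one journaled change: ('put', key, value) or ('delete', key)."""
    if change[0] == "put":
        db[change[1]] = change[2]
    elif change[0] == "delete":
        db.pop(change[1], None)
    else:
        raise ValueError(f"unknown journal entry: {change!r}")


def recover(dump, journal):
    """Rebuild the data base from the last dump plus later journal entries."""
    db = copy.deepcopy(dump)
    for change in journal:
        apply_change(db, change)
    return db
```

Replaying entries in the order they were written is what makes the recovered state match the state just before the crash, provided every change was journaled.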

Privacy provisions are not as complete. Each
application is assigned a subschema, which can only be
accessed by using a specified user identification code
(UIC). A subschema is constrained to use only the
record types needed for the specific application, and
this constraint can be extended to the data element
level. Most of the security provisions are dependent
upon operating system capabilities.

Query  Capability 

In  addition  to  providing  access  to  the  ATS  data 
base  by  means  of  application  software,  the  crop 
analyst  has  direct  access  to  data  through  a query 
capability.  For  example,  the  analyst  might  want  to 
review  weather  conditions  at  WMO  stations  nearest 
a specific  sample  segment  for  10  days  prior  to  the 
most  recent  acquisition  date.  An  example  of  a ter- 
minal display  resulting  from  this  type  query  is  shown 
in  figure  2.  Other  query  capabilities  exist  for  viewing 
soils  data  and  various  segment-related  data  at  the 
analyst  terminal. 


STATION    DATE         MAX TEMP (F)   MIN TEMP (F)   PRECIP (mm)

XXX-YY                       58             73              0
                             58             75              0
                                                            0
                             64             81              6
                             59             80             10
                             61             80              0
           30-MAY-77         65             82              0
                             65             82              0
                             68             83              0
           2-JUNE-77         62             79             20

XXX-ZZ                       54             70              0
                             55             71              0
                             55             72              0
                             55             73              0
           28-MAY-77         58             78              5
           29-MAY-77         54             78             20
           30-MAY-77         56             72              5

FIGURE 2.— Query example.


Query  capabilities  developed  for  ATS  use  the 
same  data  base  facilities  as  application  software.  The 
initial  query  packages  have,  in  fact,  been  developed 
and  implemented  as  COBOL  and  FORTRAN  ap- 
plications. These  query  packages  require  that  the 
user  respond  to  prompts  by  furnishing  specific 
parameters  to  be  used  in  data  retrieval. 
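
The example query described above (weather at the stations nearest a sample segment for the 10 days before the latest acquisition) could be sketched in Python as follows. This is not the COBOL or FORTRAN query package itself; the station positions, observation values, and function names are hypothetical, and distances are taken in grid units for simplicity.

```python
from datetime import date, timedelta
import math

# Hypothetical station positions (grid units) and daily observations.
stations = {"XXX-YY": (10.0, 20.0), "XXX-ZZ": (12.0, 21.0), "XXX-WW": (40.0, 5.0)}
daily = {
    ("XXX-YY", date(1977, 5, 30)): (82, 65, 0),   # (max F, min F, precip mm)
    ("XXX-ZZ", date(1977, 5, 30)): (72, 56, 5),
    ("XXX-WW", date(1977, 5, 30)): (90, 70, 0),
}


def nearest_stations(segment_xy, count):
    """Station ids ordered by straight-line distance from the segment."""
    return sorted(stations, key=lambda s: math.dist(stations[s], segment_xy))[:count]


def weather_before(segment_xy, acquisition, days=10, count=2):
    """Observations at the nearest stations for the days before acquisition."""
    start = acquisition - timedelta(days=days)
    rows = []
    for sid in nearest_stations(segment_xy, count):
        for (station, day), obs in sorted(daily.items()):
            if station == sid and start <= day < acquisition:
                rows.append((sid, day) + obs)
    return rows
```

The segment location, acquisition date, and number of days correspond to the parameters an analyst would supply in response to the prompts.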

ATS Data Base Design Approach

Data  base  design  for  ATS  was  an  iterative  process. 
Data  elements  were  identified  first  on  the  basis  of 
LACIE documentation and discussions with
analysts.  Record  types  for  the  data  were  then  pro- 
posed and  the  resulting  structure  reviewed  with  the 
analysts.  Several  cycles  of  review  and  revision  pre- 
ceded the  current  data  base  definitions. 

Record  Types 

ATS  record  types  were  designed  to  accommodate 
all data categories required for crop analysis. Data
elements  were  identified  and  grouped  according  to 
usage,  dependency,  and  source.  Consideration  was 
given  to  usage  both  by  application  software  and  by 
means  of  direct  query  by  the  crop  analyst  operating 
at  the  console.  Table  III  lists  the  record  types 
(defined without regard to the specific implementation),
approximate record lengths, and record occurrences.

Structure 

Record  types  defined  for  the  ATS  data  base  are 
shown  in  the  data  structure  diagram  in  figure  3.  Key 
geographic  entity  record  types  are  bounded  by 
heavier lines in the diagram to emphasize their
importance. It should be noted that the names shown
for  these  record  types  are  not  precisely  those  used  in 
the  schema  definition  for  the  data  base  because  of 
the  need  to  abbreviate  in  the  schema.  Data  relation- 
ships (not  always  the  same  as  set  relationships  in  the 
schema  diagram)  are  also  shown;  that  is,  an  arrow  in 
the  diagram  indicates  ease  of  access  from  one  record 
type  to  another,  but  not  necessarily  through  use  of  a 
pointer.  A single  arrow  in  one  direction  defines  a 
one-to-one  relationship;  a double  arrow  in  one  direc- 
tion defines  a one-to-many  relationship.  Omission  of 
an  arrow  in  one  direction  indicates  that  the  need  to 
access  data  in  that  sequence  is  not  expected. 
Table III.— ATS Record Types

Record type                    Length,   Occurrence
                               bytes

Country                           32     One record for each country

Crop                              24     One record for each crop

Country-crop                      32     One record for each combination of
                                         country and crop

Geographic hierarchy              60     One record for each stratum or
                                         substratum

Sample segment (SS)              160     One record for each sample segment
description

SS acquisition                   384     One record for each sample segment
                                         acquisition

Image line                       800     One record for each line (4 bands) of
                                         a sample segment image

DU mask                          200     One record for each line of a segment
                                         acquisition

Dots                              20     One record for each dot defined for a
                                         segment under Procedure 1

Class map                        200     One record for each line of a segment
                                         class map (3 class maps for each
                                         segment)

DO mask                          200     One record for each line of a segment

Fields                            80     One record for each field defined for
                                         a segment for each of 2 crop years

Met station                      128     One record for each met station

Station crop data                 24     One record for each computer-generated
                                         crop calendar adjustment at a met
                                         station

Daily met—station                 50     One record for each day for each met
                                         station

Climatic crop region             128     One record for each climatic crop
                                         region

Met summary—monthly               50     One record for each climatic crop
                                         region for each month

Yield results                    450     One record for each climatic crop
                                         region for each report generated

APU description                   64     One record for each APU

Grid cell quadrant               128     One record for each grid cell quadrant

Full grid cell                    64     One record for each full grid cell

Agronomic—grid cell              150     One record for each grid cell quadrant
                                         for each crop

Daily met—grid cell               96     One record for each day for each full
                                         grid cell

Agronomic—grid cell              150     One record for each full grid cell

Soils—grid cell                   40     One record for each grid cell quadrant

Soils—general                    960     One record for each soil taxonomy
                                         family

Historical—agricultural           80     One record for each reporting-level
                                         unit for each year

Historical—crop estimates         60     One record for each reporting-level
                                         unit for each year

Historical—met                    64     One record for each met station for
                                         each month

Status and tracking—imagery       —      Undetermined

Status and tracking—met           —      Undetermined

Evaluated segments                60     One record each time a segment is
                                         classified and evaluated

Aggregated results               300     One record for each reporting-level
                                         unit for each report period


Schema  Design 

Once  the  inherent  data  relationships  were  iden- 
tified, emphasis  shifted  to  design  approach  using 
techniques recommended for the ATS data base
management  system,  IDMS-11  (ref.  11).  Additional 
record  types  had  to  be  defined  wherever  a many-to- 
many  relationship  occurred.  Location  mode,  record 
key,  and  set  relationships  were  defined  for  each 
record  type.  In  some  instances,  a degree  of  data  re- 
dundancy was  retained  with  the  object  of  providing 
more  access  paths  and,  as  a result,  possibly  simpler 
query  structure  (ref.  12). 
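As a minimal sketch of why a many-to-many relationship calls for an additional record type in an owner/member (network) data base, consider grid cell quadrants and crops: a quadrant can carry several crops, and a crop occurs in many quadrants. The class names, field names, and sample values below are hypothetical and are not taken from the ATS schema; Python lists stand in for the set pointers that a data base management system would maintain.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class GridCellQuadrant:
    quadrant_id: str
    # Stand-in for a set owned by the quadrant (one member per crop grown there).
    agronomic: list = field(default_factory=list)


@dataclass(eq=False)
class Crop:
    name: str
    # Stand-in for a set owned by the crop (one member per quadrant growing it).
    agronomic: list = field(default_factory=list)


@dataclass(eq=False)
class AgronomicRecord:
    """Additional (junction) record: one occurrence per quadrant per crop."""
    quadrant: GridCellQuadrant
    crop: Crop
    note: str


def connect(quadrant, crop, note):
    """Store one junction record and link it into both owner sets."""
    record = AgronomicRecord(quadrant, crop, note)
    quadrant.agronomic.append(record)
    crop.agronomic.append(record)
    return record


# Hypothetical sample data.
q1, q2 = GridCellQuadrant("Q1"), GridCellQuadrant("Q2")
wheat, barley = Crop("wheat"), Crop("barley")
connect(q1, wheat, "winter planting")
connect(q1, barley, "spring planting")
connect(q2, wheat, "spring planting")

crops_in_q1 = [rec.crop.name for rec in q1.agronomic]
quadrants_with_wheat = [rec.quadrant.quadrant_id for rec in wheat.agronomic]
```

Each junction record is reachable from either owner, which illustrates the extra access paths the text mentions: the same data can be walked by quadrant or by crop without a search.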

The ATS data base schema diagram was then developed on the basis of the foregoing definitions. Figure 4 shows the schema diagram developed for the grid cell area of the data base. Each block represents a record type, defining the record name, record identification, size (in bytes), location mode, location key, disposition of duplicates, and area name. Connectors between blocks represent set relationships and are annotated with set name, linkage (types of pointers), membership option (for storage and removal), and logical order within each occurrence of the set. Record types outside the grid cell area for which set relationships are defined with record types inside the area are also shown, but without annotation inside the blocks.

The final design step prior to coding consisted of mapping the logical definitions to physical storage units. Record types were grouped into areas according to the likelihood that they would be used by the same applications. Areas were assigned to physical storage units so as to provide the necessary storage space with the least likelihood of overflow.

FIGURE 3.—ATS data base structure diagram.


EXPERIENCE TO PRESENT


Planned  Phases 

Implementation of the initial ATS data base was planned to cover the first two Transition Years, FY78 and FY79. Emphasis during FY78 is on demonstration of the usefulness of query capabilities to the image analyst, crop analysts, members of management, and other potential users. Usefulness of crop data representation on a gridded geographical basis for the APU will also be analyzed. Emphasis during FY79 will be extended to analysis of other aspects of the data base, including different approaches to imagery storage and retrieval, interfacing more application software with the data base, and inclusion of other gridded data.


Initial  Phase 

Nineteen record types defined in the ATS data base schema were selected for the initial development, primarily on the basis of usefulness and logical load sequence. Subschemata and load programs were developed for these record types. Data were collected or recorded and loaded into the data base. Simple query programs, developed for some of the data expected to be most useful to the analysts, are currently being tested.

Use of IDMS-11 has provided solutions to many of the problems in implementing ATS, as hoped. However, some problems remain. Generalized query capabilities of the system are not yet at the desired stage of development, and problems remain in this area. A query language for data retrieval or update in an on-line, ad hoc environment is lacking. This necessitates development of menu-type query programs for each type of query needed. Overhead, primarily with respect to disk storage requirements, is significant. Until implementation was well underway, the ratio of actual data to total storage requirements was not realized. At present, however, it is not planned to reduce overhead storage requirements by reducing the number of pointers because an increase in processing time would probably result. Some operational problems have occurred, mostly because the system is new to the staff.

FIGURE 4.—ATS schema diagram.

The decision to use a generalized data base management system to manage a central data base in support of remote-sensing crop assessment appears to be sound. IDMS-11 appears to be a good minicomputer-based system for this purpose. The USDA ATS staff expects to expand the ATS data base concept in FY79 and succeeding years.


The Application Test System: Technical Approach and System Design

J. L. Benson,a D. R. McClelland,a J. D. Tarbet,a and R. F. Purnellb


INTRODUCTION 

The  purpose  of  this  paper  is  to  provide  insight 
into  the  technical  approach  which  was  applied  to  the 
system  design  of  the  U.S.  Department  of  Agriculture 
(USDA)  Applications  Test  System  (ATS).  This  in- 
cludes identification  of  requirements,  assessment  of 
remote-sensing  contributions,  evaluation  of  existing 
techniques,  and  cost-effective  development  of  a 
system  design  which  utilizes  techniques  and  pro- 
cedures consistent  with  requirements. 

For  many  years,  scientists  and  engineers  have 
studied  and  proposed  the  potential  roles  of  remote 
sensing  in  the  management  and  exploration  of  Earth 
resources.  It  is  currently  estimated  that  operational 
use  of  Landsat  data  to  derive  agricultural  crop  infor- 
mation has  a potential  benefit  of  millions  of  dollars 
to  the  United  States  alone  (ref.  1).  The  experience 
gained  during  the  LACIE  should  result  in  the 
development  of  operational  systems  for  processing 
Landsat  data. 

A major function of the LACIE has been the development, testing, and accuracy assessment of techniques derived to extract agricultural information from Landsat imagery. The project has demonstrated that techniques for classifying Landsat data have developed to the point where it is feasible to define systems for testing the LACIE-developed technology in a user application test. LACIE-proven technology has provided the basis for deriving information appropriate to a specific user; for example, the Foreign Agricultural Service (FAS) of the USDA.

The USDA established the User Systems Planning and Applications Test Group (USPATG) with the ground rule of using LACIE technology to define a system within specific USDA requirements. The USPATG defined the user requirements for a processing system which could evolve from the LACIE and meet the USDA criteria for an operational system in the future.


TECHNICAL  APPROACH 

Ford  Aerospace  & Communications  Corporation 
(FACC)  participated  in  a joint  study  effort  with  the 
USPATG  to  develop  a system  design  for  large-scale 
processing  of  Landsat  data.  The  study  resulted  in  a 
series  of  reports  (refs.  2 to  7),  the  most  significant  of 
which  was  a feasible  design  for  a USDA  system. 

The  USPATG  study  team  followed  the  classical 
approach  of  designing  a system.  The  design  approach 
took  the  logical  steps  of 

1.  Identifying  USDA  requirements 

2.  Assessing  possible  remote-sensing  contribu- 
tions 

3.  Evaluating  existing  processing  techniques  and 
procedures 

4.  Developing a system design which utilizes the techniques and procedures to meet user requirements in a cost-effective manner (ref. 5)

5.  Providing  limited  demonstration  of  an  end-to- 
end  system  approach 

6.  Updating  system  design  and  developing  an 
operational  data  system 

Along  with  the  user  requirements  for  data  con- 
tent, FAS  established  general  guidelines  for  the 
system.  The  guidelines  were  prioritized  for  support 
of  such  design  approach  tradeoffs  as  timeliness  of 
results,  ease  in  developing  the  system,  cost  of  operat- 
ing the  system,  and  accuracy  of  results. 


USDA SYSTEM DESIGN STUDY

The identification of USDA user requirements was the first step toward designing a specific USDA processing system. The design was to provide the functional capabilities required for the inventory and reporting of agricultural crops and was to include the capability to be transferred to, and to interface with, existing user facilities. The complete USDA system (designated ATS) was to provide functional capabilities for system management, data acquisition, analysis, reporting, and evaluation (refs. 2 and 8). The ATS provides the processing capabilities necessary for the transfer and evaluation of required LACIE technology.

The design of the ATS emphasized system transferability characteristics (i.e., use of high-level computer programing languages) as well as the ability to readily accommodate change. The resultant design was consistent with the major system constraints; i.e., timeliness of results, modularity, total "off-the-shelf" components, cost effectiveness, and accuracy. The design also stressed the development of an operational system responsive to USDA user requirements. Thus, the ATS employed rather than developed state-of-the-art technology. The system relies on off-the-shelf components of limited specialization and is capable of responding to state-of-the-art developments in hardware and software technology through modular changes. This allows for easy expansion of the ATS to provide a worldwide, multicrop information system.

Finally,  the  ATS  was  designed  for  ease  of  utiliza- 
tion. It  is  intended  for  use  by  skilled  resource 
analysts  who  normally  will  not  be  remote-sensing 
specialists.  It  was  also  important  that  the  system  sup- 
port a non-labor-intensive  operation.  This  implies 
that,  where  possible,  operations  are  to  be  automated 
with  manual  intervention  kept  to  a minimum. 

The  study  performed  by  FACC  demonstrated 
that  the  strenuous  system  requirements  could  be  met 
with  the  following  state-of-the-art  systems  compo- 
nents. 

1.  Minicomputers 

2.  Interactive  image  processors 

3.  Low-cost  array  processors 

4.  High-fidelity  color  displays 

5.  Integrated  data  base  management  systems 

Based on the study and on other industrial surveys, the USDA issued a Request for Proposal (RFP) for 120-day delivery of the ATS. The system was to conform to the earlier guidelines, but additional performance and capabilities requirements were imposed, including the following.

1.  The  ATS  must  be  capable  of  incorporating 
from  one  to  five  analyst  stations. 


2.  A 117-  by  196-pixel,  four-channel  image  must 
be  classified  into  eight  classes  in  less  than  10  seconds, 
using  the  maximum  likelihood  classification  rule. 

3.  A 512-  by  512-pixel  image  must  be  similarly 
classified  in  less  than  60  seconds. 

4.  A 117-  by  196-pixel,  four-channel  image  must 
be  clustered  into  30  classes  in  less  than  30  seconds. 

5.  The  ATS  must  process  full-frame  Landsat  im- 
agery. 

6.  The  ATS  must  provide  an  integrated  data  base 
system  for  the  management  of  massive  volumes  of 
data. 

7.  The  ATS  must  include  an  integrated  interactive 
query  against  the  data  base. 

8.  The  ATS  must  provide  extensive  display 
capabilities,  comprehensive  analyst  support  func- 
tions, and  pattern  recognition  functions. 
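The maximum likelihood classification rule named in requirement 2 assigns each pixel to the class whose fitted probability density gives that pixel the highest likelihood. The following is only a minimal sketch of that rule, not the ATS or IMDACS implementation: it assumes Gaussian class densities and, as a simplifying assumption, treats the four channels as independent (per-channel variances instead of a full covariance matrix). The class names and pixel values are hypothetical.

```python
import math


def fit_class(samples):
    """Return per-channel (means, variances) from one class's training pixels."""
    n = len(samples)
    channels = list(zip(*samples))
    means = [sum(ch) / n for ch in channels]
    variances = [
        max(sum((v - m) ** 2 for v in ch) / n, 1e-6)  # floor avoids log(0)
        for ch, m in zip(channels, means)
    ]
    return means, variances


def log_likelihood(pixel, stats):
    """Log of the Gaussian density, assuming independent channels."""
    means, variances = stats
    return sum(
        -0.5 * math.log(2.0 * math.pi * var) - (x - m) ** 2 / (2.0 * var)
        for x, m, var in zip(pixel, means, variances)
    )


def classify(pixel, class_stats):
    """Maximum likelihood rule: pick the class with the largest log-likelihood."""
    return max(class_stats, key=lambda name: log_likelihood(pixel, class_stats[name]))


# Hypothetical four-channel training pixels for two classes.
training = {
    "wheat": [(30, 25, 60, 70), (32, 27, 58, 72), (29, 24, 62, 69)],
    "fallow": [(45, 50, 40, 35), (47, 52, 42, 33), (44, 49, 39, 36)],
}
class_stats = {name: fit_class(pixels) for name, pixels in training.items()}
label = classify((31, 26, 59, 71), class_stats)
```

The timing requirements apply this rule to every pixel of an image for eight classes, which is why the classification step is the natural candidate for a dedicated array processor.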


THE APPLICATION TEST SYSTEM

The FACC provided the processing system shown in figure 1 to the USDA for its ATS. The ATS represents a cost-effective, expandable system (ref. 8). The host processor of the ATS is a Digital Equipment Corporation (DEC) Programed Data Processor Model 11-70 (PDP 11-70). The PDP 11-70 is a dual-bus computer capable of data rates from 0.8 megabytes on the massbus, which is required in a data-driven system. The effective utilization of the cache memory buffer provides an effective instruction cycle time of 300 nanoseconds. The main memory in the ATS host computer is 512 kilobytes, expandable to 4 megabytes.

The processing load incurred by the clustering and classification of image data is met by the Floating Point Systems AP120B programmable floating-point array processor. The AP120B, a "pipeline" type processor, is configured to provide the results of an addition and a multiplication every 333 nanoseconds, with an expression capability every 167 nanoseconds. The use of the AP120B and well-balanced system software enabled the ATS to meet the stringent system throughput requirements.

Image display and analyst interaction are provided at each of the three analyst stations. The ATS can support up to five stations if expansion requires. The image display equipment is an International Imagery Systems (IIS) Model 70 with nine 512 by 512 8-bit refresh memories and three graphics planes. It includes two 512 by 512 color monitors with a cursor under trackball control. The ATS was delivered initially with one analyst station and has been expanded to three analyst stations.

FIGURE 1.—ATS configuration.

The on-line storage of data for the ATS is provided by two 300-megabyte disks. The 3330-type disk drives are capable of read/write operations at a data rate of 1.2 megabytes per second. The system was installed with the disks on the PDP 11-70 unibus and a read/write rate of 600 kilobytes per second. The disks will be relocated to the massbus and upgraded to a full 1.2-megabyte-per-second input/output capability in late 1978.

The key to efficient hardware performance in response to analyst commands is the FACC Integrated Multivariate Data Analysis and Classification System (IMDACS), which has been under continuing refinement for multispectral scanner and seismic data applications for 5 years. The basic structure of IMDACS is shown in figure 2. The IMDACS operates on the PDP 11-70 under the DEC Interactive Applications System (IAS), providing the user the capability to select and execute the major processing functions interactively via the alphanumeric/graphics terminal. The tutorial menu prompts display to the user the processing options that are available and the definitions of input parameters to be specified for the processing function. All analyst transactions are logged and may be output to the line printer upon conclusion of the processing.

FIGURE 2.—IMDACS software structure. The figure lists the following functions, grouped here by processor:

LOAD: tape to disk transfer; preprocess channel histogram; preinitialize display/enhancement parameters; channel combination and ratio; geometric corrections.

IMAGE: tape/disk file to display; black/white display; composite color display (false or true); linear/nonlinear contrast stretch; fixed function key multiple image display; class map display; cluster map display; image rotation; arithmetic functions.

FIELD: rectangular mode field definition; polygon mode field definition; store, retrieve, delete maximum of 100 fields; field file reports; fixed function key field overlay; field annotation; single pixel fields; line fields; pixel readout; graphics.

STATS: parametric statistics computation; storage/retrieval of signatures; signature file editing; field/channel selection; signature display on line printer; factor analysis; feature selection.

CLASS: hardware classification*; software classification** for up to 24 channels; class results and threshold values output to disk; class results as symbol map to line printer; delta classifier.

CLUSTER: adaptive clustering; iterative clustering; clustering for up to 15 channels; cluster results and threshold values to disk; cluster results as symbol map to line printer.

*Maximum likelihood and mixture density classification in AP120B.
**Maximum likelihood classification.


The application software is structured along functional lines to support the processing steps required in performing digital image analysis. The major software functions are summarized as follows.


1.  Input  Command  Processor.  The  applications 
software  supervisor  and  common  point  of  interface 
for  all  IMDACS  processors 

2.  LOAD.  Provides  the  capability  to  load  digitally 
formatted  imagery  data  from  computer-compatible 
tape  (CCT)  to  disk;  also  included  are  the 
preinitialization  of  image  enhancements  and  display 
options,  preprocessing  of  histogram  tables,  and  chan- 
nel combinations  (linear,  ratio,  or  normalization) 

3.  IMAGE.  Provides  the  capability  to  format, 
enhance,  and  display  imagery  from  selected  data 
channels  interactively 

4.  FIELD.  Provides  the  capability  to  define,  an- 
notate, and  save  irregular-shaped  fields;  also  pro- 
vided are  file  maintenance  utilities  and  fixed  func- 
tion key  capabilities  for  automatic  recall  and  display 
of  previously  defined  fields  and  annotations 

5.  STATS.  Provides  for  the  computation,  display, 
and  storing  of  spectral  signatures  for  defined  fields; 
also  included  are  related  signature  manipulation 
capabilities  and  the  feature  selection  function 

6.  CLASS.  Performs  maximum  likelihood  and 
mixture  density  classification  of  defined  fields  and 
outputs  classification  results  in  the  form  of  class  map 
files 

7.  CLUSTER.  Performs adaptive and interactive clustering and outputs cluster results in the form of cluster map files

Communication among the processors is facilitated by a unified file structure. For example, statistics files can be built by either the statistics processor or the clustering processor and can be used to initialize either clustering or supervised classification processes. Thus, in addition to classical pattern recognition processing sequences, IMDACS can control new procedures such as "small fields," a LACIE-developed area classification procedure. In the small-fields procedure, training and test regions or picture elements (pixels) are labeled by the analyst. Statistics of the training regions are computed, and the resulting cluster statistics are then labeled in accordance with the labeled training data which are spectrally nearest the particular cluster mean. Mixture density classification is then initialized with the cluster statistics, and signatures are grouped by class automatically.
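The cluster-labeling step of the small-fields procedure can be sketched as a nearest-neighbor lookup: each cluster mean takes the label of the analyst-labeled training vector closest to it in spectral space. This is an illustrative sketch only. Euclidean distance is an assumption (the text says only "spectrally nearest"), and the function names, labels, and spectral values are hypothetical.

```python
import math


def label_cluster(cluster_mean, labeled_training):
    """Return the label of the training vector nearest the cluster mean.

    Euclidean distance is assumed here as the spectral distance measure.
    """
    nearest = min(labeled_training, key=lambda item: math.dist(cluster_mean, item[1]))
    return nearest[0]


def group_by_class(cluster_labels):
    """Group cluster identifiers under the class label assigned to them."""
    groups = {}
    for cluster_id, label in cluster_labels.items():
        groups.setdefault(label, []).append(cluster_id)
    return groups


# Hypothetical analyst-labeled training data: (label, four-channel vector).
labeled_training = [
    ("wheat", (30, 25, 60, 70)),
    ("nonwheat", (45, 50, 40, 35)),
    ("wheat", (33, 28, 57, 68)),
]

# Hypothetical cluster means produced by a clustering run.
cluster_means = {
    "c1": (31, 26, 59, 71),
    "c2": (44, 48, 41, 36),
    "c3": (34, 29, 56, 67),
}

cluster_labels = {
    cid: label_cluster(mean, labeled_training) for cid, mean in cluster_means.items()
}
signatures_by_class = group_by_class(cluster_labels)
```

The grouped result corresponds to the final step described above, in which signatures are grouped by class before mixture density classification is initialized.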

The following image processing performance times have been measured on the ATS:

1.  117 by 196 pixels, four channels, eight-class runs in 8.8 seconds

2.  512 by 512 pixels, four channels, eight-class runs in 57.8 seconds

3.  117 by 196 pixels, four channels, 30-cluster runs in 16 seconds
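The measured times can be set against the RFP limits quoted earlier (10, 60, and 30 seconds) with a few lines of arithmetic. The sketch below uses only figures stated in the text; the derived pixels-per-second values are simple quotients and are not measurements reported by the authors.

```python
# (description, image rows, image columns, required seconds, measured seconds)
cases = [
    ("8-class classification, 117 x 196", 117, 196, 10.0, 8.8),
    ("8-class classification, 512 x 512", 512, 512, 60.0, 57.8),
    ("30-class clustering, 117 x 196", 117, 196, 30.0, 16.0),
]


def summarize(case):
    """Derive pixel count, throughput, and margin against the requirement."""
    name, rows, cols, required_s, measured_s = case
    pixels = rows * cols
    return {
        "name": name,
        "pixels": pixels,
        "pixels_per_second": pixels / measured_s,
        "margin_s": required_s - measured_s,
        "meets_requirement": measured_s < required_s,
    }


results = [summarize(case) for case in cases]
for r in results:
    print(
        f"{r['name']}: {r['pixels']} pixels, "
        f"{r['pixels_per_second']:.0f} pixels/s, margin {r['margin_s']:.1f} s"
    )
```

All three measured times fall inside their limits; the two classification cases do so by roughly 1 to 2 seconds, while clustering has a much wider margin.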

The performance times are achieved with a combination of high-speed input/output and high-speed, special-purpose peripherals. This performance, combined with the IAS multitask capability and the IMDACS throughput efficiency, provides systems capable of testing and evaluating various technologies and operational procedures for the processing of remote-sensing data in a specific user environment.

System capabilities for supporting nonimagery data processing functions and for providing ancillary data support for image processing are provided by the Cullinane Integrated Data Management System (IDMS-11) and the FACC Query and Report Writer, both of which have been implemented on the PDP 11-70 under the IAS. The IDMS was developed in strict compliance with the CODASYL standards.


SUMMARY  AND  CONCLUSIONS 

The incorporation of new technology into the user's operations is critical to the development of any application system. The ATS is an example of a system with this capability, where LACIE techniques and procedures were merged with USDA requirements to define the design approach. The goal of the definition and design study was to couple overall feasibility with an extensive and diverse processing capability which minimized manpower requirements. The design was translated into ATS requirements; the ATS was implemented according to these requirements; and evaluation report criteria were defined for technology transfer (ref. 9). The ATS is now successfully supporting USDA activities in Houston, Texas.

The  ATS  is  modular  and  can  be  expanded  easily 
and  modified  piecewise  as  requirements  may  change 
because  of  changes  in  quantity  or  quality  of  input 
data  or  because  of  the  desire  and  the  ability  to  extract 
more  and/or  relevant  information. 


REFERENCES 


1.  A Cost Benefit Evaluation of the LANDSAT Follow-On Operational System. GSFC X-903-77-49 (Greenbelt, Md.), Mar. 1977.

2.  Benson, J. L., et al.: A System Sub-System Design for the USDA Which Will Provide a User Advanced System. SISO-TR633, Rev. A, Ford Aerospace & Communications Corp. (Houston), Jan. 1977.

3.  Benson, J. L.; and Tarbet, J. D.: Technique Validation Approach Document for the USDA Advanced System Study. SISO-TN823, Ford Aerospace & Communications Corp. (Houston), Jan. 1977.

4.  Benson, J. L.; and Tarbet, J. D.: Detailed Techniques Flow and Timing Analysis for the USDA Advanced Systems Study. SISO-TN82$, Ford Aerospace & Communications Corp. (Houston), Jan. 1977.

5.  Benson, J. L.; and Tarbet, J. D.: A Model to Optimize Selection of System Elements. SISO-TN826, Ford Aerospace & Communications Corp. (Houston), Jan. 1977.

6.  Benson, J. L.; and Tarbet, J. D.: Report of Simulation Assessment for the USDA Advanced System Study. SISO-TN827, Ford Aerospace & Communications Corp. (Houston), Jan. 1977.

7.  Benson, J. L.: Minicomputer Data Base Management System Comparison Evaluation. SISO-TR629, Ford Aerospace & Communications Corp. (Houston), Aug. 1977.

8.  Benson, J. L., et al.: Functional Requirements Specification for the USDA Advanced System Study. SISO-TR626, Rev. A, Ford Aerospace & Communications Corp. (Houston), Jan. 1977.

9.  Tarbet, J. D.; Bradford, L. H.; and Purnell, R. F.: On the Transfer of Remote Sensing Technology to an Operational Data System. Proceedings, Machine Processing of Remotely Sensed Data, Purdue University (West Lafayette, Ind.), June 1977.




N80-15524


Resource  Modeling:  A Reality  for  Program  Cost 

Analysis 

L D.  Fours0  and  R.  L.  Hunt0 


INTRODUCTION 

The  ever-important  question  of  monetary 
resources  required  for  the  operation  of  a government 
program  can  be  presented  in  several  ways.  This  re- 
port conveys  the  approach,  implementation,  opera- 
tion, and  utilization  of  a model  to  establish  capital  in- 
vestment and  operational  costs  based  on  their  inter- 
relationships, dependencies,  and  alternative  actions. 


BACKGROUND 

From  its  inception,  the  LACIE  had  a stated  ob- 
jective to  determine  the  cost  effectiveness  of  utilizing 
satellite  and  surface-derived  data  to  monitor  crop 
production  and  assess  the  impacts  of  agricultural  and 
meteorological  conditions  affecting  potential  produc- 
tion. The  Office  of  Management  and  Budget  (OMB) 
wanted  to  know  the  cost  of  such  an  operational 
system.  Senior  U.S.  Department  of  Agriculture 
(USDA)  management  needed  to  know  the  costs  as- 
sociated with  the  implementation  and  operation  of 
this  type  of  system  to  make  decisions  on  future  com- 
mitments to  the  effort. 

The determination of all cost factors, interrelationships, and countless decision alternatives posed a complex problem. The straight analytical approach would accomplish the identification of cost factors and interrelationships, but to calculate the costs based on the interrelationships and countless configurations and decision alternatives still posed a monumental task. Thus, the concept of developing a model to assess the costs provided a logical and viable approach.


aU.S. Department of Agriculture, Houston, Texas.


The concept of modeling to provide information on which to base decisions is not new, although each model has unique attributes that are dependent on the environment to be modeled. The cost model developed for use in the USDA Applications Test System (ATS) environment was designed using basic cost accounting principles integrated with unique cost attributes. The model consists of multiple major cost elements, each comprising interrelated components that contribute directly or indirectly to the total estimated costs. Three major cost elements have been categorized into capital investments and operational costs and summarized into the standard government accounting classification object classes.

The model provides a tool for management to analyze potential impacts of alternative scenarios in a timely and efficient manner. The initial use of the model was to provide estimates of the resource requirements, investments, and operational costs associated with a future operational USDA crop assessment program. The model has been continually modified to meet changing requirements and presently provides investment and operating costs by designated scenarios, personnel staffing reports, budget projections by decision package, and required automatic data processing (ADP) information for OMB reports.


METHODOLOGY

The information generated from the model can be presented in various ways, depending on the intended use. The output formats were dictated by an analysis of the various users and consideration of the user's purpose for requiring the data. Two major economic considerations are reflected in the output from the model. The first is the manner in which to present a 10-year cost projection encompassing a system life of 8 years. The second consideration is that of the "sunk cost" concept. The rationale used in adapting to these considerations is presented in respective order.

One method of presenting the cost projections is the accounting concept of depreciation, which amortizes the cost of capital investment over the life of the system. This is viewing the investment as a prepaid operating expense; however, a major disadvantage is that this does not reflect the projected actual cash flow in respect to time. Another method is the "present value" rule, which equates the future capital and operating expenditures to the present-day value of dollars. The technique of discounting the future cash flow with respect to the time incurred at an appropriate rate of interest is used to make the adjustments to equate present value.

In  accordance  with  the  OMB  requirements,  the 
latter  method  was  used  to  derive  the  present  value  of 
resource  costs  over  the  10-year  lifespan  of  the  pro- 
posed production  system.  The  required  discount  in- 
terest rate  of  10  percent  will  be  used  (ref.  1).  The 
present  value  has  been  calculated  and  is  reflected  in 
the  summary  and  detailed  reports. 
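The two presentation methods discussed above can be contrasted in a short sketch. Only the 10 percent discount rate, the 10-year projection, and the 8-year system life come from the text. The dollar amounts are hypothetical, and discounting each year's cost at the end of the year (year 1 divided by 1.10, year 2 by 1.10 squared, and so on) is an assumed convention, since the paper does not state one.

```python
def present_value(yearly_costs, rate=0.10):
    """Discount a list of yearly costs (years 1, 2, ...) to present value.

    End-of-year discounting is assumed; the source does not specify it.
    """
    return sum(
        cost / (1.0 + rate) ** year
        for year, cost in enumerate(yearly_costs, start=1)
    )


def straight_line_depreciation(capital_cost, life_years):
    """Spread a capital cost evenly over the system life."""
    return [capital_cost / life_years] * life_years


# Hypothetical figures, in arbitrary cost units.
yearly_costs = [100.0] * 10          # level cost over the 10-year projection
pv = present_value(yearly_costs)     # discounted at 10 percent
undiscounted = sum(yearly_costs)
depreciation = straight_line_depreciation(800.0, 8)  # 8-year system life
```

With these hypothetical inputs, the present value is roughly 61 percent of the undiscounted total, which shows how strongly a 10 percent rate weights early-year spending. The depreciation schedule, by contrast, reports the same charge every year regardless of when the cash is actually spent.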

The question of "sunk cost" lies in the definition and adaptation as it applies to the environment being modeled. Sunk costs are nonrecoverable resources that have been consumed as the result of a prior decision and have no direct operational benefit (ref. 2). Sunk costs are not altered by a change in the level or nature of an activity and have no bearing on current investment decisions.

The utilization of satellite remote sensing is technology oriented, and the development of this technology is so dynamic that extensive research, followed by application development, is necessary to exploit potential capabilities. Therefore, the costs related to LACIE research and development and Landsat are considered sunk costs and were not included in costing the future USDA system. The costs associated with the application development and test phases were included since the techniques, procedures, capabilities, and equipment would be of direct benefit to the establishment of a future operational system.


ASSUMPTIONS

In order to establish model requirements, assumptions were made to guide the collection and evaluation of pertinent data. These assumptions were used in the model development to guide the inclusion and manipulation of the various cost factors.


General

1.  A timespan of 10 years was used, representing the procurement of hardware with respect to the phase-in of the operational system and the remaining life expectancy of the system.

2.  The LACIE costs are classed as sunk costs and therefore are not included in the total cost for the operational system.

3.  Costs associated with the procuring and launching of a satellite are not to be included in the total cost. However, the cost of the product (digital image data) as provided by NASA is included in the total cost.

4.  Current General Services Administration (GSA) facility rental rates are used for each potential location.

5.  Departmental and agency budgeting policies were followed to derive various cost factors used in the resources calculations.

6.  Personnel salaries are projected based on actual and projected positions and will be inflated 5 percent each year for cost-of-living increases.


Hardware

The required computer-related hardware will be purchased.


Software

1.  Operating system software will be purchased.

2.  The application programs will be developed and implemented as a joint effort by contractors and USDA personnel.

3.  The conversion programs will be developed and programed by USDA personnel.


Data Base

1.  The design and implementation of the data base will be accomplished by USDA personnel.

2.  The digital image processing system design provides for one or more resident geographically oriented data bases.


Personnel

1. Total manpower requirements will be dictated
by management and USDA ceiling limitations.

2. Operational manpower requirements will be
assessed based on hardware configurations.

3. Startup personnel will be fully trained in the
experimental environment and transferred to the
production system, thus eliminating consideration of
major training costs.


Support  Services 

1.  The  receiving  station  and  preprocessing  of 
satellite  data  to  USDA  requirements  will  remain  at 
the  NASA  Goddard  Space  Flight  Center  (GSFC). 

2.  The  GSFC  will  provide  imagery  data  in  accord- 
ance with  USDA  requirements. 


Facilities 

1.  The  operational  system,  equipment,  and  per- 
sonnel will  be  located  in  a USDA  facility. 

2.  Facilities  will  require  a site-preparation  charge. 

3.  Security  and  utility  services  will  be  accounted 
for  in  the  facility  rental  rates. 

4. Charges for utilities for second- and third-shift
operations will be based on trends of actual charges
incurred by the existing USDA computer facilities.


ENVIRONMENT 

Initial model development and operation were
performed using a Digital Equipment Corporation
(DEC) computer 11-45. Since the procurement of a
DEC 11-70 by USDA, the model has been
transferred and is operational on the DEC 11-70. The
FORTRAN programing language was used because
it lends itself to the concepts of modular programing
through the use of subroutines and is more efficient
in data manipulation and calculation.


MODEL  DESIGN  AND  DEVELOPMENT 

The approach for design and development of the
cost model was to identify the cost categories in the
form of stated objectives. The objectives are a series
of cost elements that, when combined, provide the
total cost. Figure 1 provides a graphic view of this
statement.

Each  cost  element  consists  of  components  that 
contribute  directly  or  indirectly  to  the  costs.  These 
components  are  identified,  the  interrelationships  are 
determined,  and  the  components  are  formulated  into 
a model. 

The  model  has  been  developed  to  process  cost 
trade-offs  dependent  on  alternative  management 
decisions  and  to  assess  cost  variations  resulting  from 
incorporation  of  new  technology,  optional  system 
configurations,  changes  in  volume  of  meteorological 
and  satellite  imagery  data  to  be  processed,  and  fre- 
quency of  processing  reports.  The  resulting  reports 
from  the  model  provide  the  data  to  derive  a range  of 
expected  costs. 

The  objectives  which  formed  the  base  for  the 
model  are  the  major  elements  that  contribute  to  the 
cost  of  the  system.  When  reported,  they  are  grouped 
into  investment  and  operating  costs.  The  cost  catego- 
ries are  identified  as  Hardware,  Software,  Conver- 
sion, Data  Base,  Relocation  Expenses,  Personnel, 
Facilities,  ADP  Services,  Support  Services,  Research 
and  Development,  Administrative  Support,  and 
Other. 

The basic concept of the model is for each major
cost element (stated objective) to perform as a
separate program in calculating costs. Each program
contains data dependency relationships, algorithms for
data calculation, and predefined interrelationships
between cost element programs. The relationship of
one cost element to another within the model as they
provide results to a summary report is depicted
graphically in figure 2.


FIGURE 1.— Design and development of the cost model.

Data are input to the model via computer
terminal, although data may or may not be entered for
each cost element. The baseline data are maintained
in the model's data file.

Each program (cost element) extracts the
appropriate data from the file and performs predefined
functions. Some data are passed from one program to
another and are dependent on a predefined
relationship, thus providing the inputs necessary for the
receiving program to perform its calculations. Tables
are used to test alternative assumptions and to
provide cost factors.

The results from each cost-element program are
summarized into investment and operating costs by
year. In addition, the yearly costs are discounted to
present value and are summarized in a report.
Detailed procedures for the development of the
model are found in a USDA LACIE document,
"Approach to Cost Analysis" (ref. 3).
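
The discounting step can be sketched as follows. This is a minimal
illustration in Python, not the model's FORTRAN code. The function name
present_value, the 10-percent rate, and the convention that the current
year is left undiscounted are all assumptions, since the text does not
state the rate or convention the model used.

```python
def present_value(yearly_costs, rate):
    """Discount yearly costs to present value.

    yearly_costs[0] is the current year and is left undiscounted
    (an assumed convention); year n is divided by (1 + rate) ** n.
    """
    return [cost / (1.0 + rate) ** n for n, cost in enumerate(yearly_costs)]


# Hypothetical figures in thousands of dollars at an assumed 10-percent rate.
costs = [100.0, 110.0, 121.0]
discounted = present_value(costs, 0.10)
print([round(v, 1) for v in discounted], round(sum(discounted), 1))
```

Summing the discounted list gives the single present-value figure that
a summary report would carry for the cost element.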

The results obtained from the model (1) are used
to assess and influence the design and development
aspects of the USDA ATS; (2) provide management
with a tool that can increase the competence of
management decisions; (3) guide management in
decisions on scheduling equipment procurement; and (4)
are used to assess and influence future manpower
and budget planning.


ALTERNATIVE  CONSIDERATIONS 

The total costs are based on the combination of
the cost elements bound by the stated assumptions.
The sensitivity of these cost elements as they affect
the total cost is tested through alternatives. Each
alternative represents some degree of impact on the
costs. Several of these alternatives are presented here
to provide an understanding of model capabilities.


FIGURE 2.— Cost model.

Alternative comparison capability is provided
through a Compare Routine. This routine compares
the Summary File created by the model for various
alternatives against a designated base file and outputs
a summary deviation report by cost element. Several
tables provide cost factors and algorithms for
identified alternatives.
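
The comparison described above can be sketched in a few lines. This is
an illustration only: the dictionary layout, the function name compare,
and the sample figures are hypothetical, and the actual Summary File
format is not given in the text.

```python
def compare(base, alternative):
    """Return alternative-minus-base deviation for every cost element."""
    elements = sorted(set(base) | set(alternative))
    return {name: alternative.get(name, 0.0) - base.get(name, 0.0)
            for name in elements}


# Hypothetical summary totals, thousands of dollars.
base = {"hardware": 500.0, "personnel": 900.0}
alt = {"hardware": 650.0, "personnel": 850.0}
for name, delta in compare(base, alt).items():
    print(f"{name:<12}{delta:+10.1f}")
```

An element missing from either file is treated as zero, so an
alternative that adds or drops a cost element still appears in the
deviation report.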

Alternative  hardware  configurations  are  tested 
through  the  establishment  of  a file  for  each  con- 
figuration. The  model  is  then  run  for  each  configura- 
tion, and  the  summary  totals  are  input  to  a Compare 
Routine.  This  routine  prepares  a report  on  the  devia- 
tions from  the  base  configuration  as  determined  by 
the  system  design  personnel. 

Alternative  personnel-manpower  approaches  are 
tested  by  varying  the  numbers  and  types  of  posi- 
tions, creating  a file  for  each  alternative.  The  alterna- 
tives for  personnel  are  closely  associated  with  the 
alternative  hardware  configuration  and  management 
decisions  on  the  extent  of  goals  and  countries  to  be 
monitored. 

Other  alternative  considerations  by  cost  element 
are  shown  in  the  form  of  a decision  tree. 


Software 

Costs  vary  based  on  the  method  of  procurement, 
as  shown  in  figures  3 and  4. 


FIGURE 3.— Procurement of systems software.

Support Services

Costs are affected by the method of acquiring
source data. The Alternatives Table provides cost
factors and algorithms regarding method of
communication and cost of data. Figures 5, 6, and 7
depict the various types of support services
considered.


Facilities

Costs for facilities vary depending on the potential
location and extent of modifications (fig. 8). The
Alternatives Table provides the GSA lease rates for
six different geographical locations. The table can be
updated at any time, so the analysis is not confined
to any particular location.


MODEL  CONSTRUCTION 

The model is constructed of 12 separate cost
modules linked by a main summary program
module. In addition, tables related to alternative
approaches are called by the various modules to
provide data for the calculation of alternative costs.
Subroutine modules provide the data manipulations,
calculations, and outputs for the budget projections,
personnel staffing reports, and comparison of costs
for alternatives. Figure 9 is a simplified flow diagram
of the model. The government accounting class
codes are incorporated for budget class determination
through direct entry or internal programing (ref.
4).
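
The way the main summary program combines the module results can be
sketched as follows. The grouping of elements into investment and
operating costs follows the summary report headings in the appendix C
listing; the Python names and the dictionary layout are illustrative
assumptions, not the model's own data structures.

```python
INVESTMENT = ("hardware", "software", "conversion",
              "data_base", "relocation", "other")
OPERATING = ("personnel", "administrative", "adp_services",
             "facilities", "research_development", "support_services")


def summarize(costs, years=10):
    """Return (investment, operating, grand_total) yearly cost lists."""
    def total(names):
        # Elements with no data contribute zero in every year.
        return [sum(costs.get(name, [0.0] * years)[y] for name in names)
                for y in range(years)]

    investment = total(INVESTMENT)
    operating = total(OPERATING)
    grand = [i + o for i, o in zip(investment, operating)]
    return investment, operating, grand


# Hypothetical two-year example, thousands of dollars.
costs = {"hardware": [300.0, 50.0], "personnel": [400.0, 420.0]}
print(summarize(costs, years=2))
```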


MODEL  OPERATION 

Each of the cost-element modules has its own data
files plus any additional files passed to the module
from another module. The various module files are
updated based on a scenario to be tested. To provide
an understanding of the operation of the cost-element
modules, each will be discussed in relation to a
simplified data flow diagram and the stated objective
of that module. Examples of the detailed outputs
from the modules are found in appendix A. All
modules output detailed data and summary totals to
the Budget and Summary Files, respectively.
Appendix B provides examples of the Investment and
Operational Cost Summary, Detail Budget Projection
Report, Budget Projection Summary Report,
Personnel Staffing Profile Report, and Skill Level Summary
Report.


FIGURE 4.— Procurement of application software.


FIGURE 5.— Types of communication media available.


FIGURE 6.— Acquisition of preprocessed satellite data.


FIGURE 7.— Acquisition of meteorological data.


FIGURE 9.— Simplified flow diagram of cost model.


FIGURE 10.— Hardware Module.

Hardware  Module 

The  objective  of  the  Hardware  Module  (fig.  10)  is 
to  identify  hardware  components  to  be  procured  by 
year  and  calculate  the  total  cost  per  year,  total  overall 
cost,  and  other  operational  impacts.  The  input  of 
alternative  hardware  component  configurations 
allows  for  a cost-effective  analysis  between 
configurations  and  their  impacts  on  operations.  Basic 
data  related  to  the  hardware  components  are  input 
by  year  of  scheduled  acquisition  and  are  processed, 
generating  a detailed  hardware  cost  report  and  output 
files  for  use  by  the  Facility,  Software,  ADP  Services, 
and  Administrative  Support  Modules. 

Software  Module 

The  objective  of  the  Software  Module  is  to  iden- 
tify the  type  of  software  and  calculate  the  total  soft- 
ware costs  based  on  hardware  to  be  procured  and  on 
defined  software  requirements.  If  application  soft- 
ware requirements  are  not  defined,  then  the  module 
calculates  cost  based  on  a percentage  of  the  hardware 
costs  for  that  year.  A report  is  generated  containing 
detailed  costs  by  year  and  summary  costs. 
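
The fallback rule for undefined application software requirements can
be sketched as below. The 25-percent figure and the use of None to mark
an undefined year are assumptions made for illustration; the text does
not give the percentage the model applied.

```python
def software_cost(hardware_by_year, defined_by_year, fraction):
    """Yearly software cost: the defined figure when one exists,
    otherwise a fraction of that year's hardware cost."""
    return [defined if defined is not None else fraction * hardware
            for hardware, defined in zip(hardware_by_year, defined_by_year)]


# Hypothetical inputs, thousands of dollars; year 2 has no defined requirement.
hardware = [100.0, 200.0]
defined = [30.0, None]
print(software_cost(hardware, defined, fraction=0.25))
```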

Conversion  Module 

The  objective  of  the  Conversion  Module  is  to 
record  and  provide  costs  associated  with  the  conver- 
sion of  data  files  and  application  software  from 
LACIE  to  the  USDA  environment.  This  module  pro- 
vides for  direct  input  of  defined  conversion  require- 
ments and  estimated  costs.  The  output  is  a detailed 
listing  of  the  costs  and  summary  totals. 

Data  Base  Module 

The  objective  of  the  Data  Base  Module  is  to  record 
and  provide  costs  associated  with  the  development, 
implementation,  data  collection,  and  purchase  of 
Data  Base  Management  software  programs.  This 
module  provides  for  direct  input  of  defined  func- 
tions and  estimated  costs.  The  outputs  include  a 
detailed  listing  of  costs  by  function  and  year,  plus 
summary totals.

Personnel Module

The objective of the Personnel Module (fig. 11) is
to identify the skill levels required and salaries
associated with each position within those skill levels.
The  manpower  level  and  skills  are  analyzed  based  on 
functions  to  be  performed  and  the  performance  goals 
as  defined  in  the  Management  Plans  (refs.  5 and  6). 
Each  position,  grade,  step,  and  salary  is  input  for  the 
year  required.  The  program  calculates  the  succeeding 
year's  salary  based  on  a cost-of-living  percentage  in- 
crease. Promotions  are  accounted  for  by  entering  the 
new  salary  in  the  year  of  the  anticipated  promotion. 
A detailed  listing  is  printed  and  detailed  data  are  out- 
put to  a personnel  subroutine,  which  provides  a 
detailed  staffing  profile  report  and  a summary  of 
positions  by  skill  level.  Totals  are  passed  to  the  files 
or  modules  for  further  calculation  of  cost  impacts. 
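
The salary projection for a single position can be sketched as follows.
The 5-percent cost-of-living rate comes from the stated assumptions; the
function name, the zero-based year index, and the choice that a
promotion salary replaces (rather than adds to) that year's
cost-of-living increase are assumptions of this sketch.

```python
def project_salary(start_salary, years, cola, promotions=None):
    """Project one position's salary over a number of years.

    promotions maps a year index (0 = first year) to the new salary
    entered for that year; other years grow by the cost-of-living rate.
    """
    promotions = promotions or {}
    salaries = []
    salary = start_salary
    for year in range(years):
        if year in promotions:
            salary = promotions[year]
        elif year > 0:
            salary = salary * (1.0 + cola)
        salaries.append(salary)
    return salaries


# Hypothetical position promoted in its second year.
print([round(s, 2) for s in project_salary(20000.0, 3, 0.05, {1: 25000.0})])
```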


Other  Investments  Module 

The  objective  of  the  Other  Investments  Module 
(fig.  12)  is  to  establish  other  initial  costs  incurred  in 
implementing  an  operational  system.  Detailed  cost 
components  identified  in  this  module  include  tele- 
phone installation,  furniture  procurement,  site 
preparation,  etc.  Telephone  installation  and  furniture 
costs  are  derived  from  algorithms  using  data  passed 
from  the  personnel  module  and  influenced  by  the 
Alternatives  Table.  Input  of  known  costs  can  be 
made  through  direct  entry  to  the  module  file. 


FIGURE 11.— Personnel Module.


FIGURE 12.— Other Investments Module.

Relocation Module

The objective of the Relocation Module (fig. 13) is
to establish the cost to relocate personnel and
equipment, depending on the number of personnel and
alternative  actions.  A table  is  updated  to  provide  the 
alternatives  with  regard  to  time  of  relocation  and 
number  of  personnel  to  be  relocated.  The  cost  associ- 
ated with  the  equipment  relocation  is  a direct  input. 
The  calculations  performed  involve  algorithms 
utilizing  data  from  the  Personnel  Module,  the  Alter- 
natives Table,  and  cost  factors  derived  from  analysis 
of  current  moving  costs.  The  outputs  include  a 
detailed  report  of  relocation  costs  and  detailed  and 
summary  data  passed  to  the  Budget  and  Summary 
Files,  respectively. 


FIGURE 13.— Relocation Module.

ADP Services Module


The  objective  of  the  ADP  Services  Module  (fig. 
14)  is  to  establish  the  cost  of  services  directly  related 
to  the  support  of  ADP  operations.  Primarily,  these 
costs  are  for  equipment  maintenance  and  rental. 
Equipment  rental  data  are  entered  directly  into  the 
file,  whereas  the  calculation  of  maintenance  costs  is  a 
function  of  data  passed  from  the  Hardware  Module 
and  accumulated  for  a total  yearly  cost  plus  the 
added  cost  for  each  succeeding  year  of  the  equipment 
life.  Since  facility  space  is  impacted  by  rental  equip- 
ment, physical  space  data  are  passed  to  the  Facility 
Module  for  further  processing.  A detailed  report  is 
printed  and  the  respective  data  are  passed  to  the 
Budget  and  Summary  Files. 
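
One reading of the maintenance calculation is that each year's charge
applies to all equipment acquired to date, so the charge grows as
hardware is phased in. The sketch below follows that reading. The flat
10-percent rate and the choice to start charging in the year of
acquisition are assumptions; the text gives neither.

```python
def maintenance_by_year(purchases, rate):
    """Yearly maintenance as a fraction of cumulative hardware purchased."""
    installed = 0.0
    yearly = []
    for purchase in purchases:
        installed += purchase          # equipment stays in service once bought
        yearly.append(installed * rate)
    return yearly


# Hypothetical hardware purchases by year, thousands of dollars.
print(maintenance_by_year([100.0, 50.0, 0.0], rate=0.10))
```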


Facility  Module 

The objective of the Facility Module (fig. 15) is to
establish the size and cost of facilities required to
house the personnel, equipment, and work areas
associated with a USDA environment. Three modules
provide input data to this module, which has direct-
entry capability. Additionally, the Alternatives Table
is accessed to obtain dollar rates for various locations
based on type of space. The calculation of the space
for personnel is based on GSA allowances.
Algorithms  are  the  basis  for  establishing  costs  using 
the  table  factors  and  additional-shift  utility 
allowances.  A detailed  report  is  printed  containing 
the  total  square  feet  of  facility  required  by  type  of 
space  and  the  cost  for  that  space. 
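
The space and cost calculation can be sketched as follows. The
allowance per position, the two space types, and the rates are
hypothetical placeholders; actual GSA allowances and lease rates would
come from the Alternatives Table.

```python
def facility_cost(positions, equipment_sqft, allowance_sqft,
                  office_rate, computer_rate):
    """Return square feet and yearly cost by type of space."""
    office_sqft = positions * allowance_sqft
    return {
        "office_sqft": office_sqft,
        "computer_sqft": equipment_sqft,
        "cost": office_sqft * office_rate + equipment_sqft * computer_rate,
    }


# Hypothetical inputs: 10 positions at 150 sq ft each, 2000 sq ft of equipment.
print(facility_cost(10, 2000, 150, office_rate=8.0, computer_rate=12.0))
```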


Support Services Module

The objective of the Support Services Module is to
identify, record, and calculate costs incurred for
services supplied by other governmental organizations
and private enterprise in support of crop assessment
operations. Two key alternatives impacting costs are
tested in this module. The utilization of satellite
communications versus courier service represents
significant variances in costs. The calculation of satellite
communication costs is dependent on the volume of
data and time of transmission. The volume of
satellite digital data required also impacts the cost of
buying the data and is based on workloads associated
with each geographic area to be monitored.
Therefore, algorithms using the data volume, which
is a direct input, calculate both the cost of buying
satellite data and the cost of transmitting the data.
The Alternatives Table supplies the algorithm
factors, depending on the alternative to be tested. Other
capabilities of this module include direct entry of
known required services and associated costs. The
output is a detailed report with data passed to the
Budget and Summary Files.


FIGURE 14.— ADP Services Module.


FIGURE 15.— Facility Module.
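
The two volume-driven costs can be sketched as below. This is a
deliberately simplified illustration: the table layout, the
per-scene factors, and the function name are hypothetical, and the
time-of-transmission term mentioned in the text is folded into a
single per-scene factor.

```python
# Hypothetical alternatives table: transmission cost factor per scene.
ALTERNATIVES = {
    "satellite": {"transmit_per_scene": 40.0},
    "courier": {"transmit_per_scene": 5.0},
}


def support_services_cost(scenes, price_per_scene, method, table=ALTERNATIVES):
    """Return purchase, transmission, and total cost for a data volume."""
    purchase = scenes * price_per_scene
    transmit = scenes * table[method]["transmit_per_scene"]
    return {"purchase": purchase, "transmit": transmit,
            "total": purchase + transmit}


# Compare both alternatives for the same hypothetical yearly volume.
for method in ALTERNATIVES:
    print(method, support_services_cost(1000, 200.0, method))
```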


Research and Development Module

The objective of the Research and Development
Module is to record and provide costs associated
with defined research and development functions.
Input is direct through creation of several files, and
alternative cost approaches are integrated into the
total costs.

Administrative Support Module


The objective of the Administrative Support
Module (fig. 16) is to identify, calculate, and/or
provide the total costs associated with the
administrative support functions of a USDA operational
environment. Key cost components established in this
module are personnel benefits, travel, training,
supplies, telephone, and administrative overhead costs.
Factors  used  in  formulas  to  derive  the  costs  are  based 
on  historical  trend  data  of  the  department.  Personnel 
benefits  are  derived  as  a percentage  of  the  total  per- 
sonnel salary  costs  passed  from  the  Personnel 
Module.  Training,  supplies,  and  telephone  costs  are  a 
function  of  the  number  of  personnel  as  passed  from 
the  Personnel  Module.  The  administrative  overhead 
costs  are  calculated  on  the  total  operational  costs 
from  the  Summary  File  and  then  added  to  the  total 
cost.  Other  known  administrative  costs  may  be  input 
directly. 
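
The three kinds of factor described above can be sketched for a single
year as follows. All rates and the function name are hypothetical; the
model's actual factors were derived from departmental historical data
that the text does not reproduce.

```python
def administrative_support(salary_total, positions, operating_total,
                           benefit_rate, per_person_cost, overhead_rate):
    """Return one year's administrative support cost components."""
    benefits = benefit_rate * salary_total        # percentage of salaries
    per_person = per_person_cost * positions      # training, supplies, telephone
    overhead = overhead_rate * operating_total    # applied to operating costs
    return {"benefits": benefits, "per_person": per_person,
            "overhead": overhead,
            "total": benefits + per_person + overhead}


# Hypothetical year: figures in thousands of dollars, 50 positions.
print(administrative_support(1000.0, 50, 2000.0,
                             benefit_rate=0.5, per_person_cost=2.0,
                             overhead_rate=0.25))
```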

The  main  summary  program  is  the  controlling 
program  of  the  cost  model.  Through  input  data,  it 
determines  which  table  files  to  access  and  which  files 
to  open  for  output  and  calls  the  subroutine  to  process 
the  data. 

The main summary program accepts the input
data, sets up the calling parameters, calls the
appropriate subroutine to process the data, and stores
any returned data or parameters. The validity
checking is performed in the subroutine. After all the input
data have been processed, the main summary
program produces the summary report. The source code
for the main summary program is given in appendix
C.

The main summary program does not use
overlays, since it and the associated subroutines execute
in 26 000 bytes of core. The main summary program
uses standard linkages of the CALL and parameter
list to interface with the subroutines.


SUMMARY 

The utilization of the cost model has provided
data to OMB, senior USDA Management, and Project
Management and major inputs to the budget
process for 1977, 1978, 1979, and 1980. The modular
concept of the model simplified its design,
implementation, and operation. Approximately 3 man-
months were involved in the design, collection of
cost factor data, and development of the
interrelationships, algorithms, and alternative test
capabilities; programing, implementation, and testing
required 3 man-months. The model was operational by
July 1976 and provided the detailed ADP cost
information for the OMB-requested report on projected
expenditures for fiscal year (FY) 1977. A special
OMB presentation in September 1977 required a
detailed analysis of costs to be expected in an
operational system. The model was used to generate the
data and provided a range of expected costs
dependent on alternative management decisions. The
model derived the information, together with a
detailed Resource Analysis Report, which
successfully answered OMB's questions concerning
costs (ref. 6). The Resource Analysis Report was
updated in 1977 and amended in 1978.


FIGURE 16.— Administrative Support Module.

During 1977, it became apparent that the
extension of the cost model into the budget area would
expedite and increase the accuracy of the budget
projections. Approximately 2 man-months of design and
programing were required to implement the budget
routines into the model. The model was used to
assess the cost impacts of various hardware design
configurations and influenced the selection of a cost-
effective design and related specifications used in the
procurement of the current system configuration.
The implementation of the budget routine
categorizes and accumulates the cost components into
government accounting classes and provides both
detailed and summary budget projections. The
budgets submitted for 1978, 1979, and 1980 were
directly calculated by the model. With the initial
implementation of zero-base budgeting (ZBB) for FY
1979, the model's alternative test capabilities easily
provided the budget levels for the ZBB decision
packages.

From  early  1977  through  the  present,  the  model 
has  been  used  to  assess  cost  impacts  and  provide  per- 
sonnel  staffing  profiles  associated  with  alternative 
management  decisions.  It  has  been  instrumental  in 
adding  competence  to  the  management  decisions  in 
budgeting,  project  goals,  manpower  planning,  and  in- 
vestments in  procuring  equipment,  software,  and 
support services.


Appendix A

Examples  of  Detailed  Module  Output 


[Reproduced computer printouts. The report titles, column headings,
and row labels listed here are legible; the dollar entries are not
recoverable from this copy.]

DETAILED REPORT - HARDWARE
Columns: NAME, QTY, PRICE, CURRENT, FY 2 through FY 10
Rows: DISK CONTROLLER, DISK UNITS, MAG TAPE CONTROLLER,
MAG TAPE DRIVE, GRAPHIC TERM/COPIER, CARD RDR/PUNCH,
CARD READER, LINE PRINTER, SUBTOTAL, TOTAL HARDWARE,
PRESENT VALUE

DETAILED REPORT - SOFTWARE
Columns: NAME, CURRENT, FY 2 through FY 10
Rows: COMPUTER OS, ANALYST STATION SW, COMMUNIC INTERFACE,
HOST COMP INTERFACE, HOST INTERFACE, ARRAY PROC SW,
APPLICATION SOFTWARE, SOFTWARE DOCUMENTATION, 78 BUDGET ADJ,
TOTAL SOFTWARE, PRESENT VALUE

DETAILED REPORT - CONVERSION
Columns: CURRENT, FY 2 through FY 10
Rows: MANPOWER RESOURCES, CPU TEST TIME, TEST STR DEVICE,
78 BUDGET ADJ, TOTAL CONVERSION, PRESENT VALUE

DETAILED REPORT - DATA BASE
Columns: NAME, CURRENT, FY 2 through FY 10
Rows: DATA BASE MGMT PKG, DB MOD SW, 78 BUDGET ADJ,
DATA BASE COST, PRESENT VALUE

DETAILED REPORT - RELOCATION
Columns: ITEM, CURRENT, FY 2 through FY 10
Rows: EQUIP PACK & SHIP, RELOCATION BENEFITS,
RELO HOUSEHOLD GOODS, TOTAL RELOCATION, PRESENT VALUE

DETAILED REPORT - OTHER COSTS
Columns: ITEM, CURRENT, FY 2 through FY 10
Rows: TELEPHONE INSTALLATION, FURNITURE COSTS, DISK PACKS,
LANDSAT DATA

DETAILED REPORT - ADP SERVICES
Columns: CURRENT, FY 2 through FY 10
Rows: MAINTENANCE, RENTALS, FILM IMAGE PROCESSOR,
KEYPUNCH/VERIFIER, MRS PROC EQUIP, 78 BUDGET ADJ,
TOTAL SERVICES, PRESENT VALUE


Appendix B

Examples of Detailed Data Summary Reports


Appendix C

Source Code for Main Summary Program


CST5IM  TS  THC  MAIN  CALLING  PROGRAM  * " 

foh  iNt  com1  susieh 

boh  huhsi 


* 

V 

* 


DIMENSION  hwC08T(10),lNUTt(7«»),lttl,(*),CVCUNA(iU)#MtC0N»(i0), 
oicus  r ( i o ) , AGcra  r c iu  ) , rdcost  u 0 j , sscust  ( 1 0 r,  akcosttio)  , — 

TOTOP( 10)  ,‘1'iPtf  AL(  10)  , l'lt*VAL(  10)  ,12PV Ab( 10) ,GIC06T( 10) , 

' AOSDFTI  10),  I SOFT! 1 0 1 , KNC0ST ( 1 0 J ,~3«COSTITO ? VDBCPSTXIU) , — 

Nt»U0  (10),  I0CUST  ( 1 0 ) , T 1C0ST  ( 1 0 ) , r ACOST  (AO),  PHCuSf  ( 1 0 ) , A OA  [t6  ( 9 ) 

ICOOR(6),THLRUO),lBLK(10),TsLOCrO),'I?RTCR(6)7I»)OMm  

1MTEGEM  COMCO 

REAL  A NCOS!  - * 

DATA  ICOOH/'CO’ , 'UK' , 'lb’ , ' , ' MO* , ' Ob' / 


DATA  TDuii7  rLu  AT* , • 


, -nu- , 

rt', •OD'V'fc' 


rj — 


CALL  ABBA ON  (S,'LP *') 

CALL  ASS  A UN  (H,»n^)  ~ 

CALL  ASSiGN  (1,'LOC.CD') 

CALL  A5SAGN‘*(9,*PEKS.uUt')  ' ~ " 

CALL  ASSIGN  110, 'ACCT.CO' ) 

read  (i,iOOi)"LocT*,coKCcr,"rrtOft  

1001  FOMMAl  (ill,  /BA1) 

READ  (1,1002)  IbLh  - 

1002  FOKMAT  (10F4.0) 

READ  Cl,  1002)  IBLF  ‘ 

AbAD  (1,1002)  lobU 

CALCOATE  TIDATES)  ~ 

IF  (COMCb  .Nb.  1)  GU  10  2 

DU  J J>1,6  - - 


APHTCN(J)  s ICOUK(J) 

3 TONTINUE 

IF  (CUMCO  .Nb.  2)  GO  10  4 
' 00*  4 R * 1,6 

APHfCM(K)  s 1U0M(A) 

4 CONTINUE 

IF  (LUCT  .Nt.  V)  AFMCNl  s tOCI 
GO  TO  (10,20,30,40,60,60), LOCT 
IF  (LOCI  .to.  0)  GO  10  9 
WHITE  (0,2001)  LOCI 

2001  KJhMAT  ('  ftNUNG  LOCAIAOn  COOt  *,11) 
Sl’OP 


S IPrtCNT  * if KCN'i  ♦ 1 

GO  TO  (10, 20, 20,40,VVV),lPnCNl 
20  CALL  ASSIGN  (2, 'KC.lttL' ) 

WHITE  (S,V01)  AFRTCm 
/01  FUFMAi’  ('1', 'KANSAS  CI1|  ’,0*2) 
IbON  s 2 
GO  TO  70 

io  Call  assign  (7, 'Ficul. loL' ) 

WHITE  (9,702)  APKlCn 


  702 FORMAT ('1','FORT COLLINS ',6A2)
iLUN  * / 

GO  TO  70 

40  CALL  ASSIGN  14,  ' NASh.ibti1  ) 
tortIT£  (5,703)  IPRlCrt 
*04  FoKNAl  ( * 1' , *wA&«*  U.C.  *,bA 2) 

Slum  « « 

00  TO  70 

30  CALL  ASSlow  (4, 'ritriO.idL' ) 
tortlib  (5,704)  IPHTCh 
/ 04  FJKMAi  (*l,,,NLto  OriLbANS  *,SA2) 

iLUN  * 4 

00  TO  70 

SO  CALL  ASSIGN  14,* OfMKK i • T#L  * ) 

IbUM  ■ 2 
00  TO  70 

00  CALL  ASSIGN  (2, '0lHtK2.Tbb* ) 

iLUN  * 2 

70  KL.AO  (ILUN , 1004)  TbL 
1004  FUKrtAt  (4F5.*,2F4.1,F5.4,*4.1,F 7.5,F4.1) 
tori 1Tb  (5,2010)  AOAlfcS 

Mrillb  (5,2004)  lNOi'b  

2004  FOKriAT  ('0',7bA1) 

CALL  MtoO  (MNC0AT,l«toPV,lHI»C5i,i5UFt,MriC0*i) 

Caul  S*  i«  ( StoCObX , ISFPV , iSFCS T ,ntoCOi> ( ) 

CALL  CQNVbh  (CvCUSi , TcVPV , f C VLSI ) 

CALL  UBASfei  (OdCUST,TObPV , iDbCSl) 

CALL  KbriSOn  (KhCuST, lPKPV,iKHC5i ,NFUS) 

CALL  HbbO  (HbCOST , 'i'HbPV  , TKbCSl , NPUS , 1 bL* ) . 

CALL  UXHbri  (OfCuSI  , T01KV  , f UTCS'i  ,NPUS , Tbtirt , ioLU) 

Call  ADPsbH  (AuCJST  ,1aOPV',  iaUCsi  , Auaot  ir,  huCusT ) 

CALL  FAC  (t  aCuST,T FAP  V , i'FACST , IbuFT , AuSwf  1,1  dL , NKU6 , ibLfr  ) 
CALL  KANO  ( HOCUS f , TriOVV , I'KOCSi' ) 

CALL  SUPSbK  (SSCOST ,TS6PV , TSSCbl ,ibb,C*JNCu) 

      DO 80 J=1,10
      TOTOP(J) = ADCOST(J) + FACOST(J) + RDCOST(J)
     *         + SSCOST(J) + PRCOST(J)
   80 CONTINUE
      CALL ADMIN (AMCOST,TAMPV,TAMCST,NPOS,TOTOP,PRCOST,TBLF)
      DO 90 L=1,10
      TICOST(L) = HWCOST(L) + SWCOST(L) + CVCOST(L) +
     *            DBCOST(L) + RECOST(L) + OTCOST(L)
      TOCOST(L) = PRCOST(L) + ADCOST(L) + FACOST(L) +
     *            RDCOST(L) + AMCOST(L) + SSCOST(L)
      GTCOST(L) = TICOST(L) + TOCOST(L)
   90 CONTINUE
      CALL PREVAL (TICOST,T1PVAL,G1PVAL,TICST)
      CALL PREVAL (TOCOST,T2PVAL,G2PVAL,TOCST)
      CALL PREVAL (GTCOST,TTPVAL,GTPVAL,GTCST)
      DO 100 J=5,10
      TSCST = TSCST + TOCOST(J)
  100 CONTINUE
      TSCST = TSCST / 6.0
      WRITE (5,2002)




2002 


* 


2010 

2003 

* 


2005 


% 


F OH MAT  ('!'  ,J6X,  'PROJECTED  RbaOuNCtS  FLw  Trtc.  UKLRAi  TIMAL ' # / / 
'O', f 40 , ' PRODUCT ION , ARtA,  UfcLD  fcSTlMAllUrt  aiSTfcrt • , / / J 
MrilTE  (5,2tfl0)  iDAtfcS 
FOKMAT  CO*, 'HUN  UATL  *,5A2) 

WRITE  ($,2002) 

rOHMAT  ( ' 0 ' , 4X, ' iNVfcalMtrrl  CU6fa ' , I JO , ' CUKRiM * , AX , ' F i 2',tA 
•FT  3'~,4X,'VY  ASAX,'*!*  l*rAX;*fT  6',%X,'rY  6*, -.A, 

•f  t »',JX,'FX  10*  , IX ' TOTAL  ' , 3A, 'P*  V.  * ) 

' WRITE  (»,20O»)  Htf‘C06T,TftNC*l , f HWPV ,»WCUSl  , ISFCai ,T6r  P V , 
CVCU8T,TCVCST,1CVPV, ONCOST, 10ttCST,iUbPv,Kr.CUai,lKe.Ci>T,lne.KV, 
OTCOftt , T0TC6T  ,‘T6TP  V,flCuSf,ilCa  * , Trl  P V a l , v,  l p L 
FORMAT  (*0*  ,OX,'HARONAHfe'  , 13  / , 12F6. 1 , / , 


“r<T',6X, 


a jFTiXWt '%  m,‘l  2Fi  .T777 


* *0* ,6X, 
V '0V,6X, 

* ' 0 ' ,6X, 

• r0‘,6X, 

♦ ' 0 ' ,6X, 


CON  tf tRSION * , T J7 , 1 2F« . 1 , / , 

OAT  A*  6 ASE ' , 1 77 1 2F  « . 1 , / , 

KbLOCAl ION  tAPfcNStS'  ,Ti/,12F6.1,/ 
01HfcH',iJ7,i*V6.i,/, 

TOTAL* ,137, 11F6.1,/, 


■*“'o‘,Va,' 


(T  V ' i OFO.I  , 8X  ,M» .!,//) 


WRITE  (5,2006) 

20$6  FuRMAYT'  U^AA,  ' JFfcRA  l iUrt»L  CUSl  a ' ) 


WRITE  (5,200/)  PRC06l,TPKPVfA«CU51, TAMPv, 
• " AoCOSI,  TaDPV,FACGST,1FAPV,R0CU&T  ,TKUPV, 


* S5C0ST,I8SPV,TuCUSI,T5Caf , T2PVAU,G2PVml 

206?  F OHM  AT  TH)1  , 6A , ' PERSON  wtL ' ,7  J?  , 1UF*I  . 1 , 6A  ,Vb  . 1 , / , 

» ' 0 ' , 6X , ' ADMiNlSTRATi Yb ' ,TJ7 , 10FV. 1 ,6A,fr  0. 1 , / , 

*0  * ,6X',  rADP  SERVICES*  ,T37 , 10F8.1, OX , F8 . 1, /,  - - 

6 'O' S6X, 'FACILITIES' ,TJ7 , 10) 6. 1 , 0X ,Fb . 1 , / , 

^~H)  'VEX  ^RESEARCH* t DEVELOPREHT  *“,737, 1T)F3 ; 1/BX  ,TS71  ,7 , 

• 'O' ,6X, 'SUPPORT  SfcRVICbS' , T3  ) , &0F«. i , HA ,F0. 1 , / , 

- tt  " *T) ' 76X,  • TOTAL ■ , JIT,  1 1F8 . 1 , / , - - ■*  - 

6 'D',9X,'P  V', 137, 10*6. 1,6X,F6.1,//) 

WRITE  (5,2008)  GTC0ST,ITP7AL  

2006  FORMAT  ( ' 0 ' ,4X, 'GRAND  TOTAL ' ,'T  J7 , ioF6. T , / , 

f 'OTAX,  'TOTAIT  FTT,'*  ,T3T, 1076. 1 ) 

      GO TO (110,120,130,140), IPnCnT
  110 WRITE (5,1110), IPRTCM
 1110 FORMAT (//,'0FORT COLLINS ',6A2)
      GO TO 200
  120 WRITE (5,1120), IPRTCM
 1120 FORMAT (//,'0KANSAS CITY ',6A2)
      GO TO 210
  130 WRITE (5,1130), IPRTCM
 1130 FORMAT (//,'0NEW ORLEANS ',6A2)
      GO TO 220
  140 WRITE (5,1140), IPRTCM
 1140 FORMAT (//,'0WASHINGTON D.C.',6A2)
      GO TO 230
  200 CALL ASSIGN (8,'FTCOL.OUT')
      WRITE (8,2020) IPRTCM
 2020 FORMAT (' FORT COLLINS ',6A2)
      GO TO 500
  210 CALL ASSIGN (8,'KC.OUT')


      WRITE (8,2030) IPRTCM
 2030 FORMAT (' KANSAS CITY ',6A2)
      GO TO 500
  220 CALL ASSIGN (8,'NEWO.OUT')
      WRITE (8,2040) IPRTCM
 2040 FORMAT (' NEW ORLEANS ',6A2)
      GO TO 500
  230 CALL ASSIGN (8,'WDC.OUT')
      WRITE (8,2050) IPRTCM
 2050 FORMAT (' WASHINGTON DC',6A2)

500  WRITE  (8,2050)  nffCGSl  ,THUCST,  i'HttPv  , SmCOSI ViSFCst , IS*  Rtf , 

* C vCOSI , TCVCST ,TCVPV  »DbCuS'f  , 1D8C8T , fD&FV  , kECGST  ,Ti«£C5l , 

* TKEPV  , UTCUSX , lores  1 , IUTR V , 11 COST  , TXCSX , T i ir  V AL , o IF  V AD , 

* PftCOST , TPkP V , ANCOST , T'ARPV , ALCoSt , XADR  V , f aCGsT,  IFAPV  ,KuC0ST 

* TRORV  , SSCOS f , TSSP  V , TOCOS  T , i 5Csl , T 2F  tf  AD , *2F VmL , CTCOS i 

20bu  Format  (bU2tb.i/),  iotiu  a.  i/) ,2(ms,i/)) 

call  close  <») 

GO  TO 

898  WRITE  (5,2004)  I.10TL 

IF  (LOCI  ,N£.  5)'  GO  fu  999  

rewind  i 

READ  (1,1001)  LuCT,COMCO,lWOiE 
OU  150  K«l,10 

TSUFl(R)  * 0.0 


MNCOST(R) 

s 

0.0 

hlPUS(fc)  * 

0 

.. 

AUSOFi(R) 

z 

o.o 

PRCUSX(R) 

S' 

0,0 

rtwCOSHM 

•m 

o.o 

SWCOSTCK) 

«*► 

0.0 

CVCUSlU) 

s 

0.0 

uBCGSKr) 

«w 

0.0 

RtCOSl (R  ) 

Ml, 

0.0 

uTCUST(R) 

S| 

0.0 

AUCOSKR) 

•m 

0.0 

FaCGSUr) 

s 

0,0 

KOCUSUN) 

0.0 

l'iCUSl(R) 

s 

v .0 

TUCOST(R) 

MM 

Mft 

0.0 

GlCuST(R) 

s 

0.0 

AMCOSKK  ) 

s 

0.0 

SSCUST(R) 

' s 

U.O 

TOTOP(R)  * 1 

o.o 

liPVAL(K)  s 0.0 
ITFVAL(K)  * 0.0 
T2PVHL(K)  * 0.0 


150  • CUNTINOt 

TsC5T»o;<r 

THUCSTeO.O 

rffspr«T.ir 

TSFCST  90.0 
tSFPV  90.0 
TCVCST  sO.O 
TCVPV  90.0  “ 
TUBCSX  30.0 
toOT  uT.1T 
IPKC5T  30.0 
TPRW  =0.0 
TrttCST  =0.0 
TkEPV  30.0'- 
TOTCSr  =0.0 

wTprw. nr 

TADCST  sO.O 

n\>pv~=V7o 

TFACST  =0.0 
TfApV  =0.“o' 
TKOCSf  =0.0 
TffDPV~iT  0'.D 

TANCST  30.0 

TAMPV  sO.O 
TiCST  =0.0 
CVPvAL  =0.0 
T0CS1  =0.0 
G2PVAL*  =0.0' 
      GO TO 5
  999 STOP
      END

The Application Test System: Experiences to Date

and Future Plans

G. A. May,* P. Ashburn,* and H. L. Hansen*


INTRODUCTION 

The Application Test System (ATS) was designed to test
and evaluate the latest technology in acquisition,
storage, retrieval, analysis, and application of remotely
sensed data for application feasibility by the U.S.
Department of Agriculture (USDA). The purpose of this
paper is to describe the ATS analysis component, focusing
on the methods by which the varied data sources are used
by the ATS analyst. An integral part of the ATS is the
team of USDA multidisciplinary analysts who analyze and
interpret varied data sources, including remotely sensed
data. The ATS analysts have agricultural backgrounds with
education and experience in a wide spectrum of
disciplines.

Material will be presented in two parts. Analyst
training and initial processing of data within the ATS
will be discussed first in the section entitled
“Experiences to Date.” The second section, entitled
“Future Plans,” will discuss short- and long-term
plans for the ATS.


EXPERIENCES  TO  DATE 


LACIE  Phase  III  Activities 

During Phase III (1977 crop year), USDA analysts
operated and evaluated an interactive computer-linked
classification system developed by the LACIE. The system
was evaluated in terms of classification accuracy and
segment throughput efficiency. USDA analysts gained
experience in analyzing Landsat multispectral scanner
(MSS) data on an interactive image-processing system.
Their image-processing experience played a large role in
the design and implementation of the ATS interactive
image-processing system. Many of the analyst-detected
inefficiencies in the LACIE system were considered and
corrected in the design of the ATS processing system.

The two main components of the LACIE image-processing
system were the General Electric IMAGE-100 (I-100) and
the Earth Resources Interactive Processing System
(ERIPS). The main processing procedure used to analyze
data on this system was Procedure 1 (P-1). P-1 was
developed to provide estimates of the percentage of a
segment devoted to wheat production. Many of the problems
encountered in segment classification during LACIE
Phase I (1975) and Phase II (1976) were overcome by
implementing P-1. Details of this procedure are discussed
in reference 1. The analyst used the I-100 for displaying
images and classification maps, selecting and labeling
training fields, and evaluating and reworking the
classification results. All clustering and classification
were completed on ERIPS.

Many problems arose because of the configuration of the
I-100/ERIPS system. Interfacing problems created a time
delay between initial processing and the receipt of
results. It was hoped that the time lag would be a day or
two, but experience indicated an average time lapse of a
week. Because of this, analysts had to analyze and track
up to nine segments at a time, greatly decreasing analyst
efficiency. The interfacing problems inherent in the
I-100/ERIPS system have been eliminated in the ATS
because it depends on a single image-processing computer.
It must be noted that the I-100/ERIPS system was never
meant to be an example of an operationally optimal
interactive image-processing system, but rather was
purposely pieced together to determine whether
interactive image processing could improve segment
classification results.

The problems and inefficiencies found in the
I-100/ERIPS system could be divided into five major
categories: (1) the wrong capabilities were stressed,
(2) methods were needed to ease the man-machine
interface, (3) unnecessary data were provided to the
analysts, (4) additional capabilities were needed, and
(5) design performance needed improvement.

The USDA analyst team used the I-100/ERIPS system to
analyze selected segments from the United States, Canada,
and the U.S.S.R. The varied wheat conditions throughout
the three-country study area enabled analysts to become
familiar with varying cultural practices, weather
conditions, and farming methods and with how these
variable conditions affect wheat growth and spectral
response.

It soon became apparent that a single analyst-processing
procedure was not optimal for classifying Landsat data.
P-1 worked fairly well when used in areas having small,
randomly distributed fields and heterogeneous signatures,
but it was inefficient in agricultural areas having
relatively large fields and homogeneous spectral
signatures. USDA analysts recommended that optional
processing procedures be developed for varying
agricultural conditions within a segment in order to make
optimum use of analyst time.

An outcome of the classification procedures problem was
the application of the direct crop option, which is
currently being implemented by the ATS. This procedure
gives the analyst the capability of outlining a desired
field and obtaining the area within that field directly.
Therefore, an areal estimate of a specific crop within a
segment can be obtained quickly and does not require
intermediate clustering and/or classification algorithms,
as is the case with the analyst-selected training fields
and the P-1 options.
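
The idea of obtaining an area directly from an outlined field can be
illustrated with a short sketch. It is not the ATS implementation, which
the paper does not describe. It assumes the outline is a simple polygon
whose vertices are given in pixel coordinates, uses the shoelace formula
for the enclosed area, and assumes a nominal Landsat MSS pixel of 57 by
79 meters. The function names and the pixel-size constant are
hypothetical choices made for this illustration.

```python
# Sketch only: area of an analyst-outlined field from its polygon vertices.
# Assumption: nominal MSS pixel of 57 m by 79 m (not stated in the paper).
PIXEL_AREA_HA = 57.0 * 79.0 / 10_000.0  # hectares per pixel


def polygon_area_pixels(vertices):
    """Shoelace formula; vertices are (column, row) pairs in pixel units."""
    n = len(vertices)
    if n < 3:
        raise ValueError("a field outline needs at least three vertices")
    twice_area = 0.0
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]  # wrap around to close the outline
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def field_area_hectares(vertices):
    """Convert the outlined area from pixels to hectares."""
    return polygon_area_pixels(vertices) * PIXEL_AREA_HA


# Hypothetical 10-by-10-pixel square field.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(polygon_area_pixels(square), "pixels")
print(round(field_area_hectares(square), 2), "hectares")
```

Summing such field areas for one crop and dividing by the segment area
would give the crop proportion without any clustering or classification
step.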

The I-100/ERIPS activity allowed the USDA analysts to
conduct several research pilot studies. One of these
studies focused on the early-season estimate problem. An
early-season spring wheat area estimate was made on a
total of 17 segments: 11 from Canada, 3 from the
U.S.S.R., and 3 from the United States. An early-season
wheat area estimate is defined as an area estimate of
wheat within a scene that is obtained prior to
Landsat-detectable emergence of all the wheat grown
within that scene. A majority of the early-season spring
wheat estimates were made from acquisitions obtained
during the first week of May 1977. A few estimates were
made from acquisitions obtained during the fall of 1976.
For the 17 segments analyzed, the mean difference between
the early-season estimates and the best at-harvest
estimates was 1.8 percent. Additional information on
this study is reported in reference 2.
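
A comparison of this kind can be sketched in a few lines. The paper does
not say whether the reported figure is a signed or an absolute mean
difference, so the sketch assumes a signed mean of early-season minus
at-harvest estimates, both expressed as a percentage of the segment. The
function name and the sample values are hypothetical and are not data
from the study.

```python
from statistics import mean


def mean_difference(early, harvest):
    """Signed mean of (early - harvest) over paired segment estimates."""
    if len(early) != len(harvest) or not early:
        raise ValueError("need two non-empty lists of equal length")
    return mean(e - h for e, h in zip(early, harvest))


# Hypothetical wheat percentages for three segments.
early_season = [22.0, 35.5, 10.0]
at_harvest = [20.0, 33.0, 9.5]
print(round(mean_difference(early_season, at_harvest), 2))
```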


Another study conducted on the I-100/ERIPS examined the
use of the vegetative index for crop identification.
Vegetation indexes are computed from the raw
multispectral scanner digital data and are used to
determine vegetation density, greenness, and
physiological condition within a given area. Considerable
research on the vegetation index conducted by LACIE
(ref. 3) and other government agencies (ref. 4) revealed
that it can be successfully used to detect drought and
monitor plant and soil moisture conditions; however, few
studies have examined the use of the vegetative index
approach for crop identification. One objective of the
study on the I-100/ERIPS was to investigate the
usefulness of vegetative index schemes for crop
identification and acreage estimation. The results of
this limited study show that the vegetation index can be
used cost effectively to identify crops and natural
vegetation (ref. 5).

The experience gained from the I-100/ERIPS proved to be
invaluable to the USDA. The analysts received training in
operating an interactive system and were given the
opportunity to process and analyze remotely sensed data.
The immediate payoff has been in the design and
implementation of the USDA ATS.

ATS Processing Activities

The USDA analysts conducted the first operational tests
of the ATS interactive image-processing system in
December 1977. Originally, 72 U.S.S.R. segments were
selected for this test, but only 38 segments were
actually processed. (Various circumstances, including
cloud cover restrictions, prevented the analysis of the
remaining 34 segments.) The data were acquired between
seedbed preparation and wheat emergence.

The designated crop option discussed earlier was used
for analyzing these segments. The wheat was only
partially emerged on the imagery and, therefore, it would
have been difficult to obtain meaningful estimates using
the conventional clustering and classification
procedures. The designated crop procedure enabled the
analyst to obtain an area estimate of all the wheat
fields within the segment, even though the spectral
signatures within the fields were inconsistent due to the
partial emergence of wheat. The analyst relied on his
interpretation of the spectral signatures within the
sample segment to determine the percentage of the segment
planted to wheat.

Analyst procedures have been documented and are
currently being used in the training of additional image
analysts. Each analyst has used these procedures as part
of a self-training course on the ATS equipment. Any new
analysts who come aboard will be required to participate
in a structured training course covering all aspects of
remote sensing, with emphasis on processing and analysis
of the data.


FUTURE PLANS 

Current programs and future plans for the ATS call for
developing the ATS Crop Condition Assessment (CCA) Group,
which will measure the impact of abnormal conditions
(e.g., excessive moisture, drought, winterkill) affecting
crop production. USDA analysts will assess the impact of
events by using remotely sensed data and conventional
data sources now used by USDA foreign commodity experts.
The CCA Group will focus its efforts on assessing the
impact of events in countries where crop shortages and
surpluses have a major impact on world commodity markets
and prices. Important world crops such as wheat, barley,
rye, corn, soybeans, sunflowers, rice, cotton, peanuts,
and sorghum will be included in the crop condition
assessment program. During crop years 1978 and 1979, ATS
personnel will be developing the CCA Group to assess the
condition of wheat in important foreign producing
countries.

Currently, USDA plans are to have the CCA Group assess
the impact of events detected and reported to the CCA
Group by the Joint Agricultural Weather Facility (JAWF).
The JAWF is composed of personnel from the National
Oceanic and Atmospheric Administration (NOAA) and the
USDA. The JAWF will monitor and detect abnormal events
using meteorological and ancillary data sources. Unusual
events detected by JAWF will be reported to the CCA Group
in a timely manner to hasten the reporting of impact
assessments to key USDA commodity experts and
decisionmakers.

The  CCA  Group  is  composed  of  two  important 
components,  the  data  base  and  analysis/reporting. 
The  remainder  of  this  paper  will  discuss  the  format 
and  operations  of  these  components. 

Data Base Component

Both historical and current multispectral imagery,
meteorological, and agricultural data are required to
support the CCA Group. An efficient and fast system for
storage, retrieval, and analysis of the data is crucial
for such a large-scale project. The ATS approach to this
data handling and analysis problem was to develop an
automated, geographically oriented, gridded data base.
The data base is expanded as more countries and crops are
added to the CCA unit.

The entire agricultural and potential agricultural
universe is divided into grid cells. Each grid cell has a
unique latitude and longitude address and therefore can
be singularly addressed by an “I” and “J”
identification. Each cell is 25 by 25 nautical miles and
can be further subdivided into quadrants. The following
is a brief list of the data stored within each cell.

1.  Country,  region,  zone,  and  strata  locations 

2.  Five-  by  six-nautical-mile  sample  segment 
locations  and  associated  data 

3.  Crop  types 

4.  Percent  agriculture 

5.  Current  and  historical  daily  meteorological 
data,  including  maximum/minimum  temperature, 
precipitation,  snow  cover,  and  wind  velocity 

6. Soil data (quadrant level), including surface
texture, depth, slope, drainage, available
water-holding capacity, and moisture

7.  Yield  models 

8.  Crop  calendars 

9.  Historical  agricultural  statistics  including  area, 
yield,  and  production 

10.  Agronomic  data,  including  irrigation  type  and 
percentage,  fertilization  method  and  percentage, 
tillage  practices,  and  cultivation  practices 
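
The grid addressing described before this list can be sketched as
follows. The paper does not give the actual ATS mapping from latitude and
longitude to the "I" and "J" identifiers, so everything here is an
assumption made for illustration: one nautical mile is treated as one
arc-minute in both directions (which ignores the narrowing of longitude
cells away from the Equator), I counts rows northward from the South
Pole, J counts columns eastward from 180 degrees west, and quadrants are
named by compass corner.

```python
import math

# Assumption: a 25-nautical-mile cell spans 25 arc-minutes on each side.
CELL_ARC_MINUTES = 25.0


def grid_address(lat_deg, lon_deg):
    """Return a hypothetical (I, J, quadrant) address for a location."""
    if not (-90.0 <= lat_deg <= 90.0 and -180.0 <= lon_deg <= 180.0):
        raise ValueError("latitude or longitude out of range")
    row_pos = (lat_deg + 90.0) * 60.0 / CELL_ARC_MINUTES
    col_pos = (lon_deg + 180.0) * 60.0 / CELL_ARC_MINUTES
    i = math.floor(row_pos)
    j = math.floor(col_pos)
    # The fractional position inside the cell selects the quadrant.
    north = (row_pos - i) >= 0.5
    east = (col_pos - j) >= 0.5
    quadrant = ("N" if north else "S") + ("E" if east else "W")
    return i, j, quadrant


# A point in Montana (47 N, 110 W).
print(grid_address(47.0, -110.0))
```

Keying every data layer in the list above by the same (I, J) pair is what
would let an analyst retrieve soil, weather, and crop data for one cell
in a single lookup.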

The analyst may access these data interactively while
working at the analyst station. The data will be
presented as maps and/or tables on both the cathode-ray
tube (CRT) screens and the line printer. The maps will be
displayed at different scales according to the
geographical size of the area being displayed. An example
of a data information product is a map showing the
irrigation distribution and density for a designated area
specified by the analyst. A supportive table will appear
with the map specifying the types of irrigation within
the designated area.

One of the major tasks of the ATS is to construct a
data base for each country that has at least one of the
major commodity crops listed previously. During crop year
1978, data bases are being constructed for Montana, North
Dakota, and a selected area in the U.S.S.R.

Analysis Component

The CCA analysis component will utilize vegetative
index numbers to measure the health and vigor of the crop
or crops of interest. The vegetative index number is a
transformation of the MSS data into various descriptive
components. One component measures greenness and is
commonly known as a “green number.” These numbers
measure the approximate amount of green biomass in the
scene and the relative vigor or health of that green
biomass.

Currently, the ATS is testing six different green
numbers under varying agricultural conditions. Upon
completion of this testing exercise, the ATS will
implement the green number or numbers that best detect
crop vigor or condition. The six green numbers include
the Ashburn Vegetative Index (AVI), the Kauth-Thomas
Vegetative Index (KVI), the Perpendicular Vegetative
Index (PVI), the Transformed Vegetative Index (TVI), the
Leaf Area Index (LAI), and the Difference Vegetation
Index (DVI). Existing literature on these vegetative
index numbers (refs. 3 to 5) has been reviewed and
considered by the ATS evaluators.
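
The paper does not give formulas for these indexes. As an illustration
only, the sketch below computes three of them from Landsat MSS band 5
(red) and band 7 (near infrared) values, using the forms commonly cited
in the later remote-sensing literature: TVI as the square root of the
normalized difference plus 0.5, DVI as 2.4 times band 7 minus band 5, and
AVI as 2.0 times band 7 minus band 5. These forms and the function names
are assumptions and may differ from what the ATS evaluated.

```python
import math


def tvi(mss5, mss7):
    """Transformed index: sqrt(normalized difference + 0.5)."""
    total = mss7 + mss5
    if total == 0:
        raise ValueError("band sum is zero; index undefined")
    shifted = (mss7 - mss5) / total + 0.5
    if shifted < 0:
        raise ValueError("normalized difference below -0.5; square root undefined")
    return math.sqrt(shifted)


def dvi(mss5, mss7):
    """Difference index, assumed form 2.4 * MSS7 - MSS5."""
    return 2.4 * mss7 - mss5


def avi(mss5, mss7):
    """Ashburn index, assumed form 2.0 * MSS7 - MSS5."""
    return 2.0 * mss7 - mss5


# Hypothetical digital counts for one vegetated pixel.
red, nir = 20.0, 60.0
print(tvi(red, nir), dvi(red, nir), avi(red, nir))
```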

The green numbers will be used in combination with
meteorological data to assess crop condition. Lookup
tables showing the relationship between green numbers and
(1) soil moisture, (2) crop calendar, and (3) yield will
be developed for specific geographic areas. These tables
will aid the analyst in his assessment of crop condition.
The method by which the green numbers will be used for
crop condition assessment follows.

The first step in using green numbers will be to view
the current Landsat images of selected sample segments
for purposes of creating a map or image of the natural
vegetation area (NV) and areas containing the desired
crop (DC). The AVI will be used, where possible, to
automatically create the NV map. Average green numbers
from each of the vegetative index algorithms will be
calculated and stored for the entire sample segment, for
the NV map, and for the DC map. Green-number isoline maps
will be plotted and interpreted for crop condition.

Historical Landsat imagery will be acquired for the
same segments discussed in the previous paragraph. Green
numbers will be calculated and stored for this historical
imagery, following the procedure described for
current-year imagery. The analyst will compare and
evaluate the green numbers derived from the historical
and current imagery, the NV map, and the DC map to
determine the current-year crop condition. The crop
calendar, soil moisture, current and historical
meteorological data, and yields derived for these
segments will be included in the analysis. The primary
goal of this analysis is to determine and assess the
amount of change in the crop. This assessment will
address a change in quality, areal extent, yield, and
production. A report will then be generated documenting
this assessment.
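
The comparison of current and historical green numbers can be
illustrated with a minimal sketch. The paper does not specify how the
change is quantified, so the sketch assumes a simple percent change of
the current-year average relative to the historical average, computed
separately for the whole segment, the NV map, and the DC map. The
function names, dictionary keys, and values are hypothetical.

```python
def percent_change(current, historical):
    """Percent change of the current average relative to the historical one."""
    if historical == 0:
        raise ValueError("historical average is zero; change undefined")
    return 100.0 * (current - historical) / historical


def compare_green_numbers(current, historical):
    """Compare average green numbers for the segment, NV, and DC areas."""
    return {
        area: percent_change(current[area], historical[area])
        for area in ("segment", "NV", "DC")
    }


# Hypothetical average green numbers for one sample segment.
current_year = {"segment": 18.0, "NV": 20.0, "DC": 12.0}
historical_year = {"segment": 20.0, "NV": 20.0, "DC": 16.0}
print(compare_green_numbers(current_year, historical_year))
```

In this made-up case the natural vegetation is unchanged while the
desired crop is lower than in the historical year, which is the kind of
contrast an analyst would then weigh against the weather and soil
moisture records.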

During  1978,  ATS  analysts  will  perform  the  steps 
just  discussed  for  purposes  of  assessing  the  condition 
of  the  wheat  crop  in  Montana,  North  Dakota,  and 
one  selected  area  in  the  U.S.S.R. 

Yield models will be required to support the CCA Group.
For 1978 and 1979, two principal wheat yield models are
of interest to the ATS. They are the LACIE-tested CCEA
model, developed by the Center for Climatic and
Environmental Assessment (CCEA) of NOAA (refs. 6 and 7),
and the Kansas State University (KSU) model (refs. 8 and
9). These two models will be implemented, tested, and
evaluated by the ATS. Results from the CCEA model are
produced at 30-day intervals; the KSU model predicts
yields at 10-day intervals.

The ATS will implement, evaluate, and apply other crop
models as they are developed and documented in the
research community.

During 1978, the ATS will implement and operate a wheat
crop calendar. The crop calendar subroutine of the KSU
spring wheat yield model will be the primary crop
development model. Model results at selected weather
stations are interpolated to the grid cell units of the
data base. The model is run every 10 days with daily
meteorological data.
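
The paper does not name the method used to interpolate station results
to grid cells. One common choice, shown here purely as an assumed
example, is inverse-distance weighting: each cell receives a weighted
average of the station values, with weights that fall off with distance.
The function name, the power parameter, and the coordinates are
hypothetical.

```python
import math


def idw_interpolate(cell_xy, stations, power=2.0):
    """Inverse-distance-weighted value at a cell center.

    stations is a list of (x, y, value) tuples in the same planar
    coordinate units as cell_xy.
    """
    numerator = 0.0
    denominator = 0.0
    for x, y, value in stations:
        distance = math.hypot(cell_xy[0] - x, cell_xy[1] - y)
        if distance == 0.0:
            return value  # the cell center coincides with a station
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    if denominator == 0.0:
        raise ValueError("no stations supplied")
    return numerator / denominator


# Hypothetical crop-stage values at two stations, cell midway between them.
stations = [(0.0, 0.0, 2.0), (2.0, 0.0, 4.0)]
print(idw_interpolate((1.0, 0.0), stations))
```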

The ATS will implement, evaluate, and apply other crop
development models as they are developed and documented
in the research community.

During  1978,  the  ATS  will  implement  and  run  the 
Versatile  Soil  Moisture  Budget  (VSMB)  model.  The 
results  are  used  in  the  KSU  yield  model.  The  VSMB 
subroutine  will  be  extracted  from  the  KSU  yield 
model  and  run  as  a separate  program. 

SUMMARY 

The ATS is chartered to implement, test, and evaluate
technologies and capabilities for their application
feasibility by the USDA. The analysis and application of
remotely sensed and other data is an important component
of the ATS. Therefore, the remote-sensing analyst must be
highly qualified and trained in order to support this
component.

The USDA analysts had an opportunity to gain experience
on an interactive image-processing system during LACIE
Phase III. Landsat data in 5- by 6-nautical-mile format
from the United States, the U.S.S.R., and Canada were
analyzed and wheat area estimates determined. The varied
wheat conditions allowed the analyst to study different
agronomic and cultural practices. These differences
necessitated that more than one processing procedure be
developed to handle the varied agricultural conditions.
To partially solve this problem, the ATS is currently
implementing three processing options, each developed for
specific agricultural situations.

While working on the interactive system, the USDA
analysts developed a list of recommendations and changes
to the system. This list of items was considered during
the development of the ATS and has resulted in a system
with capabilities and enhancements that are a direct
outcome of the USDA analyst experience gained during
LACIE Phase III.

Future plans for the ATS call for the development of
the ATS CCA Group. The CCA Group will detect in a timely
manner changes affecting production and quality of
commodities and will assess the impact of the change. The
ATS is tasked to develop and integrate the elements of
the CCA Group. These elements are the central data base
and the analysis component, which utilizes Landsat data
and yield, crop calendar, and soil moisture models.

The ATS personnel will develop the data base and
analysis procedures for this system. The yield, crop
calendar, and soil moisture models will be transferred
from LACIE and implemented in the ATS.

In 1979 and the early 1980’s, the CCA Group will be
expanded to include additional crops and crop-producing
regions of major importance in world trade. The ATS will
coordinate with and rely on the research community to
develop the technology needed to support ATS objectives.
Developed technology will be transferred to the ATS for
implementation, testing, and evaluation prior to its
incorporation into an operational early warning system.